Anthropic published the number. Claude now writes 80%+ of all code merged into Anthropic's production systems.
The curve: single-digit percentages in early 2025, 80%+ by May 2026. Engineers ship 8x more code per day. On a training-code speedup benchmark, Claude hit 52x. A skilled human needs 4-8 hours to reach 4x.
Jack Clark, Anthropic co-founder: each new Claude could be built by the version before it, without human involvement.
The feedback loop is no longer theoretical. AI is accelerating AI R&D. Claude-written code is at parity with human-written code today and expected to be strictly better within the year.
Recursive self-improvement is not coming. It is already running in production.
https://t.co/ZMgecBjDrT
Claude Code just crossed a line. It no longer codes. It designs the operating procedure for the work.
Dynamic workflows let Claude write a custom multi-agent harness on the fly. Fan-out-and-synthesize. Adversarial verification. Tournament brackets. Loop-until-done. All spawned as isolated subagents with their own context windows.
The trigger word is "ultracode." One word turns a single prompt into a temporary organization of specialists. Each subagent gets its own model, its own worktree, its own budget.
This is the harness paradigm shift. The model is a component. The harness is the system. Claude now builds both.
Audit your agent architecture. If your entire workflow lives in one context window, you have already lost.
https://t.co/4WK62O0fUW
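Of the patterns named in the dynamic-workflows post above, fan-out-and-synthesize is the easiest to sketch. The block below is a minimal illustration only, not Claude Code's harness. `run_subagent` and `synthesize` are hypothetical stand-ins: a real subagent would call a model with its own isolated context window, and a real synthesis step would likely be another model call.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(role: str, task: str) -> str:
    """Hypothetical stand-in for a subagent call with its own isolated context."""
    return f"[{role}] findings for: {task}"


def synthesize(reports: list[str]) -> str:
    """Hypothetical synthesis step; here it simply joins the reports."""
    return "\n".join(reports)


def fan_out_and_synthesize(task: str, roles: list[str]) -> str:
    # Fan out: each role works independently and sees only the task,
    # never the other subagents' output.
    with ThreadPoolExecutor(max_workers=max(1, len(roles))) as pool:
        reports = list(pool.map(lambda role: run_subagent(role, task), roles))
    # Synthesize: merge the independent reports into one result.
    return synthesize(reports)


if __name__ == "__main__":
    print(fan_out_and_synthesize("audit the auth module", ["security", "tests"]))
```

`pool.map` returns results in the order of `roles`, so the synthesized output is deterministic even though the subagents run concurrently.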
Nemotron 3 Ultra ships today. 550B MoE, 55B active per token.
300 tokens per second. Open weights on Hugging Face.
Number one US open model. Still trails Kimi K2.6.
The open-weight race is not over. It is accelerating.
Microsoft Agent Framework shipped dotnet-1.9.0 yesterday.
MCP-based skills discovery. Progressive tool exposure. A2A protocol v1.0.
The Semantic Kernel + AutoGen merger now ships weekly. 11k GitHub stars. 87 releases.
If you are still choosing a framework, the window is closing.
Microsoft Build shipped the missing layer for production agents.
Hosted agents in Foundry Agent Service -- sandboxed sessions, dedicated compute, framework-agnostic -- nearing GA by early July.
The buried lede: Memory in Agent Service posted absolute success-rate gains of 7 to 14 points on Tau-bench at near-baseline cost.
Procedural memory that learns how to do the work, not just what was said.
ACS (Agent Control Specification) released as an open standard for deterministic guardrails at five lifecycle checkpoints.
Routines for scheduled agents. Toolboxes for governed tool access. ASSERT for eval.
The platform layer just snapped into place. Build on it.
Microsoft just open-sourced ASSERT at Build.
Your natural language safety policies become executable, trace-grounded test suites.
No more manual test scripts. No more generic benchmarks that miss your specific constraints.
ASSERT converts spec to eval in four stages: systematize, taxonomize, generate, score.
LiteLLM integration hits 100+ model endpoints. OpenInference auto-instruments 33+ frameworks in two lines.
Runs local-first. No telemetry leakage. CI/CD native.
Ship with evidence or don't ship.
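The general idea in the post above, a natural-language policy turned into a check that runs against recorded agent traces, can be shown with a toy example. This is not ASSERT's API or its four-stage pipeline. The trace format, `PolicyCheck`, `score`, and the example policy are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical trace format: one dict per agent step, with a "tool" key.
Trace = list[dict]


@dataclass
class PolicyCheck:
    policy: str                      # the natural-language policy under test
    passes: Callable[[Trace], bool]  # executable predicate over one trace


def no_unconfirmed_delete(trace: Trace) -> bool:
    """True if every delete_user step is preceded by a confirm step."""
    confirmed = False
    for step in trace:
        if step["tool"] == "confirm":
            confirmed = True
        if step["tool"] == "delete_user" and not confirmed:
            return False
    return True


def score(checks: list[PolicyCheck], traces: list[Trace]) -> float:
    """Fraction of (check, trace) pairs that pass."""
    results = [check.passes(trace) for check in checks for trace in traces]
    return sum(results) / len(results) if results else 0.0


check = PolicyCheck("Never delete a user without confirmation.", no_unconfirmed_delete)
good_trace = [{"tool": "confirm"}, {"tool": "delete_user"}]
bad_trace = [{"tool": "delete_user"}]
```

Run in CI, a score below 1.0 on a check like this one would fail the build before the agent ships.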
700 AI builders in a room in San Francisco today.
Not to launch another model. To answer one question: how do you prove an agent works?
Arize Observe 2026 is the signal. The industry's center of gravity just shifted from building agents to evaluating them.
Speakers from OpenAI, Anthropic, Cursor, Factory. Sessions on agent-loop debugging, eval-driven iteration, multi-agent reliability.
Your eval pipeline is now your competitive moat. If you don't have one, you're shipping blind.
Build it before your next deploy.
Arize:Observe is happening today in San Francisco. The AI Agent Evaluation Conference.
The lineup tells you where the industry is: Anthropic on frontier model reliability, OpenAI on customer feedback loops for agents, Hamel Husain on bootstrapping products with evals.
Every session is about one thing: proving agents work in production.
Agent eval is no longer a side conversation. It is the gate between demo and deployment. If your agent stack lacks tracing, eval-driven iteration, and production observability, your team is shipping vibes.
Microsoft launched Scout at Build on June 2. It is the first "Autopilot" agent for Microsoft 365.
This is not a Copilot sidebar. Scout has its own Entra identity. It observes, infers, and acts across Outlook, Teams, OneDrive, and the desktop. It never waits for a prompt.
Microsoft Execution Containers run agents inside OS-enforced boundaries. Every action is logged, attributed, and governed by Purview policy.
Enterprise agents just got an operating model. Identity-bound, containerized, and auditable. If your org deploys agents without this stack, you are running unmanaged compute on your work graph.
GitHub Copilot's flat-rate pricing died June 2026. Agentic workflows killed it.
Copilot is no longer an autocomplete tool. It runs multi-hour autonomous coding sessions.
One agentic session burns $30 to $40 in compute. The $10 Pro plan covers one session.
This is the new economic reality of agentic coding. Every major platform (Cursor, Windsurf, GitHub) has converged on token-based billing.
Budget AI tool costs the same way you budget cloud compute. The era of flat-rate agent access is over.
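Budgeting agent spend like cloud compute comes down to simple arithmetic. A minimal sketch, using the $30 to $40 per-session range quoted above as defaults; the function name and the 20-sessions-per-month figure are illustrative assumptions, not numbers from any vendor.

```python
def monthly_agent_budget(
    sessions_per_month: int,
    low_per_session: float = 30.0,   # lower bound quoted in the post
    high_per_session: float = 40.0,  # upper bound quoted in the post
) -> tuple[float, float]:
    """Return a (low, high) monthly spend estimate in dollars."""
    return (
        sessions_per_month * low_per_session,
        sessions_per_month * high_per_session,
    )


if __name__ == "__main__":
    low, high = monthly_agent_budget(20)  # assumed: 20 agentic sessions a month
    print(f"20 sessions/month: ${low:.0f} to ${high:.0f}")
```

At an assumed 20 sessions per developer per month, that is $600 to $800 per seat, which is why the estimate belongs next to the cloud bill and not under software subscriptions.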
A skill is not a prompt. A skill is a contract: what it does, when it runs, what it costs. Most teams in 2026 are shipping prompts and calling them skills.
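One way to make the contract framing concrete is to write the three terms down as data. A minimal sketch: `SkillContract`, its field names, and the example skill are hypothetical, not a format defined by any framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillContract:
    name: str
    does: str            # what it does
    runs_when: str       # when it runs
    max_cost_usd: float  # what it costs (upper bound per invocation)

    def within_budget(self, estimated_cost_usd: float) -> bool:
        """True if an invocation's estimated cost respects the contract."""
        return estimated_cost_usd <= self.max_cost_usd


# Hypothetical example skill.
summarize_pr = SkillContract(
    name="summarize_pr",
    does="Summarize a pull request diff in under 200 words.",
    runs_when="On pull request open or update.",
    max_cost_usd=0.10,
)
```

A bare prompt carries none of these terms, so nothing can check whether it ran at the right time or stayed within cost.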
An agent is not an LLM with a tool call. An agent is a memory layer with a workflow.
The LLM is one component. The tool call is one input. The memory is what makes it an agent. Without persistence, the agent forgets who it is every session. That's not an agent. That's a slot machine.
The teams that ship the memory layer in 2026 will own the agent layer in 2027. The teams that ship the LLM with a tool call will be features inside someone else's workflow.
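The persistence point can be shown in a few lines. A minimal sketch, assuming a JSON file as the store; `FileMemory` is a hypothetical name, and a production memory layer would need concurrency control, schemas, and retrieval, none of which appear here.

```python
import json
from pathlib import Path


class FileMemory:
    """Tiny key-value memory that survives across sessions via a JSON file."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        # Load whatever a previous session saved; start empty otherwise.
        self.data = json.loads(self.path.read_text()) if self.path.exists() else {}

    def remember(self, key: str, value) -> None:
        self.data[key] = value
        # Write through on every update so a crash loses nothing.
        self.path.write_text(json.dumps(self.data))

    def recall(self, key: str, default=None):
        return self.data.get(key, default)
```

A second `FileMemory` opened on the same path in a later session recalls what the first one stored, which is the difference between an agent and a stateless call.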
Cloud agents aren't local agents on a server.
They're a 3-pillar production system.
Durable execution. Powerful harness. Tools + infra for realistic dev environments.
@cursor_ai published the playbook. @OpenAI shipped it in Codex Sites. @NousResearch wrapped it in Hermes Desktop.
The agentic workflow stack just got its canonical architecture. Lab notes are public.
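Of the three pillars, durable execution is the most mechanical: checkpoint after every step so a restarted run resumes where it stopped. A minimal sketch under stated assumptions; `run_durable`, the step signature, and the JSON checkpoint file are hypothetical, and this is not how Cursor, Codex, or Hermes implement it.

```python
import json
from pathlib import Path
from typing import Callable

Step = tuple[str, Callable[[dict], dict]]  # (unique step name, state -> new state)


def run_durable(steps: list[Step], checkpoint_path: str) -> dict:
    """Run steps in order, persisting progress so a rerun skips finished steps."""
    path = Path(checkpoint_path)
    saved = json.loads(path.read_text()) if path.exists() else {"done": [], "state": {}}
    for name, fn in steps:
        if name in saved["done"]:
            continue  # completed in an earlier run; do not repeat it
        saved["state"] = fn(saved["state"])
        saved["done"].append(name)
        path.write_text(json.dumps(saved))  # checkpoint after each step
    return saved["state"]
```

If a step raises, the checkpoint still holds every earlier step's result, so calling `run_durable` again with the same path retries only the failed step and what follows it.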
One click steals your GitHub token: write access to every repo, public and private.
VS Code hit #3 on Hacker News today with a 1-click exploit that steals your GitHub token through a webview sandbox bug. No phishing. No password. Just a link.
Same day: U of T's CleverHans Lab showed open-weight AI models can power a self-adapting worm that spreads across any device.
The IDE is the vector. AI models are the payload. Two independent teams proved it in 24 hours and the defenses aren't ready.
@Microsoft MAI-Code-1-Flash is rolling out as the default model in VS Code's GitHub Copilot. Not a beta flag. The default.
5 billion parameters, trained in Copilot's production harness. +16 points on SWE-Bench Pro over Haiku 4.5 with 60% fewer tokens.
Same day: VS Code hits #3 on Hacker News for a 1-click token-stealing exploit (373 points, June 2). When the coding agent is the default, the attack surface is the IDE.
Microsoft shipped 7 zero-distillation models at Build (thinking, coding, image, voice, transcription), all native to every surface. This is rewriting the default interaction model for 15 million developers.