We’ve been testing Sonnet 4.6, and it has been potent in our agent, Maestro. Our primary eval has it implement a long list of features across a diverse set of use cases, iterating across codebases and building on prior work. The result: it completed features faster and more cheaply, with a higher benchmark pass rate.
This is Claude Sonnet 4.6: our most capable Sonnet model yet.
It’s a full upgrade across coding, computer use, long-context reasoning, agent planning, knowledge work, and design.
It also features a 1M token context window in beta.
Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use.
Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.
This is a historic moment for us.
Our software engineering agent, Maestro, generated solutions for all 12 ICPC World Finals problems — one of the hardest team programming competitions on Earth!
We're opening its solutions for the community to validate. Go break them.
We're excited to share that our agent, Maestro, drafted solutions to all 12 problems from the ICPC 2025 World Finals in ~2 hours, using current models, with no human involvement and no internet access. We deeply respect the human teams' extraordinary dedication. Note: these solutions have not been officially validated.
Anthropic just made *the* LLM release we have been waiting for: two massive-context Claude Sonnet models, handling up to 1M input tokens. These are the models we used with our Maestro system @iGent_AI to build large, complex software, like a Redis-compatible database in Rust, written entirely by AI https://t.co/1BpAZeevkl
Claude Sonnet 4 now supports 1 million tokens of context on the Anthropic API—a 5x increase.
Process over 75,000 lines of code or hundreds of documents in a single request.
Our agentic software engineering system, Maestro, can build large, complex software: it just finished building a Redis-compatible database from first principles in Rust, improving on the original's safety and performance!
Tired of toy AI demos that fizzle in production? iGentAI built Ferrous: a Rust Redis-compatible server that outperforms Valkey. 35K lines of code, 100% of tests passing, beats benchmarks. Zero human-written code. Built in 70 hours of part-time direction. Toys vs. tools: here's the proof.
We've integrated Claude Sonnet 4 into Maestro, and the results are transformative. As our evaluations show, it maintains higher code quality even as project complexity grows. Combined with its new extended thinking capabilities, Maestro delivers an unmatched AI engineering experience. Sign up at https://t.co/ut8NN2M13t
@Anthropic reports Claude 4 models are 65% less likely to use shortcuts on agentic tasks. Our evaluations confirm this: Claude Sonnet 4 consistently understates feature completeness rather than overstating success. This translates to more reliable AI assistance through Maestro.
Our VibeCodeBench evaluations affirm what @Anthropic just announced: Claude Sonnet 4 excels at autonomous multi-feature development. We've seen codebase navigation errors drop from 20% to near zero, and strategic refactoring that saves ~500k tokens on complex, multi-stage tasks. Proud to power Maestro with this breakthrough.
"Agency > Intelligence"
@karpathy nailed it, and after 18 months of building Maestro, we agree. The real AI leap isn’t just smarts; it’s agency: the ability to act independently, turning assistants into partners.