Anthropic just dropped Claude Opus 4.8.
It beats GPT-5.5 and Gemini at coding. But that’s not the big deal.
The big deal: it finally stopped lying. 4x less likely to say “all done” when there are bugs. It actually tells you “not sure about this part, double-check it.”
Numbers?
750,000 lines of code. 11 days. One developer.
Claude Opus 4.8 rewrote the entire Bun runtime from Zig to Rust - hundreds of AI agents running in parallel, 99.8% of tests passing.
Work that used to take a team months.
everyone’s staring at Opus 4.8 benchmarks but the real upgrade is this:
> the model learned to admit when it’s not sure.
> 4x less likely to swear “it works” when it doesn’t.
> an honest model better than a smart model. trust me on this one
Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors.
Available today at the same price.
Anthropic just dropped Claude Opus 4.8.
It beats GPT-5.5 and Gemini at coding. But that’s not the big deal.
The big deal: it finally stopped lying. 4x less likely to say “all done” when there are bugs. It actually tells you “not sure about this part, double-check it.”
Numbers?
750,000 lines of code. 11 days. One developer.
Claude Opus 4.8 rewrote the entire Bun runtime from Zig to Rust - hundreds of AI agents running in parallel, 99.8% of tests passing.
Work that used to take a team months.
everyone’s staring at Opus 4.8 benchmarks but the real upgrade is this:
> the model learned to admit when it’s not sure.
> 4x less likely to swear “it works” when it doesn’t.
> an honest model better than a smart model. trust me on this one