Compared to other models without tool use, it achieves state-of-the-art performance across:
🔘 LiveCodeBench V6, which evaluates competitive coding performance
🔘 Humanity’s Last Exam, a challenging benchmark that measures a model’s expertise in different domains, including science and math
My thoughts on Grok 4 Heavy after 12hrs:
Crazy good!
“Create an animation of a crowd of people walking to form ‘Hello world, I am Grok’ as camera changes to bird’s-eye.”
And it 1-shotted the *entire* thing.
No other model comes close.
Watch the full clip.
Introducing the next generation: Claude Opus 4 and Claude Sonnet 4.
Claude Opus 4 is our most powerful model yet, and the world’s best coding model.
Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.