Introducing Claude Opus 4.7, our most capable Opus model yet.
It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back.
You can hand off your hardest work with less supervision.
Tomorrow we’re hosting a town hall for AI builders at OpenAI. We want feedback as we start building a new generation of tools.
This is an experiment and a first pass at a new format — we’ll livestream the discussion on YouTube at 4 pm PT.
Reply here with questions and we’ll answer as many as we can!
Thanks everyone for testing Kimi K2 Thinking and sharing benchmark results!
We've noticed that benchmark outcomes can vary across providers. Some third-party endpoints show substantial accuracy drops (e.g., 20+ pp), which has negatively affected scores on reasoning-heavy tasks like LiveBench.
We're re-running checks and will share more data soon via the Vendor Verifier (https://t.co/Fzo7OfBe8j) to keep results consistent and transparent.
👉 For reliable benchmark testing, we strongly recommend:
- Use our official API endpoint kimi-k2-thinking-turbo
- Enable stream = True
- Set temperature = 1.0
- Suggested max_token: Reasoning 128k | Coding 256k | Other ≥64k
- Add retry logic to your script
👉 For the full benchmark setup guide, check here: https://t.co/fwvad2zJ1s
We'll continue publishing provider verification results on GitHub. If there are metrics or cases you'd like us to include in the next round, please drop your suggestions in the repo.
As always, your input helps shape the next iteration.
@web3pingu https://t.co/Ofz1jFN5xc is dropping this week. Be legendary.
Like us redneck dev wanna bes.
Not vibe coding. Just being dumb and deploying dope shit! :) #OHBOYAPPROVED