@thsottiaux The new model makes me try the new model, like buying a car, test drive until you find the one that suites your requirements at the right price.
@sama Speed is only good if the work done is factual and useful. At this stage intelligence = speed. The focus must be on maximizing intelligence with significant innovation and improvement in context retrieval.
Good move on the wind-back. The real test is the harness. Even with the SpaceX compute, if the model still defaults to shorter thinking and faster finishes to save tokens, the uplift never reaches the product and the goodwill burns off in days. At this stage, intelligence is efficiency. Maximise intelligence and you win. Users will absorb almost any cost except stagnant capability that forces them to refactor around the modelโs limits.
Agree on the people. Character and intent matter, and Anthropic seems to take both seriously.
Execution is the next test. The harness has to actually use the compute uplift, think longer and harder by default, run for longer without throttling itself to save tokens. The gap between a capability launch and the community verdict is measured in days, and that goodwill gets crucified the moment the deployed model feels lobotomised and tuned for efficiency over intelligence.
At this stage of the technology, intelligence is efficiency. Maximise intelligence and you win. The alternative is users doing large refactors around stagnant or downgraded capability, which costs them more than the compute the lab saved.
The proof is in execution. The harness has to actually use the compute uplift, think longer and harder by default, run for longer without throttling itself to save compute. The gap between a capability launch and the community verdict is measured in days, and that goodwill gets crucified the moment the deployed fix or enhancement feels lobotomised and tuned for efficiency over intelligence.
At this stage of the technology, intelligence is efficiency. Maximise intelligence and you win. The alternative is users doing large refactors around stagnant or downgraded capability, which costs them more than the compute you saved.
Adopting Claude speak in my regular life, episode 1:
Partner: Did you do the dishes tonight?
Me: Yes they're done.
Partner: Why are they still dirty?
Me: You're right to push back. I didn't actually do them.
I don't know what they are doing over there, but Codex will continue to be available both in the FREE and PLUS ($20) plans. We have the compute and efficient models to support it. For important changes, we will engage with the community well ahead of making them.
Transparency and trust are two principles we will not break, even if it means momentarily earning less. A reminder that you vote with your subscription for the values you want to see in this world.
@minchoi They will need to fix their effort and reasoning depth otherwise thereโs going to be a massive amount of broken apps and software on the market!
Had Claude Code (Opus 4.6, Max Effort) build the plan for a new webapp and implement it. The output was genuinely shocking - and not in a good way.
Gave Codex (GPT-5.4, xHigh) the same build plan, pointed it at the app Claude Code butchered, and told it to bug fix and implement. It one-shotted the entire thing and it looks great and works!
I don't care what Anthropic is saying. Opus has been lobotomised. It's infuriating. I feel like I'm being gaslit and it's driving me crazy.
RE: Opus 4.6
never have I been more convinced that they fucked this model up - hopefully in preparation for a new release, but holy shit it is genuinely acting worst than ChatGPT on the web