@scaling01 Since Fable low beats Opus max in some cases, I wouldn't be surprised if we got nvfp4 Fable lite to cover the range of Opus low down to Haiku
@rhythmrg Guy that runs post training company shills post training models lol.
It's never about local or frontier, general or your own post train. It's about whether the model is "good enough".
If local is good, use it.
If a post train is good, use it.
It's never good enough. Not yet.
@thealexbanks "Never attribute to malice..."
It's very likely Ant knew and already tested. The jailbreak "worked" but Fable didn't produce meaningful uplift beyond other models. USG saw it as "your guardrails don't work" and that Ant was dismissive of their concerns.
@abacaj There are two sets of problems for AI:
1. Problems with an objectively correct verifiable solution.
2. Open ended problems that require general understanding and "taste".
Harness development and more thinking will solve type 1. Bigger, smarter, better-trained models solve type 2.
The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees.
The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance.
Access to all other Claude models is not affected.
We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible.
Read our full statement: https://t.co/bwn0sximKZ
@scaling01 I love to call out benchmaxxing, but at some point models will benchmax on all/extensive benchmarks.
At a fundamental level this is not ASI. Practically speaking it's not important for most people. After that it's "just as good" as long as the benchmarks cover your use cases.
@petergostev It's difficult to balance 1 harness for 4 different model tiers released 8 months apart.
Even the same model in a harness that is updated over months performs worse. Ideally Claude Code should be a wrapper that routes to models and task-specific harnesses.
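To make the "wrapper that routes" idea concrete, here's a rough sketch, not how Claude Code actually works. Every name in it (`Route`, `ROUTES`, `classify_task`, the tier and harness labels) is made up for illustration, and the keyword classifier is a stand-in for whatever a real router would use.

```python
from dataclasses import dataclass
from typing import Dict

# All names here are hypothetical; this only illustrates the routing idea.


@dataclass(frozen=True)
class Route:
    model_tier: str  # which model tier handles the task
    harness: str     # which task-specific harness wraps that model


# Assumed mapping from task type to (model tier, harness).
ROUTES: Dict[str, Route] = {
    "quick_edit": Route(model_tier="small", harness="edit-harness"),
    "refactor": Route(model_tier="medium", harness="refactor-harness"),
    "design": Route(model_tier="large", harness="planning-harness"),
}

# Fallback when the task type is not recognized.
DEFAULT_ROUTE = Route(model_tier="medium", harness="general-harness")


def classify_task(prompt: str) -> str:
    """Toy keyword classifier; a real router would need something better."""
    text = prompt.lower()
    if "refactor" in text:
        return "refactor"
    if "design" in text or "architecture" in text:
        return "design"
    if "typo" in text or "rename" in text:
        return "quick_edit"
    return "unknown"


def route(prompt: str) -> Route:
    """Pick a model tier and harness for the prompt, with a default fallback."""
    return ROUTES.get(classify_task(prompt), DEFAULT_ROUTE)
```

The point of the split is that each harness can be tuned for one model tier and one kind of task, instead of one harness trying to fit everything.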
@ClementDelangue @JakeKAllDay @ArtificialAnlys It's honest in that what they test is what you get. It's representative of real performance.
You're perfectly allowed to do that, just show cost and inference time when you do. In fact, if you could, I'm sure people would favor your model since it's cheaper and faster on average.
@dog_foot_ruler_ @thoughtfullab That's only going forward. Since release, "frontier llm development" requests would make Fable give a "dumbed down" response, which apparently is still much better than 4.8, or this wouldn't have been considered frontier/SOTA.
@rohanpaul_ai Not quite accurate. The issue is that they were silently messing with how Fable thinks. You'd still be talking to Fable but it'd give you bad results *without* telling you.