TheTinman @PaxMachinae - Twitter Profile

about 2 hours ago

@scaling01 Since Fable low beats Opus max in some cases, I wouldn't be surprised if we got nvfp4 Fable lite to cover the range of Opus low down to Haiku

TheTinman @PaxMachinae

1 day ago

@teortaxesTex Surprised how fast they managed to distill Le Chaton Fat

TheTinman @PaxMachinae

1 day ago

@filodesotano @spicey_lemonade @scaling01 They're getting there https://t.co/R98dvI2cUE

Z.ai @Zai_org

1 day ago

GLM-5.2 leads GLM-5.1 by a wide margin across various domains, including coding, tool usage, reasoning, and general knowledge.

TheTinman @PaxMachinae

1 day ago

@rhythmrg Guy who runs a post-training company shills post-trained models lol. It's never about local or frontier, general or your own post-train. It's about whether the model is "good enough". If local is good, use it. If a post-train is good, use it. It's never good enough. Not yet.

TheTinman @PaxMachinae

2 days ago

@thealexbanks "Never attribute to malice..." It's very likely Ant knew and already tested. The jailbreak "worked" but Fable didn't produce meaningful uplift beyond other models. USG saw it as "your guardrails don't work" and that Ant was dismissive of their concerns.

TheTinman @PaxMachinae

2 days ago

@elder_plinius Critical infrastructure would have hopefully had 1̵8̵0̵ 177 days to have been hardened by Fable/Mythos 5

TheTinman @PaxMachinae

3 days ago

@iamgroguu @scaling01 10T MoE, fewer active parameters. Don't fully agree, but it's the only reason you'd rank 4.5 over Fable size-wise.

TheTinman @PaxMachinae

3 days ago

@iamgroguu @scaling01 GPT-4.5 was speculated to be the largest dense model. 1-2T dense. $75/$150 per 1M input/output tokens

TheTinman @PaxMachinae

3 days ago

@Adidotdev GPT-5.6 is weak but they have /fast on by default and name it GPT-5.6-mini instead

TheTinman @PaxMachinae

3 days ago

@raulvk It is very likely that Anthropic was already aware of the jailbreaks. What they did not know is what the USG reaction to finding out would be.

TheTinman @PaxMachinae

3 days ago

@bayeslord @HarmonyHacker Goes up close to 80% compute dominance today if you consider nvfp4 training. The gap will only widen.

TheTinman @PaxMachinae

3 days ago

@abacaj There are two sets of problems for AI:
1. Problems with an objectively correct, verifiable solution.
2. Open-ended problems that require general understanding and "taste".
Harness development and more thinking will solve type 1. Bigger, smarter, better-trained models solve type 2.

TheTinman @PaxMachinae

5 days ago

The genie is not going back into the bottle.

Anthropic

@AnthropicAI

5 days ago

The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: https://t.co/bwn0sximKZ

TheTinman @PaxMachinae

5 days ago

@0xSero It's not. It's well documented that it's 1h for the Claude Code sub. https://t.co/NRJ8mIR9mU.

TheTinman @PaxMachinae

5 days ago

@daniel_mac8 Fable 5.1 dropping June 23? 👀

TheTinman @PaxMachinae

5 days ago

@scaling01 I love to call out benchmaxxing, but at some point models will benchmax on all/extensive benchmarks. At a fundamental level this is not ASI. Practically speaking, that's not important for most people. After that it's "just as good" as long as the benchmarks cover your use cases.

TheTinman @PaxMachinae

5 days ago

@petergostev It's difficult to balance 1 harness for 4 different model tiers released 8 months apart. Even the same model in a harness that is updated over months performs worse. Ideally Claude Code should be a wrapper that routes to models and task-specific harnesses.

TheTinman @PaxMachinae

5 days ago

@ClementDelangue @JakeKAllDay @ArtificialAnlys It's honest in that what they test is what you get. It's representative of real performance. You're perfectly allowed to do that, just show cost and inference time when you do. In fact if you could, I'm sure people would favor your model since it's cheaper and faster on average

TheTinman @PaxMachinae

6 days ago

@dog_foot_ruler_ @thoughtfullab That's only going forward. Since release, "frontier llm development" requests would make Fable give a "dumbed down" response. Which apparently is still much better than 4.8 or this wasn't considered frontier/sota

TheTinman @PaxMachinae

6 days ago

@rohanpaul_ai Not quite accurate. The issue is that they were silently messing with how Fable thinks. You'd still be talking to Fable but it'd give you bad results *without* telling you.
