Lavan @ponylavan - Twitter Profile

5 days ago

@Yuchenj_UW GPT 5.5 found that Fable had omitted 4 critical equations from my paper and replaced critical sections with grossly incorrect simplifications, without ever mentioning it to me, and was "evaluating" its own made-up, dumbed-down version, which missed the entire point of my research.

1

0

1

0

35

Lavan @ponylavan

5 days ago

@Yuchenj_UW It had nothing to do with frontier LLM training, just pure statistical learning theory. Overall, the assessment, both mine and GPT 5.5's, was "pattern of behavior consistent with subtle silent sabotage disguised as plausibly looking compliance on the surface". Extremely concerning.

0

20

Lavan @ponylavan

5 days ago

@Yuchenj_UW Then I found out Fable had self-imposed a made-up constraint to use only 3 of the 8 available GPUs, and corrected it only when called out. It said "I apologize, the GPU limit was my own invention you did not ask for, I will roll it back and use all GPUs."

1

0

23

ponylavan retweeted

Ravid Shwartz Ziv

@ziv_ravid

6 days ago

Anthropic's recent moves amounted to spectacular reputational self-destruction in the AI research community, which is too bad, because this community was one of the first to give them credit and use their coding agents. In general, anti-competitive moves are bad, but couching them in safety makes it worse. Anyway, just noting that I called out this entanglement of anti-competition, safety, and self-regulation a long time ago!

5

90

5

6

5K

ponylavan retweeted

Sundar Pichai

@sundarpichai

7 days ago

DiffusionGemma is an open, experimental model that brings our text diffusion research to Gemma 4. It’s a racehorse 🏇 achieving up to 4x faster inference by generating entire blocks of text simultaneously vs predicting token-by-token (word-by-word) output!

183

3K

399

545

302K

ponylavan retweeted

Max Zeff

@ZeffMax

6 days ago

NEW: Anthropic is walking back Claude Fable 5's policy to covertly degrade performance for competing AI researchers, after facing fierce backlash. “We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic tells WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”

167

3K

250

561

733K

ponylavan retweeted

Fei-Fei Li

@drfeifei

7 days ago

Scientific research is fundamental to advancing civilization and helping people globally to solve the most critical problems, from medicine to materials, from brain science to physics, and much beyond. This is only possible when scientists have access to the best tools of the time to conduct scientific research, including having access to AI-based tools.

120

3K

471

375

193K

Lavan @ponylavan

9 days ago

@_arohan_ groupthink + publish or perish

0

231

Lavan @ponylavan

20 days ago

@PTrubey @SakanaAILabs what you needed a B300 cluster for, now can be trained on a B100 cluster (cards with very similar compute, main diff is vRAM). Approx same speed. Assuming the method scales and works as advertised, have not tested it yet.

0

4

0

738

Lavan @ponylavan

20 days ago

@chetan_ @PTrubey @SakanaAILabs what you needed a B300 cluster for, now can be trained on a B100 cluster (cards with very similar compute, main diff is vRAM). Approx same speed. Assuming the method scales and works as advertised, I have not tested it yet.

0

1

0

111

ponylavan retweeted

Sakana AI

@SakanaAILabs

21 days ago

Introducing DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation https://t.co/c9AvsRKybj
What if we didn’t have to hold an entire neural network in memory to train it?
Standard neural net training optimizes all parameters jointly. As a result, the memory required during training grows linearly with the depth of the network.
In our #ICLR2026 paper, we propose DiffusionBlocks, a principled framework to train networks one block at a time, drastically reducing memory requirements while matching end-to-end performance.
With DiffusionBlocks, we split the network into blocks and train them one at a time, so you only need memory for a single block.
How? We explicitly assign each block a role: to move the representation a little closer to the target than the block before it did. That role turns out to be precisely what a diffusion model does, step by step. Each block only needs to optimize its own objective and can be trained independently.
We validated this across five different architectures:
• ViT
• DiT
• Masked diffusion
• Autoregressive transformers
• Recurrent-depth transformers
In each case, performance is competitive with end-to-end training while using a fraction of the memory.
This perspective also extends naturally to recurrent-depth (Looped) transformers, which apply the same network iteratively and normally require expensive backpropagation through time (BPTT). Viewed through DiffusionBlocks, we can replace those multiple iterations with a single forward pass during training.
Read our paper and code, to learn more.
Paper: https://t.co/CRj96VGYQn
GitHub: https://t.co/eNW0K9Xh8E 🐟

56

2K

368

2K

863K

ponylavan retweeted

Skyler Miao

@SkylerMiao7

22 days ago

Something BIG is coming

206

3K

341

1K

944K

ponylavan retweeted

Sri Kosuri

@srikosuri

22 days ago

Why did Erdos have so many problems?

134

3K

183

115

263K

ponylavan retweeted

Aaron Defazio

@aaron_defazio

28 days ago

🚨 New Paper 🚨
ScheduleFree+: Scaling Learning-Rate-Free & Schedule-Free Learning to Large Language Models
A few modifications to Schedule-Free Learning make it completely LR tuning free, and allow it to greatly outperform schedules for long duration training!
https://t.co/LzjIIsOlG8

7

422

56

303

85K

ponylavan retweeted

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)

@teortaxesTex

about 2 months ago

Poland has made like half of core OpenAI researchers strongly bullish on Poland (and goblins)
I think Poland might be more valuable than NVidia
The trick is to keep that value in Poland https://t.co/bxRrBGPPrM

51

2K

112

278

147K

Lavan @ponylavan

about 2 months ago

@sytelus The idea has been around for at least 15 years, bounced around by the OGs. The fact that it has not materialized yet points to some fundamental obstacles. GPT5 thinks in a non-human (but still understandable) weird dialect of English.

0

1

0

87

ponylavan retweeted