Jan Chorowski

20 days ago

The full Transformer vs Post-Transformer debate is live. 80 minutes. Seven rounds. No slides. Real disagreement.

@lukaszkaiser came to defend the Transformer. @adrian_pathway, @YesThisIsLion, and @mlech26l made the case for what comes next.

00:00 Contenders enter the ring
06:30 Lukasz Kaiser defends the Transformer
10:08 Adrian Kosowski on BDH and the PageRank Moment for AI
17:35 Llion Jones: Why Transformers aren't the final architecture
29:50 Mathias Lechner on Liquid AI’s approach, Fast Weights, and Self-Replacing AI
40:28 Reasoning Beyond Language
44:15 Scaling Laws: Transformer vs Post Transformer
50:31 Benchmarks, Coding Models, and Perplexity
1:04:00 Continual Learning and Dynamic Weights

This is the ultimate source of truth on the subject.

JChorowski retweeted

about 1 month ago

May 5 in SF, @lukaszkaiser is joining us for the Transformers vs. Post-Transformers boxing match: The Deciding Round. 🥊 This is heavyweight, folks!

JChorowski retweeted

LLM Efficiency @NVIDIA - views have always been only my own 🥇🥈 @ Flunkyball Polish Championships

about 2 months ago

Inventors of Transformer and Post-Transformer architectures are stepping into a boxing ring in San Francisco! This is the battle between innovations that shape trillion-dollar markets, presented by their very authors (!). May 5. Thread 🥊

JChorowski retweeted

2 months ago

Today we are sharing a new result from BDH: 97.4% accuracy on Extreme Sudoku puzzles while maintaining language fluency.

No chain-of-thought.
Current LLMs → nearly 0% accuracy.

If a model can write beautifully but still cannot reason through a hard constraint space, that is not a side issue. That is the issue.

JChorowski retweeted

3 months ago

Cool to see Pathway named No. 10 in the Data Science category on @FastCompany’s Most Innovative Companies list. We believe the biggest limitation in AI today is memory. Models reset every time. We’re building AI that learns continuously and adapts over time. Welcome to the Post-Transformer world. Excited for what’s ahead.

3 months ago

Evolving memory is an unlock for long reasoning and agentic applications!

3 months ago

My conversation with @EyeOn_AI is now live. We talked about why the next leap in AI will come from giving models something current systems still largely lack: real, evolving memory.

JChorowski retweeted

NVIDIA AI Developer

@NVIDIAAIDev

5 months ago

Most “efficient attention” tricks collapse at high KV compression ratios. DMS shows you can get ~8× KV compression with ~1K training steps and still improve reasoning Pareto frontiers vs dense Qwen-R1 models.

The key: a learned, delayed token-eviction policy trained via logit distillation, not ad-hoc attention heuristics, so longer and wider chains are feasible at fixed KV budgets.

Download our latest Checkpoint:
🎓 Paper - https://t.co/D94xXtP2p0
💾 Checkpoint - https://t.co/91v3BqtwBW

6 months ago

@zuzanna_pathway @Steve_Rosenbush @WSJ @pathway_com @nvidia @AWS We are presenting it at re:Invent, see you there!

Tuesday, Dec 2, 4:00 PM - 4:20 PM PST
Mandalay Bay | Level 2 South | Oceanside C | Content Hub | Lightning Theater

JChorowski retweeted

Pathway (www.pathway.com) @pathway_com

6 months ago

“Memory is key to intelligence and efficient reasoning.” @Steve_Rosenbush at The @WSJ covered how @pathway_com is rethinking AI from the ground up and our newly announced integration with @Nvidia and @AWS - not just scaling models, but evolving intelligence itself. Dragon Hatchling (BDH) represents the beginning of the Post-Transformer Era, read the full piece: https://t.co/K32VkipuGk

JChorowski retweeted

8 months ago

As @Forbes's @iamVictorDey writes, Pathway's BDH “may have sparked the beginning of a new era in AI — one where machines don’t just imitate the brain, but begin to think like it.” https://t.co/UmgWRAOL1M

8 months ago

@miloszlodowski With another breakthrough :)

Pathway (www.pathway.com) @pathway_com

8 months ago

@Arrogance_0024 @piekno7 @Arrogance_0024 have you tried blushers? A bit like parasol mushrooms, https://t.co/X1m95PUDFE but it is important to learn to tell them apart from panther caps. The red ones, on the other hand, have been illegal in Poland since 22.05.25 - the dried mushroom is psychoactive.

JChorowski retweeted

DailyPapers

@HuggingPapers

8 months ago

A missing link between Transformers and the brain? 🧠

Dragon Hatchling (BDH) is a new LLM architecture based on a scale-free, biologically-inspired network of locally-interacting neuron particles. It rivals GPT2 performance, but is designed for interpretability. https://t.co/UqbuYfzvhj

JChorowski retweeted

8 months ago

We launched a new post-transformer architecture, Baby Dragon Hatchling (BDH), paving the way for autonomous AI. Our paper, The Missing Link Between the Transformer and Models of the Brain, tackles key AI challenges: generalization over time, real-time learning, and interpretability.
