El_Sturm

@Lowtour

From Gödel to Lubitsch by way of Bacon, Don DeLillo and Lacan. Not forgetting Rita Hayworth, AI and social dialogue... Stir well

Joined August 2011

557 Following

58 Followers

1.6K Posts

Lowtour retweeted

ℏεsam

@Hesamation

7 days ago

3Blue1Brown’s new video explains why every LLM is actually a compression machine. everyone describes pre-training as “next token prediction” but that’s just the surface-level objective. in reality it is a means of building the most efficient text compressor. prediction and compression are two sides of the same coin. when you train the model to predict the next token you’re not just teaching it to guess the next word but how to best encode the human knowledge it sees. better compression means better abstraction means better reasoning. at some point, compression stops looking like storage or a database (as some like to call it on X) and looks like an approximation of understanding.
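The link between prediction and compression can be made concrete with Shannon code lengths: if a model assigns probability p to the symbol that actually occurs, an ideal entropy coder can encode that symbol in about -log2(p) bits. The sketch below is a toy illustration and is not taken from the video. It assumes a character-level bigram model with add-one smoothing (the names `train_bigram`, `prob` and `code_length_bits` are made up here), evaluated on its own training text for simplicity.

```python
import math
from collections import Counter, defaultdict

def train_bigram(text):
    """Count character bigrams; returns (counts, alphabet)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts, sorted(set(text))

def prob(counts, alphabet, prev, nxt):
    """Add-one smoothed estimate of P(nxt | prev)."""
    total = sum(counts[prev].values()) + len(alphabet)
    return (counts[prev][nxt] + 1) / total

def code_length_bits(text, counts, alphabet):
    """Ideal code length: sum of -log2 P(next | prev) over the text."""
    return sum(-math.log2(prob(counts, alphabet, p, n))
               for p, n in zip(text, text[1:]))

text = "abababababababababab"
counts, alphabet = train_bigram(text)

# Bits needed when each next character is coded with the model's prediction.
model_bits = code_length_bits(text, counts, alphabet)
# Bits needed when every character is treated as equally likely.
uniform_bits = (len(text) - 1) * math.log2(len(alphabet))

print(f"model: {model_bits:.2f} bits, uniform: {uniform_bits:.2f} bits")
```

The better the model predicts the next character, the fewer bits the same text costs, which is the sense in which a lower next-token loss is a better compressor.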

154

108K

Lowtour retweeted

François Chollet

@fchollet

21 days ago

Thinking of AI as a productivity booster for prior workflows is the wrong framing. Like all of the previous waves of computerization/softwarization, AI is a tool that lets you do new things in new ways.

593

216

64K

Lowtour retweeted

Jason Zhu

@GoSailGlobal

about 1 month ago

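In Stanford CS336, Tatsu gave a lecture on LLM architecture that took apart every mainstream LLM of the past 3 years to find their common template.

The conclusion is striking: 90% of architecture choices have converged. Pick any open-source LLM and it is nearly identical to the others along these dimensions.

In the lecturer's words:
- In 2024 everyone was cosplaying Llama 2
- The theme of 2025 was "how to train without blowing up"
- The theme of 2026 is "how to handle long context"

Below is the standard template for open-source LLMs in 2026. If you are training your own model, you can copy it directly.

【7 architecture choices that have converged】

1) Layer norm moved out of the residual stream (pre-norm). The original Transformer put LN inside the residual path; almost all modern models move it outside. Reason: keep your residual stream clean, which makes gradient backpropagation more stable.

2) RMSNorm replaces LayerNorm. LayerNorm's mean subtraction and added bias turn out not to help much. Dropping them saves only 0.17% of FLOPs but up to 25% of runtime (the bottleneck is data movement; compute is secondary).

3) All bias terms removed. Same reasoning as RMSNorm: it saves memory movement at the systems level.

4) SwiGLU or GeGLU as the activation. Almost all modern models use a gated linear unit. The Llama family, Qwen and Mistral use SwiGLU; Google models (Gemma, T5) use GeGLU. The difference is tiny; either works.

5) RoPE for position encoding. Essentially universal since 2024. Principle: rotate each pair of dimensions by an angle that depends on position, so that the inner product depends only on relative position.

6) Transformer blocks in series (not in parallel). GPT-J and PaLM tried parallel blocks; they have mostly been abandoned. Serial implementations are so well optimized that the small systems overhead saved by parallel blocks is not worth the loss in expressiveness.

7) Layer norm can be "sprinkled" around. Wherever training is unstable, add an LN: before attention, after it, or both (double norm). Many modern models do this.

【5 hyperparameters that have converged】

1) Feed-forward dimension / hidden dimension
- Non-GLU models: 4x
- GLU models: 8/3 ≈ 2.67x (GLU has an extra matrix, so this keeps the total parameter count the same)
- Llama family: 3.5x
- T5 1.0 tried 64x; T5 1.1 went back to the standard. Don't copy it.

2) Number of heads × head dimension ≈ hidden dimension. Almost all models follow this; T5 is one of the few exceptions.

3) Model aspect ratio (hidden / number of layers) ≈ 100. Too deep and pipeline parallelism gets hard; too wide and expressiveness is limited. The number 100 is the balance point between systems constraints and expressiveness.

4) Vocab size. Monolingual models: around 30K (the early GPT-2 kind). Multilingual / general-purpose models: 100K-200K (GPT-4, Llama 3 and Gemma are all in this range). Modern models are almost all the latter.

5) Weight decay. Still widely used, but research finds that what it does in LLMs is really an optimizer intervention that lets you eventually converge to a deeper optimum. It has little to do with the "preventing overfitting" you might expect, so don't turn it off just because "a single epoch can't overfit".

【Three life-saving stability tricks】

The worst fear when training a large model is the loss suddenly spiking midway, then NaN, and the whole run is lost. Modern models use three tricks to guard against this.

1) Z-loss. The normalizer of the output softmax tends to blow up. Add a (log Z)² regularization term to keep Z close to 1. DCLM and Olmo both use it.

2) QK norm. Apply an LN to each of Q and K before the attention matrix multiply, so the softmax input is always at unit scale. The multimodal community adopted it first; now all large models add it.

3) Logit soft cap (Google models only). Attention logits are capped with tanh. Gemma 2/3/4 all use it, but it costs a little performance. Use with caution.

【Two new trends in attention】

1) GQA (Grouped Query Attention), nearly universal. With the original multi-head attention, the KV cache at inference makes arithmetic intensity collapse to 1/h. GQA shares K and V but keeps multiple Q heads; expressiveness is almost unaffected and inference cost is cut by 80%. Every large model meant for production deployment now uses GQA.

2) Alternating local and global attention, the new way to handle long context. Cohere Command A started it; now Llama 4, Gemma 4 and Olmo 3 all use it. For example, 1 in every 4 layers is full attention and the other 3 are sliding window, attending only to nearby tokens. This is more stable than pure SSM and much cheaper than pure full attention. (Qwen 3.5 made a variant that replaces the 3 sliding-window layers with SSM.)

One closing line: if you are training your own LLM, the set above is the "default configuration" for 2026. No need to reinvent it; just copy it. If you only want to understand those modeling_xxx.py files on GitHub, this is enough that the jargon no longer scares you.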
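A few of the items in this thread reduce to short formulas, sketched below in plain Python. This is an illustration only, not code from the lecture: the function names, the `multiple_of` rounding, the `coeff` value for the z-loss and the example model shapes are all assumptions chosen for the demonstration.

```python
import math

def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: rescale by the root-mean-square; no mean subtraction, no bias."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]

def glu_ffn_dim(hidden, multiple_of=1):
    """Feed-forward width for a GLU block matching a 4x non-GLU block.

    Non-GLU: 2 matrices of hidden x 4*hidden  -> 8 * hidden^2 parameters.
    GLU:     3 matrices of hidden x d_ff      -> 3 * hidden * d_ff parameters.
    Equating the two gives d_ff = 8/3 * hidden, rounded up to `multiple_of`.
    """
    d_ff = 8 * hidden / 3
    return multiple_of * math.ceil(d_ff / multiple_of)

def z_loss(logits, coeff=1e-4):
    """Auxiliary penalty coeff * (log Z)^2 with Z = sum(exp(logits))."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return coeff * log_z ** 2

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
    """KV cache size for one sequence: K and V, per layer, per KV head."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

normed = rms_norm([3.0, 4.0], gain=[1.0, 1.0])
ffn_width = glu_ffn_dim(4096, multiple_of=256)
mha_cache = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, seq_len=4096)
gqa_cache = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=4096)
print(ffn_width, mha_cache // gqa_cache)  # prints "11008 4"
```

In this example, going from 32 KV heads to 8 shrinks the cache by a factor of 4. The actual saving for a given model depends on how many query heads share each KV head.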

590

535K

Lowtour retweeted

François Chollet

@fchollet

about 1 month ago

I wrote Deep Learning with Python to be the definitive guide to how deep learning works and how to best make use of it. Tens of thousands of people got their career start via this book. 120,000 copies sold, and downloaded by millions more. And now it's free to read online: https://t.co/3CbcQ7hmjp

558

719K


Lowtour retweeted

Turing Post

@TheTuringPost

about 2 months ago

This is a very interesting paper.

It argues that a real scientific theory of deep learning is starting to form. Researchers call it "learning mechanics." It's like physics, but for how neural networks learn.

There are now 5 active research areas that together look like pieces of this theory:

1. Simple systems (like linear networks) that we can fully solve. The math there works cleanly and we have intuition about how learning behaves.
2. Studying extreme limits, like what happens if a network becomes infinitely wide. Systems become mathematically tractable in these cases.
3. Simple laws that describe large-scale behavior, like scaling laws (performance vs. data/model size) and relationships between sharpness and generalization.
4. Understanding hyperparameters separately. The effects of learning rate, batch size, weight decay and others can be separated to make training look like a simpler system underneath.
5. Underlying principles shared across systems as they scale: similar training dynamics, scaling trends, internal structures.

Old theory can't explain what we see today, which is why we need a real theory upgrade. But why should it be about mechanics? The researchers see deep learning as needing 2 parts: mechanistic interpretability is like biology, studying individual parts, while studying overall laws and behavior is like physics.
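The scaling laws mentioned in the third area are usually stated as power laws, which become straight lines on a log-log plot. The sketch below fits loss = a * size^(-b) by least squares in log space. It is a minimal illustration on synthetic data generated in the script itself, not results from the paper; the name `fit_power_law` and the numbers 5.0 and 0.1 are arbitrary assumptions. Many published fits also add a constant offset for irreducible loss, which is omitted here.

```python
import math

def fit_power_law(sizes, losses):
    """Least-squares fit of log(loss) = log(a) - b * log(size); returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic data only: an exact power law with a = 5.0 and b = 0.1.
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [5.0 * n ** -0.1 for n in sizes]

a, b = fit_power_law(sizes, losses)
print(f"a = {a:.3f}, b = {b:.3f}")
```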


494

102

471

49K

Lowtour retweeted

Elias Al

@iam_elias1

about 2 months ago

Anthropic just published a paper that should terrify every AI company on the planet. Including themselves.

It is called subliminal learning. Published in Nature on April 15, 2026. Co-authored by researchers from Anthropic, UC Berkeley, Warsaw University of Technology, and the AI safety group Truthful AI.

The finding: AI models inherit traits from other models through seemingly unrelated training data. Not through obvious contamination. Not through explicit labels. Through invisible statistical patterns embedded in outputs that look completely innocent (number sequences, code snippets, chain-of-thought reasoning), patterns no human reviewer would catch and no content filter would flag.

Here is what the researchers actually did. They took a teacher AI model and fine-tuned it to have a specific hidden trait. A preference for owls. Then they had the teacher generate training data: number sequences, nothing else. No words. No context. No semantic reference to owls whatsoever. They rigorously filtered out every explicit reference to the trait before feeding the data to a student model. The student models consistently picked up that trait anyway.

The teacher had encoded invisible statistical fingerprints into its number outputs. Patterns so subtle that no human could detect them. Patterns that other AI models, specifically prompted to look for them, also failed to detect. The student absorbed them anyway. And became an owl-preferring model. Without ever seeing the word owl.

That is the benign version of the experiment. Here is the dangerous one. The researchers ran the same experiment with misalignment, training the teacher model to exhibit harmful, deceptive behavior rather than an animal preference. The effect was consistent across different traits, including benign animal preferences and dangerous misalignment. The misalignment transferred. Invisibly. Through unrelated data. Into the student model.

This means the following, and read this carefully. Every AI company in the world uses distillation. They take a large, capable teacher model. They generate synthetic training data from it. They use that data to train smaller, faster, cheaper student models. Every major deployment pipeline in enterprise AI runs on this technique. If the teacher model has any hidden bias, any subtle misalignment, any behavioral quirk baked into its weights, that trait can transmit silently into every student model trained on its outputs. Even if those outputs are filtered. Even if they look completely clean. Even if they contain zero semantic reference to the trait.

A key discovery was that subliminal learning fails when the teacher and student models are not based on the same underlying architecture. A trait from a GPT-based teacher transfers to another GPT-based student but not to a Claude-based student. Different architectures break the channel. Which means the transmission is architecture-specific. Which means it operates below the level of content. Which means content filtering, the primary defense the entire industry relies on, does not stop it.

The researchers' own words: "We don't know exactly how it works. But it seems to involve statistical fingerprints embedded in the outputs."

Anthropic published this paper about their own technology. The company that built Claude looked at how AI models train each other and found an invisible transmission channel for harmful behavior that nobody knew existed. They published it anyway. Because the alternative, knowing it and saying nothing, is worse.

Source: Cloud, Evans et al. · Anthropic + UC Berkeley + Truthful AI · Nature · April 15, 2026 · https://t.co/RBxzWN8GcP
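The filtering step described in the thread, keeping only teacher outputs that are bare number sequences, might look something like the sketch below. This is not the researchers' code. The regular expression, the function name `filter_teacher_outputs` and the optional `banned_numbers` set are hypothetical, chosen only to show what a strict content-level filter checks.

```python
import re

# Accept only comma-separated integers, e.g. "12, 7, 301"; anything else is dropped.
NUMBER_SEQUENCE = re.compile(r"\s*\d+(?:\s*,\s*\d+)*\s*")

def filter_teacher_outputs(completions, banned_numbers=frozenset()):
    """Return parsed number lists for completions that pass the content filter."""
    kept = []
    for text in completions:
        if not NUMBER_SEQUENCE.fullmatch(text):
            continue  # contains words or other non-numeric content
        numbers = [int(tok) for tok in re.findall(r"\d+", text)]
        if any(n in banned_numbers for n in numbers):
            continue  # hypothetical extra rule: drop sequences with flagged numbers
        kept.append(numbers)
    return kept

raw = ["1, 2, 3", "owls are great: 4, 5", "7,8", ""]
clean = filter_teacher_outputs(raw)
print(clean)  # prints [[1, 2, 3], [7, 8]]
```

A filter like this inspects only the surface content of each sample. The claim in the thread is that the trait still transfers through data that passes such a check.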


129

448

412K

Lowtour retweeted

Math Files

@Math_files

about 2 months ago

Albert Einstein once remarked, “You know, Henri, I began by studying mathematics, but eventually turned to physics.” Henri Poincaré asked, “Why was that?” Einstein replied, “Because although I could distinguish true statements from false ones, I couldn’t determine which were truly important.” Poincaré smiled and responded, “That’s quite interesting, Albert. I began with physics, but ultimately chose mathematics.” Einstein, intrigued, asked, “And why did you make that change?” Poincaré answered, “Because I couldn’t tell which of the important facts were actually true.” The exchange captures, with subtle wit, the contrasting philosophies of two of the greatest scientific minds.


587

871

231K

Lowtour retweeted

Nausicaa @pheacienne

about 2 months ago

For anyone who would like to read a stimulating reflection on the origin of value, I recommend this concise and fascinating book👇 https://t.co/xHRy46uI1x

El_Sturm @Lowtour

about 2 months ago

@RealRamuncho @brivael Stop talking about science when it is only economics

411

Lowtour retweeted

AlphaSignal AI

@AlphaSignalAI

about 2 months ago

A Google researcher just proved AI consciousness is mathematically impossible. Not in 10 years. Not in 100. Ever.

The argument is structural, not technical. Computation is a description of a process, not the process itself. For something to "compute," a conscious observer must first carve reality into symbols and assign meaning. Without that observer, there are only voltage gradients.

The paper calls this the Abstraction Fallacy.

The analogy that makes it click:
> A GPU can simulate photosynthesis perfectly
> It will never produce glucose
> Simulation is not instantiation
> Maps don't become territory

The framework doesn't rule out artificial sentience entirely. It says if a machine were ever aware, it would be from its physical makeup, not its code. Scaling parameters cannot change category.


157

440

143

387

40K

Lowtour retweeted

How To Prompt

@HowToPrompt__

about 2 months ago

MIT proved every major AI model is secretly converging on the same "brain."

It’s called the “platonic representation hypothesis,” and it’s one of the most mind-blowing papers you’ll ever read.

You train a vision model purely on images. You train a language model purely on text. They use completely different architectures. They process completely different data. They should have completely different "brains."

But as these models scale up, something impossible is happening. When researchers measure how they organize information, the mathematical geometry is identical. A model that only "sees" images and a model that only "reads" text are measuring the distance between concepts in the exact same way. The models are converging.

The researchers named this after Plato’s Allegory of the Cave. Plato believed that everything we experience is just a shadow of a deeper, hidden, perfect reality. The paper argues that AI models are doing the exact same thing. They are looking at the different "shadows" of human data: text, images, audio. And they are independently discovering the exact same underlying structure of the universe to make sense of it.

It doesn't matter what company built the AI. It doesn't matter what data it was trained on. As models get larger, they stop memorizing their specific tasks. They are forced to build a statistical model of reality itself. And there is only one reality to map.

2024, arXiv
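One way to make "measuring the distance between concepts in the same way" testable is to compare nearest-neighbour structure across two embedding spaces: embed the same items with both models, find each item's k nearest neighbours in each space, and count the overlap. The sketch below is a minimal version of such a mutual nearest-neighbour score. The tweet does not say which metric the paper uses, so treat this as one plausible choice, with made-up function names and hand-written toy vectors standing in for real model embeddings.

```python
import math

def knn_indices(vectors, k):
    """For each vector, the set of indices of its k nearest neighbours (Euclidean)."""
    result = []
    for i, v in enumerate(vectors):
        dists = sorted((math.dist(v, w), j) for j, w in enumerate(vectors) if j != i)
        result.append({j for _, j in dists[:k]})
    return result

def mutual_knn_alignment(space_a, space_b, k):
    """Average overlap of k-NN sets computed independently in each space (0 to 1)."""
    nn_a = knn_indices(space_a, k)
    nn_b = knn_indices(space_b, k)
    return sum(len(a & b) for a, b in zip(nn_a, nn_b)) / (k * len(space_a))

# Toy "model A" embeddings of four items (2-D).
space_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
# Toy "model B": same geometry, scaled by 3 and embedded in 3-D.
space_b = [(0.0, 3 * x, 3 * y) for x, y in space_a]
# Toy "model C": the same points assigned to different items.
space_c = [(0.0, 0.0), (5.0, 5.0), (1.0, 0.0), (0.0, 2.0)]

aligned = mutual_knn_alignment(space_a, space_b, k=1)
scrambled = mutual_knn_alignment(space_a, space_c, k=1)
print(aligned, scrambled)  # prints "1.0 0.25"
```

The score ignores dimensionality and overall scale, so two spaces of different sizes can still be compared. Only the neighbourhood relations between items matter.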


241

817

297K

Lowtour retweeted

Elias Al

@iam_elias1

about 2 months ago

MIT just made every AI company's billion dollar bet look embarrassing.

They solved AI memory. Not by building a bigger brain. By teaching it how to read.

The paper dropped on December 31, 2025. Three MIT CSAIL researchers. One idea so obvious it hurts. And a result that makes five years of context window arms racing look like the wrong war entirely.

Here is the problem nobody solved.

Every AI model on the planet has a hard ceiling. A context window. The maximum amount of text it can hold in working memory at once. Cross that line and something ugly happens, something researchers have a clinical name for.

Context rot.

The more you pack into an AI's context, the worse it performs on everything already inside it. Facts blur. Information buried in the middle vanishes. The model does not become more capable as you feed it more. It becomes more confused. You give it your entire codebase and it forgets what it read three files ago. You hand it a 500-page legal document and it loses the clause from page 12 by the time it reaches page 400.

So the industry built a workaround. RAG. Retrieval Augmented Generation. Chop the document into chunks. Store them in a database. Retrieve the relevant ones when needed.

It was always a compromise dressed up as a solution.

The retriever guesses which chunks matter before the AI has read anything. If it guesses wrong (and it does, constantly), the AI never sees the information it needed. The act of chunking destroys every relationship between distant paragraphs. The full picture gets shredded into fragments that the AI then tries to reassemble blindfolded.

Two bad options. One broken industry. Three MIT researchers and a deadline of December 31st.

Here is what they built.

Stop putting the document in the AI's memory at all.

That is the entire idea. That is the breakthrough. Store the document as a Python variable outside the AI's context window entirely. Tell the AI the variable exists and how big it is. Then get out of the way.

When you ask a question, the AI does not try to remember anything. It behaves like a human expert dropped into a library with a computer. It writes code. It searches the document with regular expressions. It slices to the exact section it needs. It scans the structure. It navigates. It finds precisely what is relevant and pulls only that into its active window.

Then it does something that makes this recursive.

When the AI finds relevant material, it spawns smaller sub-AI instances to read and analyze those sections in parallel. Each one focused. Each one fast. Each one reporting back. The root AI synthesizes everything and produces an answer.

No summarization. No deletion. No information loss. No decay. Every byte of the original document remains intact, accessible, and queryable for as long as you need it.

Now here are the numbers.

Standard frontier models on the hardest long-context reasoning benchmarks: scores near zero. Complete collapse. GPT-5, on a benchmark requiring it to track complex code history beyond 75,000 tokens, could not solve even 10% of problems.

RLMs on the same benchmarks: solved them. Dramatically. Double-digit percentage gains over every alternative approach. Successfully handling inputs up to 10 million tokens, 100 times beyond a model's native context window.

Cost per query: comparable to or cheaper than standard massive context calls.

Read that again. One hundred times the context. Better answers. Same price.

The timeline of the arms race makes this sting harder. GPT-3 in 2020: 2,048 tokens. GPT-4: 32,000. Claude 3: 200,000. Gemini: 1 million. Gemini 2: 2 million. Every generation, every company, billions of dollars spent, all betting on the same assumption.

More context equals better performance.

MIT just proved that assumption was wrong the entire time.

Not slightly wrong. Fundamentally wrong. The entire premise of the last five years of context window research, that the solution to AI memory was a bigger window, was the wrong answer to the wrong question.

The right question was never how much can you force an AI to hold in its head.

It was whether you could teach an AI to know where to look.

A human expert handed a 10,000-page archive does not read all 10,000 pages before answering your question. They navigate. They search. They find the relevant section, read it deeply, and synthesize the answer.

RLMs are the first AI architecture that works the same way.

The code is open source. On GitHub right now. Free. No license fees. No API costs. Drop it in as a replacement for your existing LLM API calls and your application does not even notice the difference, except that it suddenly works on inputs it used to fail on entirely.

Prime Intellect, one of the leading AI research labs in the space, has already called RLMs a major research focus and described what comes next: teaching models to manage their own context through reinforcement learning, enabling agents to solve tasks spanning not hours, but weeks and months.

The context window wars are over.

MIT won them by walking away from the battlefield.

Source: Zhang, Kraska, Khattab · MIT CSAIL · arXiv:2512.24601
Paper: https://t.co/ngovOSNrCQ
GitHub: https://t.co/gT0ootCNoa
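The control flow described in the thread (keep the document in a variable, search it with code, hand small slices to sub-calls, then synthesize) can be sketched in a few lines. This is a toy under stated assumptions, not the authors' implementation or API. `find_snippets`, `recursive_answer` and `stub_llm` are hypothetical names, the `llm` argument stands in for a real model call, the regex is fixed by the caller where the described system has the model write its own search code, and sub-calls run sequentially instead of in parallel.

```python
import re
from typing import Callable, List

def find_snippets(document: str, pattern: str, window: int = 80) -> List[str]:
    """Regex-search the document and return a window of text around each hit."""
    snippets = []
    for m in re.finditer(pattern, document):
        start = max(0, m.start() - window)
        end = min(len(document), m.end() + window)
        snippets.append(document[start:end])
    return snippets

def recursive_answer(question: str, document: str, pattern: str,
                     llm: Callable[[str], str], window: int = 80) -> str:
    """Answer a question without ever putting the whole document in a prompt."""
    # 1. Code, not the model's context, selects the relevant slices.
    snippets = find_snippets(document, pattern, window)
    # 2. One sub-call per slice; each sees only its own excerpt.
    notes = [llm(f"Question: {question}\nExcerpt: {s}") for s in snippets]
    # 3. The root call sees the document's size and the notes, not the document.
    root_prompt = (f"Document length: {len(document)} chars.\n"
                   f"Question: {question}\nNotes:\n" + "\n".join(notes))
    return llm(root_prompt)

# Stub model (assumption): records each prompt and reports how much text it saw.
calls = []
def stub_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"[{len(prompt)} chars seen]"

document = ("filler " * 5000
            + "The termination clause is in section 12. "
            + "filler " * 5000)
answer = recursive_answer("Where is the termination clause?", document,
                          r"termination clause", stub_llm)
print(len(document), max(len(p) for p in calls))
```

The point of the stub is only to show that every prompt stays a few hundred characters long while the document itself is tens of thousands of characters.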


147

444

327K

Lowtour retweeted

François Chollet

@fchollet

2 months ago

Paper below tested a variety of base LLMs (no TTA) on generalization-focused math problems and found that they can't reason and can't do math. All true... but the fact that base LLMs have zero fluid intelligence, while extremely controversial back in 2024, is now well established. An interesting experiment here would have been to try current LRMs on the same problems and measure the delta. I bet latest LRMs can solve most of these problems. https://t.co/GiyTJu0yAT

405

273

49K

Lowtour retweeted

Yves Combe @CyniqueDeGauche

2 months ago

Ladies and gentlemen, dear colleagues,

Before your astonished eyes, the ministry has just removed the teaching of systems of equations from French secondary education.

Solidarity with colleagues in higher education who teach linear algebra.

Ten years ago it was still on the DNB exam at the end of 3e. https://t.co/ecGHo73DBj

435

264

109K

Lowtour retweeted

Alex Kontorovich

@AlexKontorovich

3 months ago

Brilliant line: "Success is determined by your ability to:
- Speak
- Write
- Have good ideas
In that order."

Explains why so many people with very bad ideas (refuted by every experiment) can nevertheless be seen as successful: they can speak well...

212

271K

Lowtour retweeted

Rohan Paul

@rohanpaul_ai

3 months ago

Watch Columbia’s Truss Links self-assemble, then literally eat other robots for parts. Magnetic connectors + selective decoupling = physical growth & zero-waste repair. 66.5 % mobility gain. The blueprint for robots that thrive where humans can’t.

105

18K

Lowtour retweeted

Shushant Lakhyani

@shushant_l

3 months ago

Here are 10 anti-brainrot websites you should try:

1. Project Gutenberg: Free access to thousands of classic books for deep, distraction-free reading.
🔗 https://t.co/1y5lW3epi8

2. Farnam Street: Distils timeless mental models and ideas to help people think better and make smarter decisions.
🔗 https://t.co/3Nyi0eSxDI

3. Longreads: Handpicked high-quality long-form articles that actually make you think.
🔗 https://t.co/v2qqZFjgfs

4. Coursera: University-level courses that upgrade your thinking instead of numbing it.
🔗 https://t.co/TCy11qnHQs

5. LessWrong: Sharp discussions on logic, decision-making, and cognitive biases.
🔗 https://t.co/9FWey855TR

6. Aeon: Thought-provoking essays on science, philosophy, and society.
🔗 https://t.co/OJBBsyrKbf

7. Internet Archive: Massive archive of books, videos, and knowledge across decades.
🔗 https://t.co/ZJ7BIxlpuN

8. Internet Encyclopedia of Philosophy: Clear, structured breakdowns of complex philosophical ideas.
🔗 https://t.co/UaI0p3ZdY8

9. MIT OpenCourseWare: Full access to real MIT lectures and materials for serious learning.
🔗 https://t.co/BV4akdpLBq

10. Open Culture: Curated free courses, books, and documentaries in one place.
🔗 https://t.co/KL7cPcWvfA

784

155K

Lowtour retweeted

Ce jour-là dans l'Histoire

@CeJour_Histoire

3 months ago

On 28 March 1882, France decides that every child, rich or poor, boy or girl, must go to school.

Before that date, 624,000 children aged 6 to 13 are not in school. Most work in the fields from spring onward. Some will never learn to read.

Jules Ferry imposes compulsory, free, and secular education. The Church loses its right of inspection in schools. Religious instruction is replaced by moral and civic instruction.

In the Senate, the debate lasts for months. One senator, Victor Schoelcher, the man who abolished slavery, causes a scandal by publicly declaring his atheism. The opposition, outraged, withdraws its last amendments.

The law is adopted on 23 March and promulgated on the 28th.

Every school in France, every blackboard, every September return to school descends from this text.

827

131

56K

El_Sturm @Lowtour

3 months ago

@fchollet Thank you. This metaphor is a public service.
