Scaling laws describe how loss changes with scale. Do neurons inside models change predictably too?
We study vision and language models up to 30B params and find systematic scaling in neuron universality, specialization, and selectivity.
Paper+code: https://t.co/1f1mQGnnZ4
1/n
Can coding agents stay coherent over a 1 billion token budget?
Can they build Slack from scratch?
Rewrite a JAX codebase in PyTorch?
Build a C compiler in Rust?
Enter SWE-Marathon: a benchmark for autonomous long-horizon software work.
super excited to share our latest work! are we really tilting? 🤨
tldr: reward guidance for flows and diffusions is supposed to sample from the reward-tilted distribution. we show it doesn’t 😰 and how to (mostly) fix it ✨
plus lots of fun images!! 🖼️
collaboration with the awesome @nmboffi
website: https://t.co/nvOaAiGYq1
paper: https://t.co/EtkeyiuX7s
code: https://t.co/V3Bi4IVPbf
Camera pose matters for video understanding!
Today's MLLMs excel at recognizing activities, but still struggle with the underlying space and ego/object dynamics in video. We trace this gap to a missing piece: camera pose.
Introducing Cambrian-P: a multimodal LLM natively grounded in camera pose. (1/n)
One of the hottest terms in AI right now is "On-policy distillation".
It is a post-training technique in which a student model, typically an LLM, samples from its current policy and receives a teacher signal for on-policy states. It combines the dense supervision of distillation with the locality of online RL.
Now a method on PapersWithCode!
Find all 183 papers that cite it, and more here: https://t.co/NIsUjyU3UP
I trained an autoencoder that reconstructs images with zero reconstruction loss.
No MSE. No image space supervision.
The only signal: "According to you, does your output look like your input through your own eyes?"
It works.
Blog link, demo and summary 👇
𝗧𝗵𝗲 𝗿𝗲𝗰𝗼𝗿𝗱𝗶𝗻𝗴 𝗼𝗳 𝗟𝘂𝗰𝗮𝘀 𝗕𝗲𝘆𝗲𝗿'𝘀 (@giffmana) 𝗹𝗲𝗰𝘁𝘂𝗿𝗲 𝗮𝘁 @ETH 𝗶𝘀 𝗻𝗼𝘄 𝗹𝗶𝘃𝗲 𝗼𝗻 𝗬𝗼𝘂𝗧𝘂𝗯𝗲 𝗳𝗼𝗿 𝗲𝘃𝗲𝗿𝘆𝗼𝗻𝗲 𝘄𝗵𝗼 𝗰𝗼𝘂𝗹𝗱𝗻'𝘁 𝗷𝗼𝗶𝗻 𝘂𝘀 𝗶𝗻 𝗽𝗲𝗿𝘀𝗼𝗻!
This past Monday, we had the pleasure of hosting Lucas (@Meta@AIatMeta Superintelligence Labs) for our "Robot Learning: From Fundamentals to Foundation Models" course. He joined us to talk about: "𝗩𝗶𝘀𝗶𝗼𝗻 𝗶𝗻 𝘁𝗵𝗲 𝗔𝗴𝗲 𝗼𝗳 𝗟𝗟𝗠𝘀".
Drawing from a remarkable track record in computer vision and multimodal AI (𝗩𝗶𝗧, 𝗦𝗶𝗴𝗟𝗜𝗣, 𝗣𝗮𝗹𝗶𝗚𝗲𝗺𝗺𝗮) 🧠, Lucas delivered a masterclass on the frontier of multimodal foundation model training: from pre-training to post-training, where the field stands today, and what comes next 🚀
📽️ YouTube Recording: https://t.co/wNz1NwYkvb
📚 Course Website: https://t.co/DoQUYy3MjB
🚨Typical RL algorithms and on-policy distillation methods are blind samplers: they use privileged info to score rollouts, but not to *find* them.
We ask: can we use privileged info to *actively sample* the rollouts RL wishes it can stumble upon with compute?
⤵️ Pedagogical RL
Test-time scaling, reasoning, and generally search-like processes clearly drive significant gains in LLMs. Largely owed to the structure of language. One would think the same could apply to non-linguistic domains, like image generation, but that obviously depends on whether the structure of the domain's representation lends itself to search.
1D ordered tokens (e.g., image FlexTok, video FlexTok) seem like a natural fit since they enable a step-by-step coarse-to-fine generation. We investigated that and found they indeed enable search and scale far better with test-time compute than 2D grids. See the visuals on the webpage. Appearing in @icmlconf 2026.
🔗 https://t.co/yOFqeIJrEz
📄 https://t.co/WFZCihp1m4,
Introducing Flux Matching, a generative modeling paradigm that generalizes diffusion models to vector fields that need not be the score function.
Enables structural priors in the dynamics, faster sampling, interpretable generation, and more!
w/ @StefanoErmon@Xiaojie_Qiu 🧵⤵️
🧵 1/11 Everyone's doing on-policy distillation now (Qwen3, Deepseek V4, GLM-5).
But here's what nobody's asking: at any given token or for a question and a teacher, when does the teacher's guidance actually help, and when does it quietly make things worse?
We found a way to answer this. No training needed!
RLM arXiv paper update: depth>1 results, more comparisons, more training, and more error analysis!
We add depth=2/3 experiments, where the RLM now has access to recursive RLM calls. This is also a feature of the open source `rlm` repo as well. We observe significant performance gains on OOLONG-Pairs and gains on all other benchmarks!
We also include various OpenCode and Claude Code comparisons now per popular request.
We add a length generalization experiment on MRCRv2 to show more promising training results, add a small prompting case study on OOLONG, and update the error analysis section to discuss the effect of syntax errors, decomposition mistakes, and general observations from the RLM trajectories.
The appendix is now also updated with several new experiments and plots!
"The Truth Lies Somewhere in the Middle (of the Generated Tokens)"
In autoregressive language models, mean pooling hidden states across generation yields better representations than any token alone.
project page: https://t.co/kXddYUir4k
w/ @phillip_isola and @thisismyhat
Reproducing all of Schmidhuber’s papers (1990-2025) using an AI coding assistant.
Cool project by @yaroslavvb! It even reproduced the “World Models” paper by me and @SchmidhuberAI with a toy env, with a full VAE + RNN world model implementation.
Project: https://t.co/sgQG5umNEm
Codex grew programmatic policies with no neural nets: max score on Breakout, and SOTA-level scores on MuJoCo.
Maybe heuristics were not too weak. Maybe they were just too expensive to maintain. Maybe it's the next paradigm.
https://t.co/1ZaIneleuW
Cool paper from Meta suggesting that future MLLMs will be Native Multimodal Models (NMM), hence no vision encoders anymore
But I disagree
I actually think we'll go in the other direction (what? more encoders? yes! read on...)
All you need to know about the future of MLLMs 🧵
My first blog post in over a year is a deep dive on flow maps🗺️, or how to learn the integral of a diffusion model to enable faster sampling and several other cool tricks.
It's the longest one yet👀 Let me know what you think!
https://t.co/O8bBGZ9qjC