We've updated the preprint of our Naturalistic Computational Cognitive Science paper — we've clarified and streamlined the arguments, and expanded examples where we see increasing naturalism already yielding new theoretical insights, from RL to perceptual neuroscience. 1/4
The next clue in AI reasoning:
answers may be attractors.
A new paper from Benhao Huang, Zhengyang Geng, and Zico Kolter introduces Equilibrium Reasoners (EqR) — a sharp mechanistic view of test-time scaling in latent reasoning models.
The core idea is simple, but deep:
Reasoning is not only generation.
Reasoning can be convergence.
EqR repeatedly updates a latent state. The authors hypothesize that generalizable reasoning emerges when training shapes the model’s latent dynamics so that stable attractors correspond to valid solutions.
In other words, the answer is not merely “produced.”
It is reached.
This matters because test-time compute only helps when the model’s internal dynamics know how to use it. More iterations can improve reasoning — or make it worse — depending on whether the trajectory moves toward a solution-aligned attractor or falls into a spurious one.
EqR scales along two axes:
Depth: run more iterations so a trajectory can settle.
Breadth: run multiple stochastic trajectories from different initializations and select/aggregate the ones that converge best.
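A minimal sketch of the two scaling axes, under loud assumptions. The update f(z, x) = tanh(Wz + Ux), built from random matrices W and U, stands in for a trained network. The names `step`, `residual`, `solve_depth`, and `solve_breadth` are hypothetical illustrations, not the authors' code, and so is the rule "keep the trajectory with the smallest fixed-point residual". Depth iterates until the residual ‖f(z, x) - z‖ is small. Breadth launches several random initial states and uses convergence as the selection signal.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # latent dimension (hypothetical)

# Hypothetical stand-in for a trained reasoner. W is rescaled to spectral
# norm 0.5, so z -> tanh(W z + U x) is a contraction with one attractor.
# A learned EqR landscape carries no such guarantee.
W = rng.normal(size=(D, D))
W *= 0.5 / np.linalg.norm(W, 2)
U = rng.normal(size=(D, D))


def step(z, x):
    """One latent update z_{k+1} = f(z_k, x)."""
    return np.tanh(W @ z + U @ x)


def residual(z, x):
    """Fixed-point residual ||f(z, x) - z||; near zero means the state has settled."""
    return float(np.linalg.norm(step(z, x) - z))


def solve_depth(z0, x, max_iters=1024, tol=1e-8):
    """Depth axis: iterate until the residual drops below tol (or the budget runs out)."""
    z = z0
    for k in range(max_iters):
        if residual(z, x) < tol:
            return z, k
        z = step(z, x)
    return z, max_iters


def solve_breadth(x, n_traj=8, iters=16):
    """Breadth axis: run several random initializations with a fixed budget,
    then keep the trajectory that converged best (hypothetical selection rule)."""
    finals = [solve_depth(rng.normal(size=D), x, max_iters=iters)[0] for _ in range(n_traj)]
    return min(finals, key=lambda z: residual(z, x))


x = rng.normal(size=D)  # hypothetical encoded puzzle
z_star, n_steps = solve_depth(np.zeros(D), x)
print(n_steps, residual(z_star, x))
```

W is rescaled to make this toy map a contraction, so every trajectory here reaches the same unique attractor and breadth changes nothing. Breadth only pays off in a learned, non-contractive landscape that also contains spurious attractors.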
The first-page figure captures the punchline beautifully: training is capped at 16 iterations, yet the learned dynamics extrapolate beyond 1,024 iterations at test time. As fixed-point residual falls, accuracy rises.
On Sudoku-Extreme, the paper reports a jump from 2.6% exact accuracy for feedforward models to over 99% with scalable latent reasoning — equivalent to unrolling up to ~40,000 layers. On Maze, EqR reaches 93.0%.
But the benchmark is not the most interesting part.
The most interesting part is the lens:
Correct answers must become stable.
They must be reachable.
And convergence itself can become a signal.
That gives the field a more precise language for test-time compute than “let the model think longer.”
Not longer text.
Not an external verifier.
Not task-specific search priors.
A learned attractor landscape.
This feels important because modern AI is moving from static inference toward adaptive computation. The question is no longer only “how much compute should we spend?”
It is:
What internal dynamics make extra compute useful?
Full credit to the authors:
Benhao Huang, Zhengyang Geng, Zico Kolter.
Paper:
Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning
https://t.co/HEfsBo8Np2
I’m attaching the first page because Figure 1 is worth studying closely.
The future of reasoning may not only be models that generate better answers.
It may be models whose internal states learn where correct answers live — and how to converge there.
#AIResearch #MachineLearning #Reasoning #TestTimeCompute #DynamicalSystems #ArtificialIntelligence
A man spends 50 years teaching at MIT.
He knows his time is running out.
So he records one last lecture — everything he knows, distilled into a single hour.
He died 5 months later.
This is that lecture.
The most important hour you'll watch this week. 👇
Bookmark it for later
🧠 We introduce "Generative Recursive Reasoning"!
Recursive Reasoning Models like HRM, TRM, and Looped Transformers are deterministic — same input, same reasoning, every time. They collapse the entire space of plausible reasoning paths into a single attractor.
Our model, GRAM (Generative Recursive reAsoning Models), turns recursion itself into a stochastic latent trajectory. This enables multiple hypotheses, alternative solution strategies, and inference-time scaling not just by depth but also by width, through parallel trajectory sampling.
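Here is a toy analogy for the deterministic vs. stochastic distinction. It is not GRAM's actual model. A one-dimensional latent is refined recursively by gradient steps on a hypothetical double-well energy E(z) = (z² - 1)². The two attractors, z = -1 and z = +1, play the role of two valid solutions. Running the deterministic recursion many times from the same start yields one answer. Sampling the start and adding small per-step noise lets parallel trajectories (width) reach both.

```python
import numpy as np


def grad_energy(z):
    """Gradient of the hypothetical energy E(z) = (z^2 - 1)^2 (minima at z = -1 and z = +1)."""
    return 4.0 * z * (z**2 - 1.0)


def recurse(z0, n_steps=200, lr=0.05, noise=0.0, rng=None):
    """Recursive refinement z <- z - lr * dE/dz, plus optional Gaussian noise per step."""
    z = np.asarray(z0, dtype=float)
    for _ in range(n_steps):
        z = z - lr * grad_energy(z)
        if noise > 0.0:
            z = z + noise * rng.normal(size=z.shape)
    return z


rng = np.random.default_rng(0)

# Deterministic recursion: 64 "parallel" runs from the same start collapse to one attractor.
det = recurse(np.full(64, 0.1))

# Stochastic recursion: sampled starts plus small step noise, run in parallel (width).
sto = recurse(rng.normal(scale=0.5, size=64), noise=0.01, rng=rng)

print(np.unique(det))               # one distinct final state
print(np.unique(np.round(sto, 1)))  # final states near both -1 and +1
```

The paper describes how GRAM parameterizes its stochastic transitions and aggregates sampled trajectories. That machinery is not reproduced here.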
And here's the kicker: the same formulation that gives us conditional reasoning p(y|x) also makes GRAM a general generative model p(x).
With only 10M params:
• Sudoku-Extreme: 97.0% (TRM 87.4%)
• ARC-AGI-1: 52.0%
• ARC-AGI-2: 11.1%
• N-Queens coverage: 90%+
📄 Paper: https://t.co/JC7EyXYc9Y
🌐 Project page: https://t.co/LRT1dQiWLZ
w/
Junyeob Baek @JunyeobB (KAIST),
Mingyu Jo @pyross0000 (KAIST),
Minsu Kim @minsuuukim (KAIST & Mila),
Mengye Ren @mengyer (NYU),
Yoshua Bengio @Yoshua_Bengio (Mila),
Sungjin Ahn @SungjinAhn_ (KAIST)
Check out our new work examining what LLMs learn and when!
We posit that LLMs have an implicit curriculum where they learn gradually more complex skills, and attempt to uncover some details of how this curriculum develops over time across model families.
The most popular way to interpret AI is missing the bigger picture.
Models think in curved shapes. But sparse autoencoders (SAEs) work with straight lines.
Can they still capture models’ curved neural geometry? Yes, but not how you might think! (1/7)
Jensen Huang just delivered one of the most inspiring AI-era commencement speeches at Carnegie Mellon University.
worth watching all 18 minutes:
(or save it to watch later)
LLaDA2.0 converts a standard autoregressive LLM into a diffusion model that generates text faster by filling in many blanks in parallel, at 100B scale.
Their 100B model reports 535 tokens per second, about 2.1 times faster than similar autoregressive baselines.
Autoregressive models predict the next token (a small chunk of text) from the previous ones, so generation must proceed one step at a time.
Diffusion language models train on corrupted text where many tokens are masked, and they learn to recover the missing parts using both left and right context.
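Here is a generic sketch of the masked-diffusion training corruption described above. It assumes a hypothetical `MASK_ID` and a masking ratio t drawn uniformly from (0, 1). This is the common recipe for masked diffusion LMs, not LLaDA2.0's exact objective. The loss is computed only at masked positions. The (hypothetical) model producing `log_probs` would see the visible tokens on both sides.

```python
import numpy as np

MASK_ID = -1  # hypothetical id for the [MASK] token


def corrupt(tokens, rng):
    """Mask each token independently with probability t ~ Uniform(0, 1).

    Returns the noisy sequence, the boolean mask, and t.
    """
    tokens = np.asarray(tokens)
    t = rng.uniform(0.0, 1.0)
    masked = rng.uniform(size=tokens.shape) < t
    noisy = np.where(masked, MASK_ID, tokens)
    return noisy, masked, t


def masked_ce(log_probs, targets, masked):
    """Mean negative log-likelihood over masked positions only.

    log_probs: (L, V) log-probabilities from a (hypothetical) bidirectional model
    that saw the noisy sequence; targets: (L,) original token ids.
    """
    nll = -log_probs[np.arange(len(targets)), targets]
    return float(nll[masked].sum() / max(int(masked.sum()), 1))


rng = np.random.default_rng(0)
tokens = np.arange(10)
noisy, masked, t = corrupt(tokens, rng)
uniform_log_probs = np.log(np.full((10, 16), 1.0 / 16))  # an untrained, uniform "model"
loss = masked_ce(uniform_log_probs, tokens, masked)
```

With a uniform model over 16 tokens, the loss equals log 16 whenever at least one position is masked. This gives a quick sanity check for the masking bookkeeping.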
LLaDA2.0 starts from an already-trained autoregressive model and gradually changes the masking pattern: first small blocks, then whole sequences, then small blocks again.
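The three-phase masking curriculum can be sketched as a block-size schedule. Every number here is a hypothetical placeholder: the 4-token small blocks, the 4096-token sequence, the phase lengths, and the geometric growth rounded to powers of two. The paper defines its own schedule.

```python
import math


def block_size_schedule(step, seq_len=4096, small=4, warmup=1000, stable=5000, decay=1000):
    """Hypothetical three-phase schedule for the size of the masked block.

    Phase 1 (warm-up): grow from `small` blocks to the full sequence.
    Phase 2 (stable):  mask over the whole sequence.
    Phase 3 (decay):   shrink back to `small` blocks, and stay there.
    Sizes move geometrically and are rounded to powers of two.
    """
    lo, hi = math.log2(small), math.log2(seq_len)
    if step < warmup:
        frac = step / warmup
    elif step < warmup + stable:
        frac = 1.0
    elif step < warmup + stable + decay:
        frac = 1.0 - (step - warmup - stable) / decay
    else:
        frac = 0.0
    return 2 ** round(lo + frac * (hi - lo))


for s in (0, 500, 1000, 6000, 6500, 7000):
    print(s, block_size_schedule(s))
```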
During training, LLaDA2.0 also prevents the model from attending across document boundaries, which matters when many short texts are packed into one sequence.
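Here is a minimal sketch of a document-aware attention mask for packed sequences. It assumes each position carries the id of the document it came from (`doc_ids` is a hypothetical input). Attention is assumed to stay bidirectional inside a document, matching the left-and-right context that diffusion LMs use. Disallowed scores are set to -inf before the softmax.

```python
import numpy as np


def packed_attention_mask(doc_ids):
    """Boolean (L, L) mask: position i may attend to j only if both come from
    the same document. Bidirectional within a document (no causal triangle)."""
    doc_ids = np.asarray(doc_ids)
    return doc_ids[:, None] == doc_ids[None, :]


def masked_softmax(scores, mask):
    """Softmax over the last axis with disallowed entries forced to zero weight.
    Every row keeps at least its diagonal, so the row max is always finite."""
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)


# Three packed documents of lengths 3, 2, and 1.
mask = packed_attention_mask([0, 0, 0, 1, 1, 2])
weights = masked_softmax(np.zeros((6, 6)), mask)
```

Without this mask, a token in one packed document could attend to tokens from an unrelated neighboring document and pick up spurious cross-document dependencies.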
For instruction tuning (training the model to follow prompts) and for faster decoding, LLaDA2.0 uses paired masks so that every token gets trained, and it pushes the model toward confident predictions so that many blanks can be filled in at once.
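Here is a sketch of both ideas, with loud assumptions. `complementary_masks` pairs a random mask with its complement so that each position is masked in exactly one copy. `parallel_decode` is a generic confidence-threshold unmasking loop driven by a hypothetical `predict` function, and its threshold of 0.9 is a placeholder. The paper covers how LLaDA2.0 actually trains for confidence and schedules decoding; that is not reproduced here.

```python
import numpy as np

MASK_ID = -1  # hypothetical [MASK] id


def complementary_masks(length, rng, t=0.5):
    """Paired masks: each position is masked in exactly one of the two copies,
    so every token contributes to the loss once per pair."""
    m = rng.uniform(size=length) < t
    return m, ~m


def parallel_decode(predict, length, threshold=0.9, max_steps=100):
    """Generic confidence-thresholded parallel unmasking.

    predict(seq) -> (tokens, confidences) for every position. Each step commits
    all masked positions with confidence >= threshold, and always at least the
    single most confident masked position so decoding makes progress.
    """
    seq = np.full(length, MASK_ID)
    steps = 0
    while (seq == MASK_ID).any() and steps < max_steps:
        tokens, conf = predict(seq)
        masked = seq == MASK_ID
        accept = masked & (conf >= threshold)
        if not accept.any():
            best = np.argmax(np.where(masked, conf, -np.inf))
            accept[best] = True
        seq = np.where(accept, tokens, seq)
        steps += 1
    return seq, steps


# Toy usage with a hypothetical predictor that ignores context.
target = np.arange(8) + 10
conf = np.array([0.99, 0.95, 0.5, 0.97, 0.3, 0.92, 0.6, 0.99])
seq, steps = parallel_decode(lambda s: (target, conf), length=8)
```

With these toy confidences, the 8 tokens are committed in 4 decoding steps instead of 8. The more positions the model is confident about, the more blanks each step fills, which is why confident predictions translate into speed.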
----
Paper link: arxiv.org/abs/2512.15745
Paper Title: "LLaDA2.0: Scaling Up Diffusion Language Models to 100B"
New preprint! (by @kieran_s_owens)
Of interest to anyone who analyzes time-series data!:
"Time-series dimension reduction: a comprehensive review and conceptual unification of algorithms"
https://t.co/mdoiHLBjPC
#timeseries #dimensionreduction #complexsystems
🌏 NVIDIA and @SamsungKorea announce a new state-of-the-art AI factory to transform global intelligent manufacturing.
The #AIfactory with 50,000+ NVIDIA GPUs will accelerate agentic and physical AI applications for advanced chip manufacturing, mobile devices, and robotics.
Learn more ➡️ https://t.co/ABl9SqqiP3
#APEC2025
Antidepressants can affect the normal functioning of the body's organs. However, the extent of these physiological effects across different antidepressants is unclear.
On the cover, a new study compared and ranked antidepressants based on physiological side-effects.
Read this & more: https://t.co/W6uiv00teO
🧠 65-Hour Live Imaging: Hippocampal Neurons Building Circuits
Real-time dendritic growth & synaptic remodeling in the rat hippocampus, processes key to memory & plasticity.
🔬 Continuous multi-day view of neural adaptation.
Credit: Louis Romet & Dr. C. Leterrier
Training a critique model to generate natural-language feedback. The reward is whether the policy can produce the correct answer given the feedback. That alone is not enough, because the critique model cannot reliably tell incorrect answers apart from correct ones. The critique model is therefore first trained to make that discrimination.
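Here is a minimal sketch of the two reward signals. It assumes two hypothetical callables: `policy_revise`, which rewrites the policy's answer given a critique, and `is_correct`, a ground-truth checker. It also assumes the discrimination stage is scored with a simple verdict-match reward. The source does not specify these details.

```python
from typing import Callable


def refinement_reward(question: str, answer: str, critique: str,
                      policy_revise: Callable[[str, str, str], str],
                      is_correct: Callable[[str, str], bool]) -> float:
    """Refinement reward: 1.0 if the policy, given the critique, produces a correct answer."""
    revised = policy_revise(question, answer, critique)
    return 1.0 if is_correct(question, revised) else 0.0


def discrimination_reward(predicted_verdict: bool, answer_is_correct: bool) -> float:
    """Assumed discrimination reward: 1.0 if the critic's correct/incorrect verdict matches ground truth."""
    return 1.0 if predicted_verdict == answer_is_correct else 0.0


# Hypothetical toy policy and checker, for illustration only.
ANSWERS = {"2+2": "4"}


def toy_revise(question: str, answer: str, critique: str) -> str:
    """Adopt the value a critique proposes ("... should be X"), else keep the answer."""
    return critique.split()[-1] if "should be" in critique else answer


def toy_check(question: str, answer: str) -> bool:
    return answer == ANSWERS[question]


r = refinement_reward("2+2", "5", "The answer should be 4", toy_revise, toy_check)
```

One plausible reading of why the refinement reward alone falls short: a vacuous critique such as "Looks right" on an already-correct answer still earns 1.0. Nothing in that signal forces the critic to tell correct answers from incorrect ones.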
Sensational paper offering a new way to view the brain from the core outward. Don't miss!
It’s not the thought that counts: Allostasis at the core of brain function: Neuron https://t.co/821LYE9weg