Vasseur JP @jpvasseur - Twitter Profile

jpvasseur retweeted

Fei-Fei Li

@drfeifei

16 days ago

https://t.co/Kt50ttQRMJ

jpvasseur retweeted

Andrew Lampinen @AndrewLampinen

24 days ago

We've updated the preprint of our Naturalistic Computational Cognitive Science paper — we've clarified and streamlined the arguments, and expanded examples where we see increasing naturalism already yielding new theoretical insights, from RL to perceptual neuroscience. 1/4

jpvasseur retweeted

MONTREAL.AI

@Montreal_AI

27 days ago

The next clue in AI reasoning:

answers may be attractors.

A new paper from Benhao Huang, Zhengyang Geng, and Zico Kolter introduces Equilibrium Reasoners (EqR) — a sharp mechanistic view of test-time scaling in latent reasoning models.

The core idea is simple, but deep:

Reasoning is not only generation.
Reasoning can be convergence.

EqR repeatedly updates a latent state. The authors hypothesize that generalizable reasoning emerges when training shapes the model’s latent dynamics so that stable attractors correspond to valid solutions.

In other words, the answer is not merely “produced.”

It is reached.

This matters because test-time compute only helps when the model’s internal dynamics know how to use it. More iterations can improve reasoning — or make it worse — depending on whether the trajectory moves toward a solution-aligned attractor or falls into a spurious one.

EqR scales along two axes:

Depth: run more iterations so a trajectory can settle.

Breadth: run multiple stochastic trajectories from different initializations and select/aggregate the ones that converge best.

The first-page figure captures the punchline beautifully: training is capped at 16 iterations, yet the learned dynamics extrapolate beyond 1,024 iterations at test time. As fixed-point residual falls, accuracy rises.

On Sudoku-Extreme, the paper reports a jump from 2.6% exact accuracy for feedforward models to over 99% with scalable latent reasoning — equivalent to unrolling up to ~40,000 layers. On Maze, EqR reaches 93.0%.

But the benchmark is not the most interesting part.

The most interesting part is the lens:

Correct answers must become stable.
They must be reachable.
And convergence itself can become a signal.

That gives the field a more precise language for test-time compute than “let the model think longer.”

Not longer text.
Not an external verifier.
Not task-specific search priors.

A learned attractor landscape.

This feels important because modern AI is moving from static inference toward adaptive computation. The question is no longer only “how much compute should we spend?”

It is:

What internal dynamics make extra compute useful?

Full credit to the authors:
Benhao Huang, Zhengyang Geng, Zico Kolter.

Paper:
Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning
https://t.co/HEfsBo8Np2

I’m attaching the first page because Figure 1 is worth studying closely.

The future of reasoning may not only be models that generate better answers.

It may be models whose internal states learn where correct answers live — and how to converge there.

#AIResearch #MachineLearning #Reasoning #TestTimeCompute #DynamicalSystems #ArtificialIntelligence
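
The depth and breadth axes described in the post above can be illustrated with a toy fixed-point iteration. This is only a sketch under stated assumptions, not the EqR model from the paper: the update rule `step`, the scalar latent `z`, and the helpers `iterate` and `solve` are hypothetical stand-ins chosen so that the map is a contraction with a single attractor. A real latent reasoning model would use a learned network and could also have the spurious attractors the post mentions.

```python
import math
import random


def step(z, x):
    """One latent update. Hypothetical contractive map (slope at most 0.5)."""
    return 0.5 * math.cos(z) + x


def iterate(z0, x, n_iters):
    """Depth: apply the update n_iters times, then measure the fixed-point residual."""
    z = z0
    for _ in range(n_iters):
        z = step(z, x)
    residual = abs(step(z, x) - z)
    return z, residual


def solve(x, n_iters, n_trajectories, seed=0):
    """Breadth: run several random initializations, keep the best-converged one."""
    rng = random.Random(seed)
    runs = [
        iterate(rng.uniform(-3.0, 3.0), x, n_iters)
        for _ in range(n_trajectories)
    ]
    # Use the residual itself as the selection signal.
    return min(runs, key=lambda run: run[1])


if __name__ == "__main__":
    for depth in (4, 16, 64):
        z, residual = solve(x=0.3, n_iters=depth, n_trajectories=8)
        print(f"depth={depth:3d}  z={z:.6f}  residual={residual:.3e}")
```

In this toy, more iterations always shrink the residual because the map is a contraction everywhere. The post's point is that a trained model has no such guarantee unless training shapes the dynamics that way.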

jpvasseur retweeted

Sulekha Tripathi

@sulekhat95

29 days ago

A man spends 50 years teaching at MIT. He knows his time is running out. So he records one last lecture — everything he knows, distilled into a single hour. He died 5 months later. This is that lecture. The most important hour you'll watch this week. 👇 Bookmark it for later

jpvasseur retweeted

Sungjin Ahn

@SungjinAhn_

about 1 month ago

🧠We introduce "Generative Recursive Reasoning"!

Recursive Reasoning Models like HRM, TRM, and Looped Transformers are deterministic — same input, same reasoning, every time. They collapse the entire space of plausible reasoning paths into a single attractor.

Our model GRAM (Generative Recursive reAsoning Models) turns recursion itself into a stochastic latent trajectory. Multiple hypotheses, alternative solution strategies, and inference-time scaling not just by depth, but by width — parallel trajectory sampling.

And here's the kicker: the same formulation that gives us conditional reasoning p(y|x) also makes GRAM a general generative model p(x).

With only 10M params:
• Sudoku-Extreme: 97.0% (TRM 87.4%)
• ARC-AGI-1: 52.0%
• ARC-AGI-2: 11.1%
• N-Queens coverage: 90%+

📄 Paper: https://t.co/JC7EyXYc9Y
🌐 Project page: https://t.co/LRT1dQiWLZ

w/
Junyeob Baek @JunyeobB (KAIST),
Mingyu Jo @pyross0000 (KAIST),
Minsu Kim @minsuuukim (KAIST & Mila),
Mengye Ren @mengyer (NYU),
Yoshua Bengio @Yoshua_Bengio (Mila),
Sungjin Ahn @SungjinAhn_ (KAIST)

jpvasseur retweeted

Graham Neubig

@gneubig

30 days ago

Check out our new work on examining what LLMs learn and when! We posit that LLMs have an implicit curriculum where they learn gradually more complex skills, and attempt to uncover some details of how this curriculum develops over time across model families.

jpvasseur retweeted

Goodfire

@GoodfireAI

29 days ago

The most popular way to interpret AI is missing the bigger picture. Models think in curved shapes. But sparse autoencoders (SAEs) work with straight lines. Can they still capture models’ curved neural geometry? Yes, but not how you might think! (1/7)

jpvasseur retweeted

Earl K. Miller @MillerLabMIT

about 1 month ago · Somerville

The influence of nonlinear resonance on human cortical oscillations https://t.co/RvOt915zN1 #neuroscience

jpvasseur retweeted

Ryan Peters

@ryanpirl

about 1 month ago

Biological networks too :) Here is the neural geometry of mice navigating a figure-8 maze.

jpvasseur retweeted

Atal

@ZabihullahAtal

about 1 month ago

Jensen Huang just delivered one of the most inspiring AI-era commencement speeches at Carnegie Mellon University. worth watching all 18 minutes: (or save it to watch later)

jpvasseur retweeted

Rohan Paul

@rohanpaul_ai

6 months ago

LLaDA2.0 converts a normal LLM into a diffusion model that writes faster by filling many blanks together at 100B scale.

Their 100B model reports 535 tokens per second, about 2.1 times faster than similar autoregressive baselines.

Autoregressive models predict the next token, a small chunk of text, from the previous ones, so generation is forced step by step.

Diffusion language models train on corrupted text where many tokens are masked, and they learn to recover the missing parts using both left and right context.

It starts from an already trained autoregressive model and gradually changes the masking pattern, first small blocks, then whole sequences, then small blocks again.

During training, it also stops the model from reading across document boundaries, which matters when many short texts are packed together.
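
One common way to keep a model from reading across document boundaries in a packed sequence is a document-level attention mask. The sketch below shows that general technique. It is not taken from the LLaDA2.0 paper: the function name `document_attention_mask` and the `doc_ids` layout are assumptions made for illustration.

```python
def document_attention_mask(doc_ids):
    """Return mask[i][j] == True when position i may attend to position j.

    Attention is allowed in both directions, but only between positions
    that belong to the same packed document.
    """
    n = len(doc_ids)
    return [[doc_ids[i] == doc_ids[j] for j in range(n)] for i in range(n)]


# Three short documents packed into one training sequence of length 6.
doc_ids = [0, 0, 0, 1, 1, 2]
mask = document_attention_mask(doc_ids)

for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

The result is block-diagonal: each document sees all of its own tokens, left and right, and nothing from its neighbors.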

For instruction tuning, meaning training it to follow prompts, and for speed, it uses paired masks so every token gets trained, and it pushes the model to make confident guesses so many blanks can be filled at once.
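
One plausible reading of "paired masks so every token gets trained" is complementary masking: each sequence is corrupted twice, and the second copy hides exactly the positions the first copy left visible. The sketch below assumes that reading. The `MASK` placeholder, the helper `paired_masks`, and the `mask_ratio` parameter are hypothetical and may not match how the paper implements it.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token


def paired_masks(tokens, mask_ratio, seed=0):
    """Build two corrupted copies whose masked positions are complementary."""
    rng = random.Random(seed)
    hidden = [rng.random() < mask_ratio for _ in tokens]
    # First copy hides the sampled positions.
    first = [MASK if h else t for t, h in zip(tokens, hidden)]
    # Second copy hides every position the first copy kept visible.
    second = [t if h else MASK for t, h in zip(tokens, hidden)]
    return first, second


tokens = "the cat sat on the mat".split()
first, second = paired_masks(tokens, mask_ratio=0.5)
print(first)
print(second)
```

Across the pair, every position is masked exactly once, so every token contributes to the recovery loss in one of the two copies.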

----

Paper Link: arxiv.org/abs/2512.15745

Paper Title: "LLaDA2.0: Scaling Up Diffusion Language Models to 100B"

jpvasseur retweeted

Time Series Features @compTimeSeries

7 months ago

New preprint! (by @kieran_s_owens) Of interest to anyone who analyzes time-series data!: "Time-series dimension reduction: a comprehensive review and conceptual unification of algorithms" https://t.co/mdoiHLBjPC #timeseries #dimensionreduction #complexsystems

jpvasseur retweeted

Alessandro Crimi 🧠🧬🔬🩺( @alecrimi.bsky.social ) @Dr_Alex_Crimi

8 months ago

the first songbird #basalganglia #brain connectome:
- 8,500 automated neuron reconstructions
- 20 million synapses
- 16 cell types
https://t.co/yTom5a1LkX

jpvasseur retweeted

NVIDIA AI Infrastructure

@NVIDIAAIInfra

8 months ago

🌏 NVIDIA and @SamsungKorea announce a new state-of-the-art AI factory to transform global intelligent manufacturing. The #AIfactory with 50,000+ NVIDIA GPUs will accelerate agentic and physical AI applications for advanced chip manufacturing, mobile devices, and robotics. Learn more ➡️ https://t.co/ABl9SqqiP3 #APEC2025

jpvasseur retweeted

The Lancet

@TheLancet

8 months ago

Antidepressants can affect the normal or proper functioning of the body's organs. However, the degree to which these physiological effects occur in treatment with various antidepressants is unclear. On the cover, a new study compared and ranked antidepressants based on physiological side-effects. Read this & more: https://t.co/W6uiv00teO

jpvasseur retweeted

Valeriy M., PhD, MBA, CQF

@predict_addict

8 months ago

The paper we have been waiting for. #KAN

jpvasseur retweeted

William A. Wallace, Ph.D.

@WilliamWallace

8 months ago

🧠 65-Hour Live Imaging: Hippocampal Neurons Building Circuits
Real-time dendritic growth & synaptic remodeling in the rat hippocampus; important to memory & plasticity.
🔬 Continuous multi-day view of neural adaptation.
Credit: Louis Romet & Dr. C. Leterrier

jpvasseur retweeted

Rosinality @rosinality

8 months ago

Training a critique model to generate natural language feedback. The reward is whether the policy can generate the correct answer given the feedback, but this alone is not enough because the critique model cannot yet discriminate incorrect answers. Thus the critique model is first trained to discriminate them.

jpvasseur retweeted

peter sterling @whatishealth21

8 months ago

Sensational paper offering a new way to view the brain from the core outward. Don't miss! It’s not the thought that counts: Allostasis at the core of brain function: Neuron https://t.co/821LYE9weg
