Shobhita Sundaram @shobsund - Twitter Profile

Pinned Tweet

4 months ago

Can a model learn to break its own reasoning plateau?

In our new paper, we show that LLMs can be taught with meta-RL to generate their own "stepping stones" that kickstart learning on hard math problems (0/128 success rate) where direct RL fails.

Paper 📝: https://t.co/lUlrJt6bwq
Blog post 🌐: https://t.co/v1y24h1fP4

(1/n)


Shobhita Sundaram

@shobsund

2 days ago

Super cool work from @juliachae_ that fills an important gap in image similarity metrics!

Julia Chae @juliachae_

3 days ago

Excited to share ID-Sim, our identity-focused similarity metric, presenting at #CVPR2026 this week in Denver! 🎉 Humans are remarkably good at distinguishing highly similar objects across different contexts. We asked: can we train a metric that does the same?


shobsund retweeted

Charles Arnal

@arnal_charles

8 days ago

Our team at @AIatMeta is excited to announce ATLAS: one of the largest automated formalization efforts to date.

ATLAS contains Lean 4 formalizations of both statements and proofs from 25+ mathematics textbooks, spanning dozens of domains, for a total of 500k lines of code. We are also releasing a flexible formalization harness and a companion paper.

External contributions are welcome!

Joint work spearheaded by our amazing PhD student Ahmad Rammal (@Ahmad3Rammal), together with Niket Patel (@niketnpatel), Fabian Gloeckle (@FabianGloeckle), Amaury Hayat (@Amaury_Hayat), Remi Munos (@MunosRemi), Julia Kempe (@KempeLab), Vivien Cabannes, and myself from @AIatMeta, @NYUDataScience, and Ecole des Ponts. This is an ongoing effort; more details in the thread below.
(1/9)


shobsund retweeted

Michael Hu @michahu8

18 days ago

What is the right data mix, and how do we find it as the data keeps changing?

This is a core, unsolved problem in continual learning. To tackle it, we built a data mixing algo that works everywhere — pretraining, midtraining, instruction tuning

Introducing: On-Policy Mix

🧵1/6 https://t.co/LCuNkoewVf


shobsund retweeted

Sophie Wang @SophieLWang

24 days ago

"The Truth Lies Somewhere in the Middle (of the Generated Tokens)" In autoregressive language models, mean pooling hidden states across generation yields better representations than any token alone. project page: https://t.co/kXddYUir4k w/ @phillip_isola and @thisismyhat

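The pooling idea in the tweet above can be illustrated with a minimal sketch. It assumes that per-token hidden states have already been extracted (as plain Python lists, one vector per generated token). It is not the authors' code, and it leaves open which layer to pool and how the states are obtained. `mean_pool` and `last_token` are hypothetical names.

```python
from typing import List

Vector = List[float]

def mean_pool(hidden_states: List[Vector]) -> Vector:
    """Average per-token hidden states into a single representation.

    hidden_states: one vector per generated token, all of equal dimension
    (assumed to come from a single layer of an autoregressive LM).
    """
    if not hidden_states:
        raise ValueError("need at least one token")
    dim = len(hidden_states[0])
    if any(len(h) != dim for h in hidden_states):
        raise ValueError("all hidden states must share a dimension")
    n = len(hidden_states)
    return [sum(h[i] for h in hidden_states) / n for i in range(dim)]

def last_token(hidden_states: List[Vector]) -> Vector:
    """Common single-token baseline: keep only the final token's state."""
    return list(hidden_states[-1])

# Toy example: three generated tokens with 2-d hidden states.
states = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
print(mean_pool(states))   # [2.0, 2.0]
print(last_token(states))  # [2.0, 4.0]
```

With a real model, the only change would be the source of `states`. The pooling itself is a plain average over the generation dimension.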

shobsund retweeted

Elyssa Hofgard @ElyssaHofgard

about 1 month ago

As a geometric ML researcher, I noticed pseudoscalars don’t get enough attention! Read on to see what pseudoscalars can do for you. https://t.co/kRSA1NewDc


Shobhita Sundaram

@shobsund

about 1 month ago

@varchasvee_ Yeah it was a cool finding! We didn't explicitly try that, but we do have some new ablations (that'll be in the next arXiv version) showing that adding well-posed qs w/ incorrect answers to a pool of synthetic questions w/ correct answers still improves performance.


Shobhita Sundaram

@shobsund

about 1 month ago

LLMs can learn to self-generate curricula for hard problems that they can't yet solve! Using meta-RL, with rewards grounded in learning progress, models produce their own stepping stones that kickstart learning on hard problems where direct RL plateaus. Poster at the ICLR RSI workshop today!

Shobhita Sundaram

@shobsund

4 months ago

Can a model learn to break its own reasoning plateau? In our new paper, we show that LLMs can be taught with meta-RL to generate their own "stepping stones" that kickstart learning on hard math problems (0/128 success rate) where direct RL fails. Paper 📝: https://t.co/lUlrJt6bwq Blog post 🌐: https://t.co/v1y24h1fP4 (1/n)


shobsund retweeted

Julia Balla @julballa

about 1 month ago

New blogpost on tokenizing non-sequential data!

Language has sequential structure, which gave rise to the next-token prediction paradigm of LLMs. But we increasingly use LLMs for data without inherent order (e.g. images, molecules, sets). What does “next token” mean here?
(1/7) https://t.co/qdWJS7Fn8a


shobsund retweeted

Cansu Sancaktar @CcansuSancaktar

3 months ago

Introducing GASP😮: Guided Asymmetric Self-Play for Coding LLMs We address the goal-agnostic behavior of current asymmetric self-play methods. Key idea: guide the teacher with hard real-data goalposts; first an easier lemma, then a harder lift from the lemma as stepping stones 🧵


shobsund retweeted

Yulu Gan

@yule_gan

3 months ago

Simply adding Gaussian noise to LLMs (one step: no iterations, no learning rate, no gradients) and ensembling them can achieve performance comparable to or even better than standard GRPO/PPO on math reasoning, coding, writing, and chemistry tasks. We call this algorithm RandOpt.

To verify that this is not limited to specific models, we tested it on Qwen, Llama, OLMo3, and VLMs.

What's behind this? We find that in the Gaussian search neighborhood around pretrained LLMs, diverse task experts are densely distributed — a regime we term Neural Thickets.

Paper: https://t.co/rFJz2kVEOA
Code: https://t.co/HAmonfpXIA
Website: https://t.co/QZ6AMIsKCw
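Here is a toy sketch of the perturb-and-ensemble idea described above. It reduces the model to a flat list of weights. The selection rule (keep the `top_k` of `n_samples` perturbations by a held-out score) and the majority-vote combination are assumptions, not details from the tweet. `randopt_like`, `predict`, and `accuracy` are hypothetical names, and the linked code remains the reference implementation.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

Weights = List[float]

def perturb(weights: Weights, sigma: float, rng: random.Random) -> Weights:
    """One-shot Gaussian perturbation w + sigma * N(0, I); no gradients."""
    return [w + rng.gauss(0.0, sigma) for w in weights]

def majority_vote(predictions: Sequence[int]) -> int:
    """Combine ensemble answers; ties go to the answer seen first."""
    return Counter(predictions).most_common(1)[0][0]

def randopt_like(
    base: Weights,
    score: Callable[[Weights], float],
    n_samples: int = 64,
    top_k: int = 5,
    sigma: float = 0.1,
    seed: int = 0,
) -> List[Weights]:
    """Sample perturbed copies of `base` and keep the top_k by `score`.

    Hypothetical selection rule: `score` stands in for accuracy on a small
    held-out set.
    """
    rng = random.Random(seed)
    candidates = [perturb(base, sigma, rng) for _ in range(n_samples)]
    candidates.sort(key=score, reverse=True)
    return candidates[:top_k]

# Toy "model": predicts 1 if w[0] * x + w[1] > 0.
def predict(w: Weights, x: float) -> int:
    return int(w[0] * x + w[1] > 0)

data = [(-2.0, 0), (-0.5, 0), (0.5, 1), (2.0, 1)]

def accuracy(w: Weights) -> float:
    return sum(predict(w, x) == y for x, y in data) / len(data)

ensemble = randopt_like([1.0, 0.0], accuracy, n_samples=32, top_k=5, sigma=0.3)
answer = majority_vote([predict(w, 1.5) for w in ensemble])
print(answer)
```

Each candidate costs one noise draw and one scoring pass. In that sense, no iterative optimization loop is involved.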


shobsund retweeted

Seungwook Han

@seungwookh

3 months ago

Can language models learn useful priors without ever seeing language?

We pre-pre-train transformers on neural cellular automata — fully synthetic, zero language. This improves language modeling by up to 6%, speeds up convergence by 40%, and strengthens downstream reasoning.

Surprisingly, it even beats pre-pre-training on natural text!

Blog: https://t.co/Pni0RsIcxL

(1/n)


shobsund retweeted

Sharut Gupta @sharut_gupta

3 months ago

[1/n] Do distinct large models admit a simple map that aligns their embedding spaces? We show that across multimodal contrastive models—trained on different data and architectures—an orthogonal map aligns image embeddings. Strikingly, the same map also aligns text embeddings.
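The tweet does not say how the map is fitted. A standard way to fit an orthogonal map between paired embeddings X and Y is orthogonal Procrustes. In general, this takes R = U Vᵀ from the SVD X ᵀY = U Σ Vᵀ. The dependency-free 2-D version below is for illustration only: real embeddings are high-dimensional, the data are hypothetical, and `procrustes_2d` and `align` are hypothetical names.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]

def procrustes_2d(X: Sequence[Sequence[float]], Y: Sequence[Sequence[float]]) -> Matrix:
    """Orthogonal R (rotation or reflection) minimizing ||X R - Y||_F in 2-D.

    Rows of X and Y are paired embeddings of the same items from two models.
    """
    # M = X^T Y, the 2x2 cross-covariance of the paired embeddings.
    a = sum(x[0] * y[0] for x, y in zip(X, Y))
    b = sum(x[0] * y[1] for x, y in zip(X, Y))
    c = sum(x[1] * y[0] for x, y in zip(X, Y))
    d = sum(x[1] * y[1] for x, y in zip(X, Y))
    # Minimizing the residual is equivalent to maximizing trace(R^T M).
    rot_score = math.hypot(a + d, c - b)  # best value a rotation can reach
    ref_score = math.hypot(a - d, b + c)  # best value a reflection can reach
    if rot_score >= ref_score:
        t = math.atan2(c - b, a + d)
        return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]
    t = math.atan2(b + c, a - d)
    return [[math.cos(t), math.sin(t)], [math.sin(t), -math.cos(t)]]

def align(R: Matrix, X: Sequence[Sequence[float]]) -> Matrix:
    """Map each row vector x to x R."""
    return [[x[0] * R[0][0] + x[1] * R[1][0], x[0] * R[0][1] + x[1] * R[1][1]] for x in X]

# Toy usage: model B's embeddings are model A's rotated by 90 degrees.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [[x[1], -x[0]] for x in X]
R = procrustes_2d(X, Y)
print([[round(v, 6) for v in row] for row in R])  # [[0.0, -1.0], [1.0, 0.0]]
```

Fitting R on one modality (e.g. image embeddings) and then reusing it on the other (text embeddings) would be the check the tweet describes.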


shobsund retweeted

Nihal Nayak @nihalcanrun

3 months ago

Targeted instruction tuning for LLMs involves selecting a subset of instructions from a candidate pool using a small query set from target tasks. Despite growing interest, we still lack guidance on what to select. Our new preprint brings clarity to this space (thread 👇).


shobsund retweeted

Julia Kempe

@KempeLab

4 months ago

1/ #1stProof : Announcing our attempt at Problem 10. Joint with @scottnarmstrong @MunosRemi


shobsund retweeted

NYU Center for Data Science

@NYUDataScience

4 months ago

CDS/Courant Prof @KempeLab has recorded two talks with the Physics of Learning and Neural Computation, a Simons collaboration. She discussed how RL post-training shapes LLM reasoning but may hit ceilings like diversity collapse and low sample efficiency. https://t.co/7H5dQHYWzR


Shobhita Sundaram

@shobsund

4 months ago

@goyalsachin007 Congrats Sachin!!


shobsund retweeted

Sharut Gupta @sharut_gupta

4 months ago

1/n Can LLMs learn to reason on hard benchmarks like AIME and GPQA purely through context, without SFT, RL, or any weight updates?

Turns out… Yes! And it can have strong performance while being highly efficient

Paper: https://t.co/mEoaIst6cX
Blog: https://t.co/lZli7qY4Jz https://t.co/eOO3Jb6vfm


shobsund retweeted

Lucas Beyer (bl16)

@giffmana

4 months ago

Another nice work with a student-teacher setup, but this time with a meta-learning setup: the teacher generates Q/A pairs for the student and gets a reward depending on the student's improvement after training on those Q/A pairs.
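The reward described here can be written as a before/after difference in student performance. This minimal sketch assumes a toy student that simply memorizes Q/A pairs and an `evaluate` function that scores it on fixed target problems. In the setting described, "training" would be an actual model update. `teacher_reward`, `train`, and `evaluate` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

QA = Tuple[str, str]
Student = Dict[str, str]  # toy "student": a lookup table of question -> answer

def teacher_reward(
    student: Student,
    generated: List[QA],
    train: Callable[[Student, List[QA]], Student],
    evaluate: Callable[[Student], float],
) -> float:
    """Reward = student's score after training on the teacher's Q/A pairs
    minus its score before. Assumes `train` returns a new student."""
    before = evaluate(student)
    after = evaluate(train(student, generated))
    return after - before

# Toy stand-ins (hypothetical): training memorizes pairs, evaluation is
# accuracy on a fixed set of target problems.
def train(student: Student, pairs: List[QA]) -> Student:
    updated = dict(student)
    updated.update(pairs)
    return updated

targets = [("2+2", "4"), ("3*3", "9")]

def evaluate(student: Student) -> float:
    return sum(student.get(q) == a for q, a in targets) / len(targets)

useful = [("2+2", "4")]
useless = [("1+1", "2")]
print(teacher_reward({}, useful, train, evaluate))   # 0.5
print(teacher_reward({}, useless, train, evaluate))  # 0.0
```

Because the reward is measured as improvement, Q/A pairs that teach the student nothing new earn zero, even if they are correct.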


Shobhita Sundaram

@shobsund

4 months ago

@inventcures @karpathy @paraschopra @tokenbender @o_v_shake Thanks! Yeah these ideas around making data/tasks more model-learnable were definitely an inspiration! In our case we have the model discover those tasks for itself based on what gives the best learning signal.


Shobhita Sundaram

@shobsund

4 months ago

@IdanShenfeld Awesome work Idan! Love the idea, very elegant.
