Sainbayar Sukhbaatar

about 1 month ago

DeepSeek-V4 uses our Hash routing approach developed back in 2021 -- see screenshot of their tech report! (Looks like a great model, congrats!) Bonus note: our same blogpost (& paper) back in 2021 also introduced 'looped transformers', but we called that staircase & ladder (see screenshot): https://t.co/widkeEXz56 https://t.co/PQLdPKg9PS
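
For readers unfamiliar with the term, hash routing in a mixture-of-experts layer means each token is sent to an expert chosen by a fixed hash of its token ID, not by a learned router. Below is a minimal sketch of that idea, assuming a random lookup table as the hash. The names `build_hash_table` and `route`, the vocabulary size, and the expert count are illustrative assumptions, not taken from the 2021 paper or the DeepSeek report.

```python
import random


def build_hash_table(vocab_size: int, num_experts: int, seed: int = 0) -> list[int]:
    """Build a fixed token-id -> expert-id table (hypothetical helper).

    The table is created once from a seeded RNG and never trained,
    which is what makes the routing a hash and not a learned gate.
    """
    rng = random.Random(seed)
    return [rng.randrange(num_experts) for _ in range(vocab_size)]


def route(token_ids: list[int], table: list[int]) -> list[int]:
    """Look up the expert index for each token id."""
    return [table[t] for t in token_ids]


# Assumed toy sizes, for illustration only.
table = build_hash_table(vocab_size=1000, num_experts=4)
tokens = [17, 42, 17, 999]
experts = route(tokens, table)
```

Because the table is fixed, the same token ID always reaches the same expert (here, positions 0 and 2 of `experts` match), and no router parameters need to be trained.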

3 months ago

@goodfellow_ian @daniel_rossett Glad to hear that you've recovered!

tesatory retweeted

4 months ago

Self-Improving Pretraining
We've updated our results given feedback:
- larger 8B baseline to match reward model size
- cross-task evals given different RM objectives
Overall, we see clear wins https://t.co/WcE7DChrEr

4 months ago

We also have a postdoc position if that's what you are looking for

5 months ago

Our team in FAIR at Meta is hiring a postdoc researcher! We work on the topics of Reasoning, Alignment and Memory/architectures (RAM).
Apply here: https://t.co/dWtpz7rttT
Location: NY, Seattle or Menlo Park.

Some of our recent work to give flavor:
- Co-Improvement (position): https://t.co/XPwbsuCUI6
- SPICE (Self-Play in Corpus Environments): https://t.co/47BarIr0uM
- Self-Challenging Agents: https://t.co/qgDLmchn8X
- RL from Human Interaction: https://t.co/wmC2fVByp2
- AggLM (parallel aggregation): https://t.co/Fg0E31aOIy
- StepWiser (CoT-PRM RL): https://t.co/QbfBVYx522
- DARLING (diversity-trained RL): https://t.co/J9ZSs8GVyX
- J1 (RL-trained LLM-as-Judge): https://t.co/yG6xAPaNJ3
- CoT-Self-Instruct: https://t.co/dHMYRxtv5h
- Multi-Token Attention: https://t.co/4kfUe8KozT

4 months ago

Our team is hiring! If you like to work on cool research projects, please apply :)

4 months ago

Our team in FAIR at Meta is hiring a (full-time) researcher! We work on the topics of Reasoning, Alignment and Memory/architectures (RAM) for self-improvement & co-improvement.
Apply here: https://t.co/Vukp3u8rfu
Location: NY, Seattle or Menlo Park.

Some of our recent work to give flavor:
- Co-Improvement (position): https://t.co/XPwbsuCmSy
- SPICE (Self-Play in Corpus Environments): https://t.co/47BarIqsFe
- Self-Challenging Agents: https://t.co/qgDLmcgPjp
- RL from Human Interaction: https://t.co/wmC2fVB0zu
- AggLM (parallel aggregation): https://t.co/Fg0E31agT0
- StepWiser (CoT-PRM RL): https://t.co/QbfBVYwxcu
- DARLING (diversity-trained RL): https://t.co/J9ZSs8GnJp
- J1 (RL-trained LLM-as-Judge): https://t.co/yG6xAPafTv
- CoT-Self-Instruct: https://t.co/dHMYRxsXfJ
- Multi-Token Attention: https://t.co/4kfUe8JQKl

5 months ago

If you are a PhD student in Berkeley or one of these universities, you can apply to our mentorship program and do research with us! The deadline is this Friday though https://t.co/ISfdqvGwlS

tesatory retweeted

6 months ago

Our co-improvement position paper is now on arXiv!
(We've updated it, covering more existing work.)
📝: https://t.co/xnxWYoMNP7

After >27 years of research, my first position paper!

Short 🧵 (1/5) follows 👇

Synopsis: it's about building AI that collaborates on AI research *with us* to solve AI faster, and to help fix the alignment problem together.

How? Build the AI with those collab skills (i.e., we create benchmarks! training data! methods! etc. for that).

I've been personally inspired by @Yoshua_Bengio's recent talks on safety & AI research, and also from seeing Nicholas Carlini's COLM keynote where he said we researchers can all do our bit to help (paraphrased). So – hope this helps! 🙏

tesatory retweeted

Rimsha Bhardwaj

@heyrimsha

6 months ago

Holy shit… Meta might’ve just solved self-improving AI 🤯

Their new paper SPICE (Self-Play in Corpus Environments) basically turns a language model into its own teacher: no humans, no labels, no datasets, just the internet as its training ground.

Here’s the twist: one copy of the model becomes a Challenger that digs through real documents to create hard, fact-grounded reasoning problems. Another copy becomes the Reasoner, trying to solve them without access to the source.

They compete, learn, and evolve together: an automatic curriculum with real-world grounding, so it never collapses into hallucinations.

The results are nuts:

+9.1% on reasoning benchmarks with Qwen3-4B
+11.9% with OctoThinker-8B
and it beats every prior self-play method like R-Zero and Absolute Zero.

This flips the script on AI self-improvement.

Instead of looping on synthetic junk, SPICE grows by mining real knowledge: a closed-loop system with open-world intelligence.

If this scales, we might be staring at the blueprint for autonomous, self-evolving reasoning models.

tesatory retweeted

6 months ago

🤝 New Position Paper !!👤🔄🤖
@j_foerst and I wrote a position piece on what we think is the path to safer superintelligence: co-improvement.

Everyone is focused on self-improving AI, but (1) we don't know how to do it yet, and (2) it might be misaligned with humans.

Co-improvement: instead, build AI that collaborates *with us* to solve AI faster, and to help fix the alignment problem together. More details in the paper!

Read it here:
📝: https://t.co/peiPnLHHXG

Kosta Derpanis (sabbatical in Zurich)

6 months ago

Yes I remember this from 10 years ago. My answers were not that great because I didn't get any sleep from the excitement. But it's interesting there was a question about scaling attention in a sub-linear way, which still is an important question and not fully answered.

@CSProfKGD

over 5 years ago

The last thing you ever want to hear at the end of your talk

7 months ago

@alexrives Congrats to the team!

tesatory retweeted

7 months ago

🌶️SPICE: Self-Play in Corpus Environments🌶️
📝: https://t.co/QxEd13QEmu
- Challenger creates tasks based on *corpora*
- Reasoner solves them
- Both trained together ⚔️ -> automatic curriculum!
🔥 Outperforms standard (ungrounded) self-play
Grounding fixes hallucination & lack of diversity
🧵1/6

tesatory retweeted

8 months ago

Was super fun to organize this workshop!! Thanks everyone: speakers, panelists, audience. https://t.co/ccZzIXFgTY

tesatory retweeted

Kyunghyun Cho

@kchonyc

8 months ago

.@tesatory hasn’t aged since RAM’15! is that the magic of attention and memory? #COLM2025

8 months ago

@QuackerEnte Not really because all we do is add a convolution operation without changing dimensions of things. So more compute, but same memory usage

8 months ago

Heading to COLM! Presenting two papers: Multi-Token Attention for augmenting softmax attention for more precision, and COCONUT 🥥 for continuous CoT reasoning. Oh also speaking at RAM2 🐏 workshop about memory 🧠

8 months ago

@QuackerEnte Each attention weight is conditioned on only one key and one query vector. Our method makes it possible to condition on multiple vectors, so it can be more fine-grained and information rich
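
The replies to @QuackerEnte describe two properties: each attention weight can depend on several queries and keys, and this comes from a convolution that leaves tensor shapes unchanged. The rough sketch below illustrates only those two properties for a single head. Where the convolution sits relative to the softmax, the kernel values, and the helper names (`conv_over_logits`, `causal_softmax`) are assumptions for illustration, not the paper's implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_logits(queries, keys):
    """Standard scaled dot-product logits: one query and one key per entry."""
    d = len(queries[0])
    return [[sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
            for q in queries]


def conv_over_logits(logits, kernel):
    """Mix each logit with neighbours along the query and key axes.

    kernel[a][b] weights the logit at (i - a, j - b), so only the current
    and earlier queries/keys contribute. Logits whose key lies in the
    future of its query are skipped. The output has the same shape.
    """
    n_q, n_k = len(logits), len(logits[0])
    out = [[0.0] * n_k for _ in range(n_q)]
    for i in range(n_q):
        for j in range(n_k):
            for a, kernel_row in enumerate(kernel):
                for b, w in enumerate(kernel_row):
                    ii, jj = i - a, j - b
                    if ii >= 0 and 0 <= jj <= ii:
                        out[i][j] += w * logits[ii][jj]
    return out


def causal_softmax(logits):
    """Softmax over keys 0..i for query i; future keys get weight 0."""
    return [softmax(row[: i + 1]) + [0.0] * (len(row) - i - 1)
            for i, row in enumerate(logits)]


# Toy self-attention input (assumed values).
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
logits = attention_logits(vectors, vectors)

standard = causal_softmax(logits)
identity = causal_softmax(conv_over_logits(logits, [[1.0]]))
mixed = causal_softmax(conv_over_logits(logits, [[1.0, 0.5], [0.5, 0.0]]))
```

With the 1x1 kernel `[[1.0]]` the result equals standard causal attention. A larger kernel adds arithmetic per logit, but `mixed` has the same shape as `standard`, which matches the "more compute, but same memory usage" remark.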

tesatory retweeted