Eugene Belilovsky @ebelilov - Twitter Profile

ebelilov retweeted

about 8 hours ago

🎉 Our paper "From Memorization to Parameter Interference: How Overtraining Experts Harms Model Merging" was accepted at ICML 2026! 🔎 Do better expert models always lead to better merged models? Not necessarily! 📜 Read the paper: https://t.co/aJN7Oi8Dw2 🧵 1/9

1

13

9

6

581

ebelilov retweeted

VAIBHAV SINGH

@VAIBHAV22155287

9 days ago

Training big models gets painful once a full replica won't fit on one accelerator. You end up with model-parallel methods or techniques like FSDP that are communication-heavy and limited in how far they parallelize. We tried a new axis that lets you split the model the way model parallelism does, but communicate gradients instead of activations. 🧵 1/N

1

12

4

11

1K

ebelilov retweeted

Julien Chaumond

@julien_c

21 days ago

now tell me What's the % of weights changed between Opus 4.7 and Opus 4.8 <1%?

28

627

3

47

159K

Eugene Belilovsky

@ebelilov

29 days ago

Check out our work on learned optimizers being presented at MLSys. I am also recruiting a student to work further in this direction (reach out by email if you have relevant experience/interest).

Benjamin Thérien @ MLSys 2026

@benjamintherien

29 days ago

This is joint work with @janson002, @QuentinAnthon15, @XiaolongH33885, @amoudgl and @ebelilov Read the paper📜: https://t.co/0KXrYjpL8R Access our efficient PyTorch implementation: https://t.co/5CnKdTHef0 https://t.co/uVte5g94zk

1

6

1

2

2K

0

4

0

1K

ebelilov retweeted

Benjamin Thérien @ MLSys 2026

@benjamintherien

29 days ago

This is joint work with @janson002, @QuentinAnthon15, @XiaolongH33885, @amoudgl and @ebelilov Read the paper📜: https://t.co/0KXrYjpL8R Access our efficient PyTorch implementation: https://t.co/5CnKdTHef0 https://t.co/uVte5g94zk

1

6

1

2

2K

ebelilov retweeted

Zyphra

@ZyphraAI

about 1 month ago

We're publishing our first end-to-end benchmarks for Zyphra Inference on @AMD Instinct MI355X.

Our inference optimizations strongly outperform the AMD baseline and narrows the gap between MI355X and B200 for serving Kimi K2.6, GLM 5.1, and DeepSeek V3.2 🧵 https://t.co/rDkOiFOrRz

7

265

26

59

217K

ebelilov retweeted

Keller Jordan

@kellerjordan0

about 1 month ago

Modded-NanoGPT optimization result #13: @benjamintherien has achieved a new record of 3210 steps (-15), by wrapping NorMuonH in a MuLoCo-style outer Nesterov SGD.

Compared to the target loss, this result has a p-value of p=1.3e-4. Compared to result #11, it has p=0.099. https://t.co/nmExPm1v3f

3

83

11

28

8K

ebelilov retweeted

Abhinav Moudgil

@amoudgl

about 2 months ago

Heading to Rio 🇧🇷 to present our Celo line of work at #ICLR2026! Get in touch if you are curious about new avenues in neural network training or how we scaled learned optimizers from CIFAR-10 to GPT-3 🚀 Details ⬇️

1

17

5

3

2K

ebelilov retweeted

Arthur Douillard

@Ar_Douillard

about 2 months ago

The DiLoCo team at Google DeepMind and Google Research is proud to release Decoupled DiLoCo, the next frontier for resilient AI pre-training. Decoupled DiLoCo enables training with datacenters across the world, using heterogeneous hardware, and never halting the system despite hardware failures.

34

609

85

299

3M

ebelilov retweeted

H @hcompany_ai

about 2 months ago

When it comes to computer-use, 80 is the new 70.

Today, we broke a new barrier on the OS-World benchmark with an 80.4% success rate. Holo3 is officially #1 globally for computer-use agents, and it's not even close. 🏅

👉 See for yourself: https://t.co/jTUnRY3nYr

A massive congratulations to the whole team. They set a high standard with chart topping results two weeks ago and continue to raise the bar.

12

125

21

44

11K

ebelilov retweeted

Paul Janson @janson002

about 2 months ago

PyLO is accepted to MLSys 2026! 🎉🚀 A PyTorch-native library bringing SOTA learned optimizers to the codebases most of us actually use — with fast CUDA kernels and real speedups on large-scale training. Drop-in ready, no more JAX-only barriers. Library: https://t.co/NTjBF64jD3

1

11

8

4

2K

ebelilov retweeted

GC Newsroom @NewsroomGC

2 months ago

Canada launches national initiative to build large-scale AI supercomputing capacity https://t.co/vtwamndcLO

75

959

122

252

136K

ebelilov retweeted

Nathan Lambert

@natolambert

2 months ago

Excited to launch the accompanying free RLHF Course for my book. To kick it off, I've released:

- Welcome video
- Lecture 1: Overview of RLHF & Post-training
- Lecture 2: IFT, Reward Models, Rejection Sampling
- Lecture 3: RL Math
- Lecture 4: RL Implementation

I'm going to add question & answer videos throughout the lecture to go deeper on topics that need it, and potentially cover some topics that are too recent and in flux to go in print. I expect 10-15 videos in total over the next few months.

At the same time, development around the code for the book is picking up. It's a great time to build the foundation for post-training methods.

YT playlist and course landing page below.

50

2K

235

2K

190K

Eugene Belilovsky

@ebelilov

2 months ago

Can we get some AI legislation about not nerfing models without disclosing it

0

2

0

345

ebelilov retweeted

Abhinav Moudgil

@amoudgl

3 months ago

Introducing Celo2: Towards Learned Optimization Free Lunch

We show that learned optimizers can generalize to practical tasks like GPT-3 1.3B pretraining and several out-of-distribution vision/RL tasks from limited meta-training (~4.5 GPU hours)!

🧵 https://t.co/NuvB4qIzX7

3

103

22

81

9K

ebelilov retweeted

Alexander Hoyle @miserlis_

3 months ago

I wrote a blog post on my experience using AI for slide generation

Basic idea: write your lecture notes first, then prompt the LLM to produce corresponding slides in reveal.js (h/t @ChenhaoTan). I'm picky about my slides but was happy with the results!

(link in thread below) https://t.co/0RSwQoU7EU

6

246

22

291

24K

Eugene Belilovsky

@ebelilov

3 months ago

Shout out to PULSE https://t.co/VG5eRhSNV6 with some nice graphics in the blog

Fireworks AI

@FireworksAI_HQ

3 months ago

We’re seeing lots of interest in how Cursor delivered Composer 2. One less obvious insight: you don't need to spend billions on a giant cluster to do reinforcement learning. With disaggregated sampling, we ran @Cursor_ai Composer 2 training across 3-4 clusters worldwide, with a unified capacity of Fireworks Virtual Cloud. Check how we optimize cross-region 1TB+ model updates by 98%+ while keeping staleness under a few minutes: https://t.co/0Ziv6ssFNx

5

326

26

220

81K

0

1

0

307

ebelilov retweeted

Ben Recht @beenwrekt

3 months ago

I wrote about the ICML LLM witch hunt and why it's paradigmatic of the absurd bureaucratic scaling of peer review. https://t.co/m69PvGw8Pi

3

65

7

33

33K

ebelilov retweeted

Konstantin Mishchenko

@konstmish

3 months ago

Running experiments and editing code with Claude Code is so enjoyable that it's negatively affecting my sleep, it's like "just one more turn" when playing Civilization.

1

33

2

3

3K
