Srini Rajagopal @Srini - Twitter Profile

Srini Rajagopal @srini

5 days ago

Interesting work

clem 🤗

@ClementDelangue

15 days ago


The HF science team just made async RL weight sync ~100x cheaper on bandwidth, and you don't need a shared cluster anymore.

The problem: every RL step, the trainer typically has to sync fresh weights to the inference engine. for a 7B in bf16 that's ~14GB. for a frontier 1T fp8 checkpoint, that's ~1TB; in bf16 it would be ~2TB. per sync.
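The sizes above come from multiplying the parameter count by the bytes per parameter. Here is a quick check. It assumes 2 bytes per bf16 parameter and 1 byte per fp8 parameter, and it ignores quantization scales and file metadata:

```python
# Dense weight-sync payload: every parameter is shipped on every sync.
# Assumed sizes: bf16 = 2 bytes/param, fp8 = 1 byte/param
# (quantization scales and file metadata are ignored).
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def full_sync_bytes(num_params: float, dtype: str) -> float:
    """Bytes transferred by one dense (full-checkpoint) sync."""
    return num_params * BYTES_PER_PARAM[dtype]


for label, params, dtype in [("7B", 7e9, "bf16"), ("1T", 1e12, "fp8"), ("1T", 1e12, "bf16")]:
    gb = full_sync_bytes(params, dtype) / 1e9
    print(f"{label} in {dtype}: ~{gb:,.0f} GB per sync")
```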

The insight: between two RL steps, ~99% of bf16 weights are bit-identical. at RL learning rates, the optimizer is whispering and bf16 literally cannot hear most of it. the stored bf16 bits don't change.
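The next sketch shows why a tiny optimizer step can leave the stored bf16 bits untouched. bf16 keeps only 8 significant bits, so near 0.02 adjacent bf16 values are 2^-13 (about 1.2e-4) apart. An update much smaller than that usually rounds back to the same bit pattern, unless the weight sits right at a rounding boundary. This is a pure-Python sketch. It assumes an fp32 master weight rounded to bf16 with round-to-nearest-even, and the weight and update values are illustrative, not taken from the write-up:

```python
import struct


def to_bf16_bits(x: float) -> int:
    """Round a float to bf16 (round-to-nearest-even); return the 16-bit pattern.

    NaN/Inf handling is omitted for brevity.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    # bf16 keeps the top 16 bits of a float32; round on the 16 bits it drops.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_to_float(b: int) -> float:
    """Expand a bf16 bit pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]


master = 0.02          # fp32 master weight (illustrative value)
tiny_update = -1e-6    # e.g. lr * grad at a small RL learning rate (assumed)
big_update = -1e-3     # large enough to cross a bf16 step

before = to_bf16_bits(master)
print(before == to_bf16_bits(master + tiny_update))  # stored bits unchanged
print(before == to_bf16_bits(master + big_update))   # stored bits change
print(bf16_to_float(before))                         # nearest bf16 value to 0.02
```

In this setup the fp32 master copy does move. Only its bf16 view, which is what the inference engine would receive, stays bit-identical.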

What they shipped in TRL: only the changed elements get encoded as a sparse safetensors file, dropped into a Hugging Face Bucket, and fetched by vLLM. on Qwen3-0.6B, per-step payload goes from 1.2 GB down to 20-35 MB. This is exactly what we built Buckets for: S3-like object storage on the Hub, Xet-backed (so even full snapshots only transfer the changed chunks).
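A minimal plain-Python sketch of the "ship only the changed elements" idea follows. It is not TRL's actual implementation or file format. bf16 tensors are modeled as flat lists of 16-bit patterns, and the helper names and the 4-byte-index plus 2-byte-value layout are assumptions made for illustration:

```python
from typing import Dict, List, Tuple

# A sparse delta for one tensor: flat indices of changed elements plus
# their new bf16 bit patterns (hypothetical layout, not TRL's file format).
SparseDelta = Tuple[List[int], List[int]]


def encode_delta(old: List[int], new: List[int]) -> SparseDelta:
    """Keep only the elements whose stored bf16 bits changed between steps."""
    if len(old) != len(new):
        raise ValueError("tensor sizes differ")
    idx = [i for i, (a, b) in enumerate(zip(old, new)) if a != b]
    return idx, [new[i] for i in idx]


def encode_checkpoint(old: Dict[str, List[int]],
                      new: Dict[str, List[int]]) -> Dict[str, SparseDelta]:
    """Per-tensor deltas; tensors with no changed elements are left out."""
    deltas = {name: encode_delta(old[name], new[name]) for name in new}
    return {name: d for name, d in deltas.items() if d[0]}


def apply_delta(weights: List[int], delta: SparseDelta) -> List[int]:
    """Return a patched copy of the inference-side tensor."""
    idx, vals = delta
    patched = list(weights)
    for i, v in zip(idx, vals):
        patched[i] = v
    return patched


def payload_bytes(delta: SparseDelta, index_bytes: int = 4, value_bytes: int = 2) -> int:
    """Approximate wire size: one index plus one bf16 value per changed element."""
    return len(delta[0]) * (index_bytes + value_bytes)


# Example: one element of "layer.weight" changed; "layer.bias" is untouched.
old = {"layer.weight": [0x3CA4, 0x3F80, 0xBC00, 0x3A12], "layer.bias": [0x0000, 0x3F80]}
new = {"layer.weight": [0x3CA4, 0x3F81, 0xBC00, 0x3A12], "layer.bias": [0x0000, 0x3F80]}
deltas = encode_checkpoint(old, new)
print(deltas)  # only "layer.weight" appears
print(apply_delta(old["layer.weight"], deltas["layer.weight"]) == new["layer.weight"])
```

Under these assumed sizes, each changed element costs 6 bytes, compared with 2 bytes per element for a dense bf16 transfer. Sparse encoding therefore pays off only while fewer than a third of the elements change. At about 1% changed, it comes to roughly 3% of the dense size.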

The cherry on top: we ran a FULL disaggregated training where:
- the trainer lived on one box
- vLLM ran inside a Hugging Face Space
- the Wordle environment ran in another Space
- weights flowed through one Hub bucket

no shared cluster. no RDMA. no VPN. no NCCL across clouds. just HTTPS and a bucket.
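The tweet doesn't spell out the handoff protocol. Here is one way a trainer and an inference worker could coordinate through nothing but a bucket, as a sketch under stated assumptions. An in-memory dict stands in for the Hub bucket. The key names `snapshot/0`, `delta/N` and `latest` are hypothetical, and real object-store consistency, polling intervals and retries are out of scope:

```python
from typing import Any, Dict, List


class InMemoryBucket:
    """Stand-in for an HTTPS object store (e.g. a Hub bucket): put/get by key."""

    def __init__(self) -> None:
        self._objects: Dict[str, Any] = {}

    def put(self, key: str, obj: Any) -> None:
        self._objects[key] = obj

    def get(self, key: str) -> Any:
        return self._objects[key]


class Trainer:
    """Publishes one sparse delta per RL step, then advances a 'latest' pointer."""

    def __init__(self, bucket: InMemoryBucket, weights: List[int]) -> None:
        self.bucket = bucket
        self.weights = list(weights)
        self.step = 0
        bucket.put("snapshot/0", list(weights))  # full baseline for new readers
        bucket.put("latest", 0)

    def publish(self, new_weights: List[int]) -> None:
        idx = [i for i, (a, b) in enumerate(zip(self.weights, new_weights)) if a != b]
        self.step += 1
        self.bucket.put(f"delta/{self.step}", (idx, [new_weights[i] for i in idx]))
        # Pointer is written after the delta, so a reader that sees step N can fetch delta N.
        self.bucket.put("latest", self.step)
        self.weights = list(new_weights)


class InferenceWorker:
    """Polls the bucket and replays every delta it has missed, in step order."""

    def __init__(self, bucket: InMemoryBucket) -> None:
        self.bucket = bucket
        self.weights = list(bucket.get("snapshot/0"))
        self.step = 0

    def sync(self) -> int:
        latest = self.bucket.get("latest")
        # Each delta is relative to the previous step, so none can be skipped.
        for s in range(self.step + 1, latest + 1):
            idx, vals = self.bucket.get(f"delta/{s}")
            for i, v in zip(idx, vals):
                self.weights[i] = v
        self.step = latest
        return latest


bucket = InMemoryBucket()
trainer = Trainer(bucket, [1, 2, 3, 4])
worker = InferenceWorker(bucket)
trainer.publish([1, 5, 3, 4])
trainer.publish([1, 5, 3, 8])         # worker lags by two steps
print(worker.sync(), worker.weights)  # replays delta/1 then delta/2
```

The key design constraint is that deltas chain. A lagging or newly started replica must start from a full snapshot and replay every delta in order. Periodically publishing fresh snapshots would bound that replay, but it is an assumption here, not something the tweet describes.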

one GPU + a Hugging Face account is now enough to do real disaggregated RL. multi-replica inference fleets across regions become a small devops exercise, not a research project.

Full write-up: https://t.co/CG115IjT0q

Open source RL keeps eating the moat!


Srini Rajagopal @srini

7 days ago

@citrini Wait till it comes to Microsoft and Amazon raising capital; everyone needs to rush out before the big IPOs


Srini Rajagopal @srini

7 days ago

I would be surprised if the other hyperscalers (AWS, Microsoft) don’t do an equity raise similar to Google’s before the OpenAI and Anthropic fundraises, to have enough dry powder


Srini Rajagopal @srini

8 days ago

@HannaHajishirzi Congratulations


srini retweeted

Dwarkesh Patel

@dwarkesh_sp

9 days ago

Recently met @srush_nlp and he started giving me an impromptu lecture on how targeted on-policy self-distillation works. I asked him if I could record it on my iPhone. The basic idea is this: if the model made a mistake at some point in the rollout (for example, calling a tool that doesn't exist), we want to discourage this specific error, but we don't want to just learn from the final reward, because it's a very noisy signal spread out over the whole trajectory. So we have another model read the trajectory and figure out where the error was made. It simply inserts some hint tokens into the trajectory right before the point where the mistake was made. Then, with these hint tokens injected, the model runs a forward pass. You don't have to regenerate the rollout, so no new decoding is required. The hint causes the model to assign lower probabilities to the error tokens. You then train the original model to match these new probabilities, teaching it to downweight that specific mistake.
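A toy version of the training signal described above could look like the following. The per-token probabilities from the hint-injected forward pass act as targets for the original, hint-free context. This is a pure-Python sketch under assumptions, not the method's reference implementation. The loss is taken as the mean per-token KL(hinted || student), and the direction of the KL is a choice made here. The logits are made up. The caller is assumed to have dropped the hint positions so that both sequences line up token by token. Positions before the hint see identical context under the same model, so they contribute zero and are skipped via `start`:

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one position's vocabulary logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl_div(p_log: Sequence[float], q_log: Sequence[float]) -> float:
    """KL(p || q) for distributions given as log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(p_log, q_log))


def self_distill_loss(student_logits: Sequence[Sequence[float]],
                      hinted_logits: Sequence[Sequence[float]],
                      start: int) -> float:
    """Mean per-token KL(hinted || student) over rollout positions >= start.

    student_logits[t]: logits for rollout token t given the original context.
    hinted_logits[t]:  logits for the SAME rollout token t from one forward pass
                       over the hint-injected sequence (hint positions dropped).
    In a real trainer the hinted side would be a detached target (no gradient).
    """
    positions = range(start, len(student_logits))
    if len(positions) == 0:
        raise ValueError("no positions to distill")
    total = 0.0
    for t in positions:
        total += kl_div(log_softmax(hinted_logits[t]), log_softmax(student_logits[t]))
    return total / len(positions)


# Toy vocab: 0 = valid tool call, 1 = nonexistent tool, 2 = other text.
student = [[1.0, 0.2, 0.1], [0.2, 3.0, 0.1]]  # position 1: confident in the bad tool
hinted = [[1.0, 0.2, 0.1], [2.0, -1.0, 0.1]]  # with the hint, token 1 is downweighted
print(self_distill_loss(student, hinted, start=1))   # positive: pulls student toward hinted
print(self_distill_loss(student, student, start=0))  # 0.0: nothing to correct
```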


Srini Rajagopal @srini

10 days ago

@TheRohanVarma Congrats


Srini Rajagopal @srini

11 days ago

@nikunj There's also TPU and Google Cloud


Srini Rajagopal @srini

14 days ago

@darshil Where is it happening


Srini Rajagopal @srini

14 days ago

@jukan05 How often do you change it? When will you update it next?


Srini Rajagopal @srini

14 days ago

@inventur_es @joinshiftX Haha


Srini Rajagopal @srini

18 days ago

@mukund Interesting, time for you to move to the Bay Area :)


Srini Rajagopal @srini

21 days ago

@dsp_ Good stuff, stateless finally


Srini Rajagopal @srini

24 days ago

@bubbleboi Thoughts on TPU versus Trainium? Is there any writeup on this? Thanks!


Srini Rajagopal @srini

26 days ago

@BrendanFoody Don’t fight the tide


Srini Rajagopal @srini

27 days ago

@KobeissiLetter @WisemanCap Does this change your plan?


Srini Rajagopal @srini

about 1 month ago

Excited for what the team is building!

Jagannath Putrevu

@jputrevu

about 1 month ago

🚀 After a year of quiet building, I’m excited to officially announce @champ_hq out of stealth. We're also announcing our $8.5M Seed round led by @Redpoint with participation from @defyvc, @Max, @svangel, and a great group of angels. Watch the quick launch video below