Saurabh Dash @TheyCallMeMr_ - Twitter Profile

Saurabh Dash

@TheyCallMeMr_

2 days ago

@ChenHenryWu We did something similar too :) https://t.co/IBWqkhATho

0

14

1

12

527

TheyCallMeMr_ retweeted

Cohere Labs

@Cohere_Labs

10 days ago

Announcing new research on Self-Verified RL using Soft-Rewards 📋, led by @TheyCallMeMr_, Pierre Clavier, @johnamqdang, @mgalle, @mziizm, @ahmetustun89, and @beyzaermis. 📜Read the paper: https://t.co/niWrDFmhOu

0

18

6

10

6K

TheyCallMeMr_ retweeted

Marzieh Fadaee @mziizm

9 days ago

Really proud to see this paper coming together. Congrats to the authors, especially @TheyCallMeMr_ and @beyzaermis for leading this great work.

0

6

1

0

177

TheyCallMeMr_ retweeted

Marzieh Fadaee @mziizm

9 days ago

RL for language models works best when answers can be verified exactly, like math or code. We argue that most real-world tasks are only partially verifiable and show that turning prompts into checklists gives models a much richer learning signal than a single pass/fail judgment.

1

43

6

25

5K

TheyCallMeMr_ retweeted

Hieu Pham

@hyhieu226

10 days ago

https://t.co/y0tc2tyjgE 😂

6

83

6

16

47K

Saurabh Dash

@TheyCallMeMr_

10 days ago

@bilaltwovec Time for my bio to shine

0

1

0

82

Saurabh Dash

@TheyCallMeMr_

10 days ago

7/ Read the full paper here: https://t.co/IBWqkhATho Work with Pierre Clavier, @johnamqdang, @mgalle, @mziizm, @ahmetustun89, and @beyzaermis.

1

6

1

0

203

Saurabh Dash

@TheyCallMeMr_

10 days ago

1/ RLVR has driven big gains in math and code because many outputs admit reliable automatic checks: an answer matches the expected result, or a program passes tests. But many real tasks are not like that. Code can be functionally correct but qualitatively terrible, or a response may satisfy 4 syntactic constraints but fail 1 semantic constraint.

1

17

7

4

3K
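
The contrast drawn in the 1/ tweet above (a response that satisfies 4 constraints but fails 1) can be illustrated with a minimal sketch. This is not the paper's reward definition, which the thread does not spell out; the function names `binary_reward` and `soft_reward` and the uniform weighting of checklist items are assumptions made purely for illustration.

```python
from typing import Sequence


def binary_reward(checks: Sequence[bool]) -> float:
    """All-or-nothing signal: 1.0 only if every checklist item passes."""
    return 1.0 if all(checks) else 0.0


def soft_reward(checks: Sequence[bool]) -> float:
    """Fraction of checklist items that pass (uniform weighting is an assumption)."""
    if not checks:
        raise ValueError("checklist must contain at least one item")
    return sum(checks) / len(checks)


# Hypothetical checklist: 4 syntactic constraints pass, 1 semantic constraint fails.
checks = [True, True, True, True, False]
print(binary_reward(checks))  # 0.0
print(soft_reward(checks))    # 0.8
```

Under the pass/fail view this response is indistinguishable from one that satisfies nothing, while the checklist view still credits the four constraints it met.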

Saurabh Dash

@TheyCallMeMr_

10 days ago

6/ We also study self-verification: using the same model as generator and verifier. Naive self-verification collapses: measured reward rises, but IFEval drops from 73.9 → 55.1 as the verifier learns to always say “yes.” Soft-SVeRL stabilizes this with verifier co-training, aggregating parallel verifier calls, and an anti-collapse penalty.

1

5

0

120

TheyCallMeMr_ retweeted

Dhruv Kuchhal @kuchhal_dhruv

11 days ago

Excited for this initiative! https://t.co/0hJ8BR1Zg6

0

3

2

0

643

Saurabh Dash

@TheyCallMeMr_

11 days ago

@willccbb
> my instinct is you basically need to model the world

I am increasingly subscribing to this camp, with the caveat that the generators and verifiers / world-models need to be co-trained.

0

107

TheyCallMeMr_ retweeted

Nick Frosst

@nickfrosst

11 days ago

@leftylabourtech Man it’s my company?

22

698

22

21

80K

TheyCallMeMr_ retweeted

Adib

@adibvafa

11 days ago

how is this app free

4

476

7

35

50K

Saurabh Dash

@TheyCallMeMr_

24 days ago

Wordcel in the streets
Shape-rotator in the sheets

Goodfire

@GoodfireAI

24 days ago

Neural networks do math by rotating shapes. We found a shape-rotating calculator hidden inside an LLM – and it’s used for more than just math! (1/6)

122

4K

556

3K

934K

0

2

0

104

TheyCallMeMr_ retweeted

Cohere

@cohere

about 2 months ago

Excited to share our work on production-ready W4A8 inference, now integrated in vLLM! By combining 4-bit weights (low memory) with 8-bit activations (high compute), we hit the sweet spot for both decoding and prefill — up to 58% faster TTFT and 45% faster TPOT vs W4A16 on Hopper.

5

262

37

80

23K

Saurabh Dash

@TheyCallMeMr_

about 2 months ago

@anmol01gulati All the best, Anmol!

0

1

0

112
