Simran Kaur @kaur_simran25 - Twitter Profile

6 months ago

If you’re at NeurIPS, come check out our poster at the Efficient Reasoning (Spotlight) and MATH-AI workshops! 👇

6 months ago

How does RL improve OOD reasoning? How can we distinguish compositional generalization from length generalization? What makes a composition more learnable? Check out our #neurips2025 workshop poster tomorrow!
🗓️ Sat, 12/6, 8am-5pm
Efficient Reasoning 📍 Exhibit Hall F (Spotlight)
MATH-AI 📍 Upper Level Ballroom 6A
🔗 https://t.co/anfP9VPrZw
Joint work with @kaur_simran25 @prfsanjeevarora

Simran Kaur @kaur_simran25

6 months ago

I’m at NeurIPS 12/4-7! Excited to see old friends + meet new ones — DM if you’d like to grab coffee☕️ These days, I'm excited about synthetic data, distillation, and anything post-training! I’m also looking for a Summer 2026 internship, so reach out if you think I’d be a good fit

kaur_simran25 retweeted

Abhishek Panigrahi @Abhishek_034

about 1 year ago

🎉 Excited to present 2 papers at #ICLR2025 in Singapore!

🧠 Progressive distillation induces an implicit curriculum
📢 Oral: Sat, 4:30–4:42pm @ Garnet 216–218
🖼️ Poster: Sat, 10:00am–12:30pm (#632)

⚙️ Efficient stagewise pretraining via progressive subnetworks
🖼️ Poster: Thurs, 3:00–5:30 pm (#584)

Happy to chat about distillation, curricula, and efficient pretraining!

kaur_simran25 retweeted

Xingyu Zhu @XingyuZhu_

over 1 year ago

Kids use open textbooks for homework. Can LLM training benefit from "helpful textbooks" in context with no gradients computed on these tokens? We call this Context-Enhanced Learning – it can exponentially accelerate training while avoiding verbatim memorization of “textbooks”! A thread 🧵1/N

kaur_simran25 retweeted

Sanjeev Arora @prfsanjeevarora

almost 2 years ago

1/ New instruction-following dataset INSTRUCT-SKILLMIX! Supervised fine-tuning (SFT) with just 2K-4K (query, answer) pairs gives small “base LLMs” Mistral v0.2 7B and LLaMA3 8B performance rivalling some frontier models (AlpacaEval 2.0 score). No RL, no expensive human data. “Secret sauce”? Leveraging LLM metacognition!

Simran Kaur @kaur_simran25

almost 2 years ago

tl;dr: when done well, synthetic data can be quite effective! Joint work with my amazing coauthors @parksimon0808 @anirudhg9119 @prfsanjeevarora

Simran Kaur @kaur_simran25

almost 2 years ago

Excited to share Instruct-SkillMix, a pipeline for generating high quality, diverse synthetic SFT data. SFT on just 4K examples can boost LLaMA-3-8B-Base over LLaMA-3-8B-Instruct, yielding 42.76% LC win rate on AlpacaEval. Paper: https://t.co/Yrd2JR2EQg

Simran Kaur @kaur_simran25

almost 2 years ago

Additionally, we perform a preliminary exploration of difficulties in naive instruction-tuning. Replacing 20% of SFT data with “poor quality” data (i.e., deliberately sloppy and unhelpful) leads to super-proportional harm to the models. [7/n]

kaur_simran25 retweeted

Sadhika Malladi @SadhikaMalladi

over 2 years ago

Blog post about how to scale training runs to highly distributed settings (i.e., large batch sizes)! Empirical insights from my long-ago work on stochastic differential equations (SDEs). Written to be accessible - give it a shot! https://t.co/KwLtlrHK0t

Simran Kaur @kaur_simran25

over 2 years ago

Excited to share our latest work: Skill-Mix, a new take on LLM evaluation that tests a model's ability to combine basic language skills! Check out the Skill-Mix demo here: https://t.co/k0evUWZgZh

Dingli Yu @dingli_yu

over 2 years ago

Does high rank on LLM leaderboards mean anything? Or is it just a game of "dataset contamination" and "Stochastic Parrots?" Find answers via Skill-Mix, our evaluation of LLMs’ capacity to combine skills! Paper: https://t.co/hy5iZWCFcY

kaur_simran25 retweeted

Zachary Novack @zacknovack

over 3 years ago

Our work on understanding the mechanisms behind implicit regularization in SGD was just accepted to #ICLR2023 ‼️ Huge thanks to my collaborators @kaur_simran25 @__tm__157 @saurabh_garg67 @zacharylipton 🙂 Check out the thread below for more info:

kaur_simran25 retweeted

Zachary Novack @zacknovack

over 3 years ago

1/n ‼️ Our spotlight (and now BEST POSTER!) work from the Higher Order Optimization workshop at #NeurIPS2022 is now on arXiv! Paper 📖: https://t.co/TTKmW75PIR w/ @kaur_simran25 @__tm__157 @saurabh_garg67 @zacharylipton

kaur_simran25 retweeted

Zachary Novack @zacknovack

over 3 years ago

Excited to announce that my first published paper (!!) will be a spotlight at the #NeurIPS2022 Higher-Order Optimization workshop on Dec 2nd! Huge thanks to my co-authors @kaur_simran25 @__tm__157 @saurabh_garg67 @zacharylipton, paper thread coming soon! https://t.co/fkPKOtYOqg

Simran Kaur @kaur_simran25

almost 4 years ago

5/ We hope to inspire future efforts aimed at understanding the relationship between the max Hessian eigenvalue and generalization, and to spark conversation regarding whether this quantity should be treated as a generalization metric at all.

Simran Kaur @kaur_simran25

almost 4 years ago

Is flatness indicative of generalization? Not necessarily. Our experimental study calls the relationship between flatness (as measured by the max Hessian eigenvalue) and generalization into question. https://t.co/ORln4ASVEq

Simran Kaur @kaur_simran25

almost 4 years ago

4/ While methods motivated by flatness produce useful tools, the max Hessian eigenvalue does not provide a scientific explanation for improvements in generalization. Thus, it is evident that there is a deeper story behind why flatness seems to be a fruitful intuition.
