Mayee Chen @MayeeChen - Twitter Profile

Pinned Tweet

4 months ago

Data mixing - determining ratios across your training datasets - matters a lot for model quality. While building Olmo 3, we learned it’s hard to set up a method that finds a strong mix, and hard to maintain that mix as datasets change throughout development.

Introducing Olmix👇 https://t.co/xqFxujcrsk

13

270

72

177

57K

MayeeChen retweeted

Kyle Lo

@kylelostat

2 days ago

happy to share another quality tech report w/ the wider research community 🫶

great read for ppl who want to see all the details for methods + infra for scaling up pretraining & RL, esp detailed discussion about data which is often kept vague by other labs https://t.co/7UYviHsgLb

13

387

24

158

26K

MayeeChen retweeted

Jon Saad-Falcon

@JonSaadFalcon

8 days ago

The dominant story in AI has been the growing cloud: bigger clusters, larger models, more gigawatts. We believe the future is in the opposite direction: on-device inference, smaller models, watts instead of gigawatts. Today we're releasing @OpenJarvisAI v1.0: a personal AI assistant that lives, learns, and works on your device.

49

596

91

566

144K

MayeeChen retweeted

Rosinality @rosinality

17 days ago

https://t.co/u09u2f7mmK

Using LoRAs for determining dataset mixture. For a continual training setup, when new datasets are introduced, it is possible to train LoRAs for them and combine them with a LoRA on previous datasets. https://t.co/pVH464BA9V

1

83

5

72

5K

MayeeChen retweeted

Michael Hu @michahu8

18 days ago

What is the right data mix, and how do we find it as the data keeps changing?

This is a core, unsolved problem in continual learning. To tackle it, we built a data mixing algo that works everywhere — pretraining, midtraining, instruction tuning

Introducing: On-Policy Mix

🧵1/6 https://t.co/LCuNkoewVf

6

310

55

320

46K

MayeeChen retweeted

Siddharth Joshi

@sjoshi804

23 days ago

Five years ago, I left a comfortable software engineering job in Big Tech to start a PhD. Last year, I left the PhD to join Datology. Both decisions confused the people around me, and honestly both decisions were about the same thing: I wanted to do research. Not research as in chasing paper deadlines and applying for fellowships / grants, but research in the truest sense of the word - sitting with unsolved, sometimes previously unheard-of problems, contextualizing them, formulating them, exploring solutions to them.

I'd had a taste of research in college, flitting between disciplines, but never found something I felt truly passionate about until I came across deep learning. A field mixing empiricism, mathematics, and real-world impact all seamlessly - it made research the most exciting thing I'd ever done in my life. So in 2022 I started my PhD hoping for the chance to explore uncharted frontiers. Three years and several papers at the standard prestigious ML conferences later, I had technically done research. But I still didn't feel like I'd ever had the freedom, support, and resources to explore new and exciting ideas.

This is what brought me to Datology as an intern last summer. A hope to do research in the true sense - explore new ideas, supported by my peers and leaders, unconstrained by resources. And of course, about the data. At the end of the summer, I took a risk and stayed, putting my PhD on hold.

Since then, I've been lucky enough to grow into leading multimodal data curation at DatologyAI, and with our team we've tackled every challenge possible: the engineering and optimizing of a VLM training stack we built from scratch; the at-times frustrating but ultimately rewarding deep refining of VLM evals in our work DatBench (link); and of course a lot of exhilarating new research on DATA CURATION. But more than anything, I felt like I finally got to do research!!

I'd like to specifically thank @arimorcos and @leavittron who entrusted me with this opportunity, empowered me to do the best work of my life (so far), and mentored me to grow not only as a researcher but also as a leader. And a huge thanks to the @datologyai team that made research feel FUN again.

Today, we're releasing 20/20 Vision Language Models: A Prescription for Better VLMs through Data Curation Alone. This is the culmination of the multimodal team at Datology's work over the past year.

At fixed architecture, recipe, and compute, varying only the pretraining data, we get +11.7pp at 2B across 20 public VLM benchmarks, beat InternVL3.5-2B by ~10pp at ~17x less training compute (without post-training), and hit near-frontier accuracy at 4B with 3.3x lower response FLOPs than Qwen3-VL-4B.

Take risks. Bet on yourself. I’m going to keep doing this. At least until my luck runs out :)

a 🧵

10

336

34

144

791K

MayeeChen retweeted

Erin Woo @erinkwoo

29 days ago

your year can go one of two ways

10

4K

269

376

226K

MayeeChen retweeted

Kelly Buchanan

@ekellbuch

29 days ago

Very excited to release Terminal-Bench 2.1!

Coding agents are among the most economically consequential deployments of LLMs to date. As agents improve, benchmark reliability matters more.

We audited TB2.0 and found and corrected issues in 28/89 tasks. 30% of the benchmark!

But the rankings survived, absolute scores moved up to 12pp!

28

761

74

219

85K

MayeeChen retweeted

Gabe Pereyra

@gabepereyra

30 days ago

https://t.co/AWIhrxBD5c

28

373

52

533

682K

MayeeChen retweeted

Flapping Airplanes

@flappyairplanes

29 days ago

(1/5) Great to be at @sequoia to give a sneak peek of one of our research directions! TL;DR one path to data-efficiency may be to “abuse GPUs like they’ve never been abused before”

13

979

73

770

178K

MayeeChen retweeted

Snorkel AI

@SnorkelAI

about 1 month ago

Our thanks to everyone who came out to hear @MayeeChen dive into her paper "Olmix: A Framework for Data Mixing Throughout LM Development." ▶️ Replay ICYMI live: https://t.co/ycdzr0ghhm https://t.co/0v7Azx82fq

1

30

6

1

2K

MayeeChen retweeted

Gene Li @geneli0

about 1 month ago

Happy to chat about this paper, RL, deep learning, etc. Feel free to reach out! Poster Session 6 on Saturday!

0

6

1

0

660

MayeeChen retweeted

Shizhe He @shizhehe

about 1 month ago

We're presenting our paper at ICLR! 🇧🇷 Stop by if you want to chat about agentic systems, multi-model scaling, or want to grab acai with me! 🫐

🗓️ Sat, Apr 25, 3:15 PM – 5:45 PM

📷 Poster Session 3, Pavilion 3 P3-#903

1

25

4

6

2K

Mayee Chen

@MayeeChen

about 1 month ago

I'm at ICLR presenting Olmix (oral) at the Data-FM workshop this Sunday, April 26 @ 10:30AM! DM me to chat about anything related to data and the model development process / try to find the best açaí + pão de queijo with me 😋

4

64

9

11

8K

MayeeChen retweeted

Neel Guha @NeelGuha

about 2 months ago

I built a leaderboard tracking LLM performance on a suite of academic legal benchmarks. This includes LegalBench, LEXAm, Housing QA, BarExam, and some Hallucination benchmarks.

Some fun findings: https://t.co/MyIlBoPpia

9

71

10

42

7K

MayeeChen retweeted

Nicholas Roberts

@nick11roberts

about 2 months ago

That new LFM2.5-350M is super overtrained, right? And everyone was shocked about how far they pushed it? As it turns out, we have a brand new scaling law for that! 🧵 [1/n]

11

360

53

304

68K

MayeeChen retweeted

Nicholas Roberts

@nick11roberts

2 months ago

The Chinchilla is dead, long live the ___!

4

191

27

121

49K

MayeeChen retweeted

Nathan Lambert

@natolambert

2 months ago

A few facts, while the dust is settling. Ai2 still is...

- releasing open models, folks want to, and it's actually required in the NSF grant
- using substantial compute to do so from said grant
- funded additionally by FFST (new funding body) on top of NSF, for work in open models

Overall I'm confident in Ai2 doing great work this year.

22

632

49

75

60K