emily mcmilin @micmylin - Twitter Profile

micmylin retweeted

7 days ago

Very interesting study from Opus 4.8 card: Multi-agents do not deliver better results on ProgramBench, but they get to mediocre solutions 2x faster.

KLieret's tweet photo: https://t.co/2JiaAtxORC

5

112

12

30

13K

micmylin retweeted

John Yang

@jyangballin

30 days ago

How much of SQLite, FFmpeg, PHP compiler can LMs code from scratch? Given just an executable and no starter code or internet access. Introducing ProgramBench: 200 rigorous, whole-repo generation tasks where models design, build, and ship a working program end to end. 🧵

jyangballin's tweet photo: https://t.co/8ayeDJLXaJ

103

2K

246

656

723K

micmylin retweeted

Yuxiang Wei

@YuxiangWei9

about 1 month ago

Accepted to ICML 2026! Big thanks to all the collaborators 🎉

1

55

5

13

4K

emily mcmilin @micmylin

about 1 month ago

@jaraxiong Thanks so much. That’s great to hear :)

0

1

0

25

emily mcmilin @micmylin

about 1 month ago

I'll be giving a talk at the ICLR VerifAI workshop, about code execution for code world modeling, later today (Sun) at 9:05 am (Brazil time). Swing by if you are interested in learning more!

Ameesh Shah @ameeshsh

5 months ago

🗣️📣Announcing VerifAI 2: AI Verification in the Wild, an upcoming workshop at #ICLR2026!! 🗣️📣 VerifAI will gather researchers to explore topics at the intersection of genAI and trustworthy ML. Submit your work! Check out our website and CFP for more: https://t.co/VFWNqp7zCK

ameeshsh's tweet photo: https://t.co/RhhL0Uw9V7

0

26

7

5

10K

1

18

2

6

3K

micmylin retweeted

Zhiqing Sun

@EdwardSun0909

about 2 months ago

Excited to share Muse Spark, the first model from the whole team’s work in MSL! 🚀 It’s natively multimodal and agentic. I’ve been using it for my daily coding and research tasks. Still plenty of room to improve in agentic domains, but we’re moving with great velocity. It’s a seriously good model! Check out the full breakdown and try it out at https://t.co/Fka0wdAswy

8

203

26

9

20K

micmylin retweeted

Yuxiang Wei

@YuxiangWei9

5 months ago

Software agents can self-improve via self-play RL. Introducing Self-play SWE-RL (SSR): training a single LLM agent to self-play between bug-injection and bug-repair, grounded in real-world repositories, no human-labeled issues or tests. 🧵

64

2K

287

1K

525K

emily mcmilin @micmylin

6 months ago

@rosemary_ke I'm a fan of your work. It'd be great to meet :)

1

0

161

emily mcmilin @micmylin

6 months ago

We modify each repo's CI workflows to capture a single successful third-party build. For pytest repos, we inject https://t.co/JagbOd5Jzg fixtures to verify the correct container and support optional Python execution tracing. See more in our paper: https://t.co/pVOTNTjviM

1

0

1

136

emily mcmilin @micmylin

6 months ago

Better late than never to share how we built 35k+ unique repos (rather than commits from the same dozens of repos) into executable envs for CWM mid-training and SWE-RL post-training... https://t.co/ZeG4MXvHaj

Gabriel Synnaeve @syhw

8 months ago

(🧵) Today, we release Meta Code World Model (CWM), a 32-billion-parameter dense LLM that enables novel research on improving code generation through agentic reasoning and planning with world models. https://t.co/BJSUCh2vtg

60

2K

311

1K

920K

1

11

2

1

2K

emily mcmilin @micmylin

6 months ago

Key insight: the execution env of a GitHub Actions CI workflow is fully built with deps. So we can cheaply capture it as a standalone Docker image for later execution.

1

0

154

micmylin retweeted

Taco Cohen

@TacoCohen

9 months ago

The eagle-eyed goat in question being @YuxiangWei9

0

22

1

2

5K

emily mcmilin @micmylin

over 1 year ago

Link to video where our part of the convo starts: https://t.co/7OttKRwfmL Botched last attempt to send this. But better late than never...

0

2

0

342

emily mcmilin @micmylin

over 1 year ago

Thank you @AleksanderMolak for the really nice opportunity to discuss some of my prior research with you, earlier this year! https://t.co/x0ZAR91FI1

1

4

0

776

emily mcmilin @micmylin

over 1 year ago

Dreams can come true. I’ve joined FAIR’s CodeGen team. :)

14

360

1

29

35K

emily mcmilin @micmylin

about 2 years ago

@vishnuvig Thanks so much for the lightning-fast GPU speeds and technical support over the years. Great service!

1

2

0

1K

micmylin retweeted

Udacity

@udacity

about 2 years ago

💡 Interested in learning more about LLM fundamentals? In the video below, Udacity instructor Emily McMilin explains what the Transformer model is & walks you through the difference between Encoder and Decoder model architectures. https://t.co/ZHUvKEkr8N #genAI #generativeAI

udacity's tweet photo: https://t.co/lkTxaYisFn

0

10

1

2

6K

emily mcmilin @micmylin

about 2 years ago

Our research showing how task underspecification can cause spurious correlations & hallucinations, from BERT to GPT-3.5 is now available as AAAI 24 proceedings: https://t.co/q0SN2Rf31H Video: https://t.co/nmSTV2RWss Arxiv extended to GPT-4 Turbo Preview: https://t.co/5XLZ9P0oav

0

5

0

3

1K

emily mcmilin @micmylin

about 2 years ago

@srush_nlp … oops, GPT-4 turbo (preview) results only made it to the arxiv version. https://t.co/5XLZ9P0oav

0

97

emily mcmilin @micmylin

about 2 years ago

@srush_nlp Using pronoun resolution as a case study, we hypothesize a causal mechanism & show empirically that denoising objs are generally less underspecified, less vulnerable to spurious correlations / hallucinations, w AR comps ranging up to GPT-4 turbo preview. https://t.co/q0SN2Rf31H

1

0

381
