Zonglin Yang ✈️ ICML 2026 @Yang_zy223 - Twitter Profile

Pinned Tweet

Zonglin Yang ✈️ ICML 2026

about 2 months ago

Can we actually TRAIN LLMs for scientific discovery — or only prompt them to brainstorm? 🧬✨ 🎉 MOOSE-Star → #ICML2026 Most work on LLMs for hypothesis discovery focuses on inference-time agents or feedback-driven refinement. The core generative process — P(hypothesis | research background), or P(h|b) — has been largely sidestepped: directly training it remains an open problem. We show why: a combinatorial complexity barrier makes naive end-to-end training mathematically intractable. First scalable recipe for training P(h|b), with clean scaling laws on both training data and test-time compute. 📄 Paper: https://t.co/VH36cDIIz0 💻 GitHub: https://t.co/H3ewP99ydj 🤗 HF: https://t.co/BnLb7tcsRc 🧵👇

Yang_zy223's tweet photo. Can we actually TRAIN LLMs for scientific discovery — or only prompt them to brainstorm? 🧬✨

🎉 MOOSE-Star → #ICML2026

Most work on LLMs for hypothesis discovery focuses on inference-time agents or feedback-driven refinement. The core generative process — P(hypothesis | research background), or P(h|b) — has been largely sidestepped: directly training it remains an open problem. We show why: a combinatorial complexity barrier makes naive end-to-end training mathematically intractable.

First scalable recipe for training P(h|b), with clean scaling laws on both training data and test-time compute.

📄 Paper: https://t.co/VH36cDIIz0
💻 GitHub: https://t.co/H3ewP99ydj
🤗 HF: https://t.co/BnLb7tcsRc

🧵👇

3

68

19

63

226K

Zonglin Yang ✈️ ICML 2026

@Yang_zy223

6 days ago

@ZechenZhang5 Nice work, Zechen!

1

0

66

Zonglin Yang ✈️ ICML 2026

@Yang_zy223

13 days ago

@qx_dong @GoogleDeepMind @YiTayML @quocleix For sure! 🙌

0

45

Zonglin Yang ✈️ ICML 2026

@Yang_zy223

19 days ago

Next Monday (June 15), I’ll give a tutorial at AI4X 2026 Singapore, as part of the Agentic AI for Science tutorial. My session: 10:30 AM–12:00 PM, Lecture Theater 29. I’ll present the MOOSE research line, which asks: How can LLMs help scientific discovery — with mathematical foundations, discovery harnesses, benchmarks, and scalable training recipes? If you’re attending AI4X and interested in LLMs for scientific discovery / AI4Science, feel free to drop by!

Yang_zy223's tweet photo. Next Monday (June 15), I’ll give a tutorial at AI4X 2026 Singapore, as part of the Agentic AI for Science tutorial.

My session: 10:30 AM–12:00 PM, Lecture Theater 29.

I’ll present the MOOSE research line, which asks:

How can LLMs help scientific discovery — with mathematical foundations, discovery harnesses, benchmarks, and scalable training recipes?

If you’re attending AI4X and interested in LLMs for scientific discovery / AI4Science, feel free to drop by!

0

15

3

978

Who to follow

Bill Yuchen Lin

@billyuchenlin

RL for coding @xAI @SpaceX Affiliate Assistant Prof @UW. Ex: @allen_ai; Google, Meta FAIR.

Shizhe Diao

@shizhediao

MTS @thinkymachines. Prev. NVIDIA. Finetuning your own LLMs with LMFlow: https://t.co/UTykmQBwFr Views are my own.

Qian Liu

@sivil_taram

coding @xai 🇸🇬, previously tiktok and sea ai lab. opinions are my own

Zonglin Yang ✈️ ICML 2026

@Yang_zy223

20 days ago

@ZechenZhang5 Thanks Zechen!

0

47

Zonglin Yang ✈️ ICML 2026

@Yang_zy223

21 days ago

The first post-training scaling laws for scientific discovery🫎 — performance on held-out, unseen research scales log-linearly with training data. MOOSE-Star (ICML 2026): the first tractable and scalable training recipe for discipline-agnostic hypothesis discovery.

Yang_zy223's tweet photo. The first post-training scaling laws for scientific discovery🫎 — performance on held-out, unseen research scales log-linearly with training data.

MOOSE-Star (ICML 2026): the first tractable and scalable training recipe for discipline-agnostic hypothesis discovery. https://t.co/b0oyykwixb

5

46

8

25

288K

Zonglin Yang ✈️ ICML 2026

@Yang_zy223

21 days ago

Full thread with technical details: https://t.co/IxYk2fhbau

Zonglin Yang ✈️ ICML 2026

@Yang_zy223

about 2 months ago

Can we actually TRAIN LLMs for scientific discovery — or only prompt them to brainstorm? 🧬✨ 🎉 MOOSE-Star → #ICML2026 Most work on LLMs for hypothesis discovery focuses on inference-time agents or feedback-driven refinement. The core generative process — P(hypothesis | research background), or P(h|b) — has been largely sidestepped: directly training it remains an open problem. We show why: a combinatorial complexity barrier makes naive end-to-end training mathematically intractable. First scalable recipe for training P(h|b), with clean scaling laws on both training data and test-time compute. 📄 Paper: https://t.co/VH36cDIIz0 💻 GitHub: https://t.co/H3ewP99ydj 🤗 HF: https://t.co/BnLb7tcsRc 🧵👇

3

68

19

63

226K

0

4

0

426

Zonglin Yang ✈️ ICML 2026

@Yang_zy223

21 days ago

📄 Paper: https://t.co/y4sa3pxZ6u 💻 GitHub: https://t.co/5Z29Q2RtMX 🤗 Models + 108K-paper dataset: https://t.co/dq7wkIUHKg

1

8

1

557

Zonglin Yang ✈️ ICML 2026

@Yang_zy223

23 days ago

MOOSE-Star (ICML 2026): the first scalable SFT recipe for discipline-agnostic scientific hypothesis discovery.🌟 We establish scaling laws for discovery training — and they confirm: the learned ability generalizes to unseen discoveries at test time.

Zonglin Yang ✈️ ICML 2026

@Yang_zy223

about 2 months ago

Can we actually TRAIN LLMs for scientific discovery — or only prompt them to brainstorm? 🧬✨ 🎉 MOOSE-Star → #ICML2026 Most work on LLMs for hypothesis discovery focuses on inference-time agents or feedback-driven refinement. The core generative process — P(hypothesis | research background), or P(h|b) — has been largely sidestepped: directly training it remains an open problem. We show why: a combinatorial complexity barrier makes naive end-to-end training mathematically intractable. First scalable recipe for training P(h|b), with clean scaling laws on both training data and test-time compute. 📄 Paper: https://t.co/VH36cDIIz0 💻 GitHub: https://t.co/H3ewP99ydj 🤗 HF: https://t.co/BnLb7tcsRc 🧵👇

3

68

19

63

226K

0

22

1

10

54K

Yang_zy223 retweeted

Zuhao Yang @mwxely464

about 1 month ago

What if the very pretrained prior that lets an RL agent explore tools also destroys the format that made it tool-native? We name this the Tool Prior Paradox — and tame it with PARA-GRPO. 🚀 Introducing ParaVT: parallel video tool use × agentic RL.

mwxely464's tweet photo. What if the very pretrained prior that lets an RL agent explore tools also destroys the format that made it tool-native?

We name this the Tool Prior Paradox — and tame it with PARA-GRPO.

🚀 Introducing ParaVT: parallel video tool use × agentic RL. https://t.co/kvpQiU36n2

7

15

10

5

798

Zonglin Yang ✈️ ICML 2026

@Yang_zy223

about 1 month ago

Thanks for the interest for MOOSE-Star! This associative ability of LLMs has been analyzed by MOOSE-Chem (https://t.co/TvirgSjcVh), which I quote: "We propose an assumption, grounded in preliminary experiments, that LLMs may already possess numerous knowledge pairs capable of being associated to create novel knowledge—even when scientists have not previously recognized any relationship between them." This associate ability is exactly what MOOSE-Star built upon!

0

2

0

162

Yang_zy223 retweeted

MiroMindAI

@miromind_ai

about 1 month ago

We post-train LLMs for math, for code, for instruction-following. Why not for scientific discovery? 🫎 MOOSE-Star (ICML 2026) : the first scalable SFT recipe for discipline-agnostic scientific hypothesis discovery. https://t.co/1THAJQ9pR5 By @Yang_zy223 & @LidongBing from MiroMind.

10

152

32

143

2M

Zonglin Yang ✈️ ICML 2026

@Yang_zy223

about 1 month ago

@BioAI_Pharma Thanks for sharing! The models and experiment results are fully available and reproducible! GitHub: https://t.co/5Z29Q2RtMX HuggingFace: https://t.co/dq7wkIUHKg

0

2

0

88

Zonglin Yang ✈️ ICML 2026

@Yang_zy223

about 2 months ago

@bei_zhang01 Thanks, Bei!

0

39

Zonglin Yang ✈️ ICML 2026

@Yang_zy223

about 2 months ago

🔬 We post-train LLMs for math, for code, for instruction-following. Why not for scientific discovery? No model has been post-trained specifically for hypothesis generation. MOOSE-Star is a first step, with scaling laws suggesting there's much more to unlock.

MiroMindAI

@miromind_ai

about 2 months ago

🚨 LLM-based scientific hypothesis discovery now has a scalable training recipe. MOOSE-Star, accepted at ICML 2026, enables scalable training for hypothesis generation, with more scalable test-time scaling. By our researchers— https://t.co/17DnFbaefZ

1

123

7

98

160K

0

4

2

0

460

Yang_zy223 retweeted

Ethan Xu @LinjieXu

about 2 months ago

Our latest work has been accepted as a regular paper by ICML 2026. Can't wait to see many old/new friends in Seoul. https://t.co/0mFoD7jnGu

0

1

0

344

Zonglin Yang ✈️ ICML 2026

@Yang_zy223

about 2 months ago

@DianboLiu Thanks Prof. Liu! 🙏

0

35

Yang_zy223 retweeted

Suorong Yang@ICML2026 @Suorong_Yang

about 2 months ago

Excited to share that Data Agent has been accepted to #icml2026 @icmlconf 🎉 Data Agent asks: Can a model learn which data it needs during training? Highlights: ✅ Modular reward designs ✅ Very lightweight agent ✅ Plug-and-play across vision models and LLMs

Suorong_Yang's tweet photo. Excited to share that Data Agent has been accepted to #icml2026 @icmlconf

🎉 Data Agent asks: Can a model learn which data it needs during training?

Highlights:
✅ Modular reward designs
✅ Very lightweight agent
✅ Plug-and-play across vision models and LLMs https://t.co/QilCaoSU6l

1

12

3

1K

Yang_zy223 retweeted

Hui Chen✈️ICML 2026🇰🇷

@chchenhui

about 2 months ago

To what extent do AI-generated papers contain fabrications? 🚀Excited to introduce FabScore for fine-grained evaluation of fabrications in automated AI research. 🧵 We evaluate 144 AI-written papers from multiple sources, including @SakanaAILabs 's AI Scientist, MLR-Bench, @AnalemmaAI 's FARS and the 2025 #Agents4Science Open Conference. Among 54 real conference submissions, we find that approximately 70% contain at least one fabrication; even among accepted papers, the rate remains as high as 59.3%. 📰 Paper: https://t.co/egFQ33fglo 💻 Code: https://t.co/BPpZm6YY24 1/

chchenhui's tweet photo. To what extent do AI-generated papers contain fabrications?

🚀Excited to introduce FabScore for fine-grained evaluation of fabrications in automated AI research. 🧵

We evaluate 144 AI-written papers from multiple sources, including @SakanaAILabs 's AI Scientist, MLR-Bench, @AnalemmaAI 's FARS and the 2025 #Agents4Science Open Conference.

Among 54 real conference submissions, we find that approximately 70% contain at least one fabrication; even among accepted papers, the rate remains as high as 59.3%.

📰 Paper: https://t.co/egFQ33fglo
💻 Code: https://t.co/BPpZm6YY24

1/

6

111

23

56

21K

Zonglin Yang ✈️ ICML 2026

@Yang_zy223

about 2 months ago

TL;DR — three contributions: 🔬 Theory — first analysis of why training P(h|b) is intractable (combinatorial complexity). 🛠 Training — first recipe that makes training P(h|b) tractable and scalable, with log-linear scaling laws. ⚡ Inference — continuous test-time scaling, breaking the brute-force complexity wall. 📄 Paper: https://t.co/VH36cDIaJs 💻 GitHub: https://t.co/H3ewP990nL 🤗 HF: https://t.co/BnLb7tbV1E Joint work with @LidongBing at @miromind_ai 🍅 Happy to discuss in the comments — questions and critiques welcome. #ICML2026 #AI4Science #LLM #AI4Research #Discovery

0

7

2

313

Zonglin Yang ✈️ ICML 2026

@Yang_zy223

about 2 months ago

Can we actually TRAIN LLMs for scientific discovery — or only prompt them to brainstorm? 🧬✨ 🎉 MOOSE-Star → #ICML2026 Most work on LLMs for hypothesis discovery focuses on inference-time agents or feedback-driven refinement. The core generative process — P(hypothesis | research background), or P(h|b) — has been largely sidestepped: directly training it remains an open problem. We show why: a combinatorial complexity barrier makes naive end-to-end training mathematically intractable. First scalable recipe for training P(h|b), with clean scaling laws on both training data and test-time compute. 📄 Paper: https://t.co/VH36cDIIz0 💻 GitHub: https://t.co/H3ewP99ydj 🤗 HF: https://t.co/BnLb7tcsRc 🧵👇

3

68

19

63

226K

Zonglin Yang ✈️ ICML 2026

@Yang_zy223

about 2 months ago

Test-time scales — and brute-force collapses 🏔️ The most striking result. We pit brute-force sampling head-to-head against MOOSE-Star (MS): even with ~9,500 unguided samples per case, can brute-force match what MS produces with just 1–3 guided samples? It hits a hard "complexity wall." Brute-force's win rate against MS collapses as required inspirations k grows: 43% → 19% → 0% for k = 1, 2, 3. By k=3, brute-force never wins a single matchup — even at this massive sample budget. Overall, MS wins 61.5% of head-to-heads, with brute-force at just 23.9%. Decomposition turns an intractable discovery problem into a tractable search problem.

Yang_zy223's tweet photo. Test-time scales — and brute-force collapses 🏔️

The most striking result.

We pit brute-force sampling head-to-head against MOOSE-Star (MS): even with ~9,500 unguided samples per case, can brute-force match what MS produces with just 1–3 guided samples?

It hits a hard "complexity wall." Brute-force's win rate against MS collapses as required inspirations k grows:

43% → 19% → 0% for k = 1, 2, 3.

By k=3, brute-force never wins a single matchup — even at this massive sample budget. Overall, MS wins 61.5% of head-to-heads, with brute-force at just 23.9%.

Decomposition turns an intractable discovery problem into a tractable search problem.

1

5

1

0

426

Zonglin Yang ✈️ ICML 2026

@Yang_zy223

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users