Jiarui Yao @explainmiracles - Twitter Profile

7 days ago


qiancheng1231's tweet photo. ✨ Creativity is not just recognizing what an object is — it is imagining what it could become.

🔧 A key edge can cut tape.
🛡️ A rubber pad can protect a wall.
🪮 A comb guard can clear a sink slot.

But can multimodal AI agents discover these hidden physical affordances from images?

🚀We introduce MM-CreativityBench, a benchmark designed to test whether LMMs can creatively repurpose everyday objects by interactively inspecting scenes, entities, and object parts.

🔍 Our findings show that today’s LMMs often identify the right object, but fail to ground their reasoning in the right part. They hallucinate properties, overlook physical constraints, or propose solutions that are not mechanically valid.

🧠 To move beyond plausible guesses, we propose affordance-grounded alignment: training models to explore visual evidence, reject hallucinated affordances, and reason from geometry, material, and mechanics.

📄 Paper: https://t.co/DW6J06yPHK
🌐 Project: https://t.co/KMDTLKaa0r
💻 Code: https://t.co/4L3LYObPZX
🤗 Hugging Face: https://t.co/XPmrfP0Gie

3

49

22

21

6K

ExplainMiracles retweeted

Pengcheng Wang

@PengchengWang19

about 1 month ago

🚀 Wanna build your own customized agent with controllable workflow? Introducing AgentSPEX — a declarative DSL for building LLM agents.

- Customizable agentic workflow with GUI builder
- Reproducible State-of-the-Art in SWE-bench verified
- Controllable workflow with YAML specification + sandboxed VM + Lean4 verification

🌐 Demo: https://t.co/hZ7O8R4yKe
💻 Code: https://t.co/PniYZnOUqq
📄 Paper: https://t.co/qMGBhVG9Dj
💡 (One of the) Applications: https://t.co/KjmqqDJK6I
(1/n)

#LLM #AIAgents #AgenticAI #OpenSource #AIResearch #AgentHarness

1

18

9

12

13K

ExplainMiracles retweeted

Microsoft Research

@MSFTResearch

3 months ago

PlugMem transforms AI agents’ interaction histories into structured, reusable knowledge. It integrates with any agent, supports diverse tasks and memory types, and maximizes decision quality while significantly reducing memory token use: https://t.co/girJeCrr6p


2

38

34

17

9K

ExplainMiracles retweeted

Ke Yang @EmpathYang

3 months ago

📰New preprint: How can we build a task-agnostic plug-and-play memory module for LLM agents that supports multiple memory types?
We present PlugMem🔌🧠, a plugin memory module that works across tasks by turning heterogeneous experience into knowledge.
Evaluated unchanged on long-term dialogue🗣️, multi-hop QA🕵️, and web agents🕸️🤖, PlugMem improves performance while using far fewer memory tokens.
📜Paper: https://t.co/A8tNQjkCCb
🔨Code: https://t.co/mt1aJKxQIz

13

168

64

100

12K

ExplainMiracles retweeted

Cheng Qian

@qiancheng1231

5 months ago

🔮 Can a world model (simulator) give today’s AI agents foresight? We tested “world model as a tool”… and found it often doesn’t help—sometimes it hurts. Check our newest paper here: https://t.co/nujSGeHKMx #AIagents #WorldModel #ToolUse


1

52

19

15

12K

Jiarui Yao @ExplainMiracles

7 months ago

Thrilled to share our paper MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning (https://t.co/FT7zzXvF3c) won an EMNLP 2025 Outstanding Paper Award! 🎉🎉 Huge congrats to the team @evangelinejy99 @RuiYang70669025 @YifanSun99 @FengLuo895614 @rui4research, and big thanks to our advisors Prof. Tong Zhang and @hanzhao_ml!


0

8

0

320

ExplainMiracles retweeted

Rui Yang @RuiYang70669025

7 months ago

Thrilled to share our paper (https://t.co/TkqiAyTKXg) won an EMNLP 2025 Outstanding Paper Award! 🎉🎉 Huge congrats to the team @evangelinejy99 @ExplainMiracles @YifanSun99 @FengLuo895614 @rui4research, and big thanks to our advisors Prof. Tong Zhang and @hanzhao_ml!


0

25

3

0

2K

Jiarui Yao @ExplainMiracles

7 months ago

I am at EMNLP 2025 HPC-AI! #emnlp2025 #hpcai

0

7

0

257

Jiarui Yao @ExplainMiracles

9 months ago

Glad that our paper has been accepted to NeurIPS 2025! With gradient variance minimization (GVM), we balance the training data by difficulty and by its contribution to the model, which improves math reasoning. Please check the original post for more details.

Jiarui Yao @ExplainMiracles

about 1 year ago

We introduce Gradient Variance Minimization (GVM)-RAFT, a principled dynamic sampling strategy that minimizes gradient variance to improve the efficiency of chain-of-thought (CoT) training in LLMs.

– Achieves 2–4× faster convergence than RAFT
– Improves accuracy on math reasoning benchmarks
– Generalizes to reinforcement learning methods such as GRPO
– Comes with theoretical convergence guarantees

📄 Paper: https://t.co/3CuExbJANR
🔗 Code: (expected in a few hours) https://t.co/vFO3mWIM8A

#LLM #MachineLearning #ReinforcementLearning #ChainOfThought #AIResearch

0

88

27

45

6K

0

2

0

232

ExplainMiracles retweeted

Peixuan Han @peixuanhakhan

9 months ago

(1/5) Super excited to release our new paper on Reinforcement Learning: "Self-Aligned Reward: Towards Effective and Efficient Reasoners"! Preprint: https://t.co/Bwy1y2UPl6


2

33

15

7K

ExplainMiracles retweeted

Cheng Qian

@qiancheng1231

10 months ago

🤝 Can LLM agents really understand us? We introduce UserBench: a user-centric gym environment for benchmarking how well agents align with nuanced human intent, not just follow commands. 📄 https://t.co/1dnzD3il0t 💻 https://t.co/hZuI2ZQTKE


6

118

36

59

14K

ExplainMiracles retweeted

Yong Lin @Yong18850571

11 months ago

(1/4)🚨 Introducing Goedel-Prover V2 🚨
🔥🔥🔥 The strongest open-source theorem prover to date.
🥇 #1 on PutnamBench: Solves 64 problems—with far less compute.
🧠 New SOTA on MiniF2F:
* 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%.
* 8B > 671B: Our 8B model matches DeepSeek-671B on MiniF2F.
📚 Leading on MathOlympiadBench (IMO-level problems)
* Solves 73 vs 50 over 671B DeepSeek Prover

🔓 Website: https://t.co/lLQpltH0ea
🔓 Model 32B: https://t.co/HcIGZPOB9L
🔓 Model 8B https://t.co/yE3Fm3i4Ev
🔓Data and training pipeline will be released soon.
Amazing Collaborators: @sangertang1999 @Lyubh22 @__zrrr__ @juihuichung @thomaszhao1998 @pero733858111 @thiiis_user @EmilyJge @JingruoS5931 @wujiayun12 @GesiJiri68334 @davidjesusacu @KaiyuYang4 @hongzhou__lin @YejinChoinka @danqi_chen @prfsanjeevarora @chijinML
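
The MiniF2F figure above is reported at Pass@32. The tweet does not say how that number was computed, so the following is only a minimal sketch of the commonly used unbiased pass@k estimator, assuming `n` sampled proofs per problem of which `c` verify. The function name `pass_at_k` and its parameters are illustrative and not taken from the Goedel-Prover code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples is correct.

    n: total samples drawn for a problem (assumed setup, not from the tweet)
    c: number of those samples that were correct
    k: sampling budget being evaluated, e.g. 32 for Pass@32
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Fewer than k incorrect samples: every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    # 1 minus the fraction of size-k subsets made up only of incorrect samples.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

When exactly `k` samples are drawn per problem (`n == k`), the estimate for a single problem reduces to 1.0 if any sample is correct and 0.0 otherwise, so the benchmark score is simply the fraction of problems solved within the budget.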

9

263

91

118

98K

ExplainMiracles retweeted

Noam Razin @noamrazin

11 months ago

Reward models (RMs) are key to language model post-training and inference pipelines. But, little is known about the relative pros and cons of different RM types. 📰 We investigate why RMs implicitly defined by language models (LMs) often generalize worse than explicit RMs 🧵 1/6


3

163

18

133

12K

ExplainMiracles retweeted

Shulin Tian

@shulin_tian

12 months ago

🎥 Video is already a tough modality for reasoning. Egocentric video? Even tougher! It is longer, messier, and harder. 💡 How do we tackle these extremely long, information-dense sequences without exhausting GPU memory or hitting API limits? We introduce 👓Ego-R1: A framework for reasoning over ultra-long (i.e., in days and weeks) egocentric videos, with the support from Chain-of-Tool-Thought (CoTT) that decomposes complex reasoning tasks into modular steps. At its core is Ego-R1-Agent-3B, an orchestrating language model trained to dynamically invoke specialized tools at each step, based on the previous actions and observations, to collect the necessary information and solve the tasks gradually, step-by-step. All code and data are fully open-sourced :) 🌐 Project: https://t.co/FabClcYzLr 📄 Paper: https://t.co/YmxFM6eFyV 💻 Code: https://t.co/9MkpxJhsjs

7

37

7

6K

ExplainMiracles retweeted

Xiusi Chen

@xiusi_chen

about 1 year ago

Can LLMs make rational decisions like human experts?

📖Introducing DecisionFlow: Advancing Large Language Model as Principled Decision Maker

We introduce a novel framework that constructs a semantically grounded decision space to evaluate trade-offs in hard decision-making scenarios transparently.

📑Paper: https://t.co/ItbbRHbjCL
💻Code: https://t.co/74HduQTfMY

🧵👇

3

54

15

26

8K

ExplainMiracles retweeted

Peixuan Han @peixuanhakhan

about 1 year ago

(1/5) Want to make your LLM a skilled persuader? Check out our latest paper: "ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind"! For details: 📄Arxiv: https://t.co/680ddg9tVW 🛠️GitHub: https://t.co/iMxSSGvY7D


2

25

7

9

3K

ExplainMiracles retweeted

Cheng Qian

@qiancheng1231

about 1 year ago

📢 New Paper Drop: From Solving to Modeling! LLMs can solve math problems — but can they model the real world? 🌍 📄 arXiv: https://t.co/8uT99ZtC7M 💻 Code: https://t.co/aryBCh3d0a Introducing ModelingAgent, a breakthrough system for real-world mathematical modeling with LLMs.


3

103

31

52

13K

ExplainMiracles retweeted

Hanze Dong

@hendrydong

about 1 year ago

How to improve test-time scalability?
- Separate thinking & solution phases to control performance under budget constraint
- Budget-Constrained Rollout + GRPO
- Outperforms baselines on math/code
- Cuts token usage by 30% without hurting performance
https://t.co/KqJvMTeXbV

5

81

20

45

7K

ExplainMiracles retweeted

Xiusi Chen

@xiusi_chen

about 1 year ago

🚀 Can we cast reward modeling as a reasoning task?

📖 Introducing our new paper:
RM-R1: Reward Modeling as Reasoning

📑 Paper: https://t.co/VxuZ8JuhUJ
💻 Code: https://t.co/R583Hib26g

Inspired by recent advances of long chain-of-thought (CoT) on reasoning-intensive tasks, we hypothesize and validate that integrating reasoning capabilities into reward modeling significantly enhances RM's interpretability and performance. RM-R1 achieves state-of-the-art or near state-of-the-art performance of generative RMs on RewardBench, RM-Bench and RMB.

🧵👇

3

201

44

115

42K

