Gaotang Li

@GaotangLi

Ph.D. @UofIllinois | Undergrad @UMich. Language Models.

Joined November 2024

354 Following

223 Followers

119 Posts

Pinned Tweet

Gaotang Li

@GaotangLi

27 days ago

🧩 How can we assign fine-grained credit over long tool-use trajectories and let agents learn from past attempts in agentic reinforcement learning when rewards are no longer verifiable?

Excited to share RubricEM, an RL framework for long-form deep research agents that plan, search, use tools, and write reports without exact answer checks.

📖 Paper: https://t.co/t1tksq5g30

(1/n)

GaotangLi retweeted

Arda Göreci

@ArdaGoreci

5 days ago

Excited to share more of our technical notes in the spirit of open-research. Blog 2: How Muon Lost its Geometry

As Muon spread from speedruns to LLM scale, it lost the µP scaling rule that keeps the best LR fixed as width changes. The bug ships in PyTorch and Optax 😱. (1/n) https://t.co/PPtRgi3cF2
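A hedged aside on what a width-aware rule can look like. Under the spectral view of µP (Yang, Simon and Bernstein, "A Spectral Condition for Feature Learning", 2023), the update to a d_out × d_in weight matrix should have spectral norm on the order of sqrt(d_out / d_in). Muon's orthogonalized momentum has singular values of roughly 1, so one way to meet that condition is the explicit rescaling below. The symbols η, M_t and orth(·) are notation chosen here, and this is not necessarily the exact rule, or the exact bug, described in the blog post.

```latex
\Delta W_t = -\,\eta\,\sqrt{\frac{d_{\mathrm{out}}}{d_{\mathrm{in}}}}\;\operatorname{orth}(M_t),
\qquad
\|\operatorname{orth}(M_t)\|_2 \approx 1
\;\Longrightarrow\;
\|\Delta W_t\|_2 \approx \eta\,\sqrt{\frac{d_{\mathrm{out}}}{d_{\mathrm{in}}}}
```

Because the width dependence sits in the explicit factor, a learning rate η tuned at one width can in principle be reused at another; dropping the factor makes the update size drift with the aspect ratio of each matrix.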

GaotangLi retweeted

Jiarui Liu

@Jiarui_Liu_

5 days ago

Excited to share that our work 📝 "PaperMentor: A Human-Centered Multi-Agent Writing Tutor for AI Research Papers on Overleaf" has been accepted to #ACL2026 Demo!

Most AI writing tools either fix grammar or simulate peer review with a score. Neither gives drafting-stage, text-anchored feedback on narrative, structure and presentation.

PaperMentor comments rather than rewrites: It is a human-centered, multi-agent writing tutor that delivers expert-level, actionable feedback as native inline comments right inside Overleaf, while leaving every revision to you.

It pairs a curated library of 40+ expert skill files (distilled from senior researchers' writing advice) with 12 specialized agents covering methods, results, formatting, terminology, venue norms and more.

In a user study, 90.6% of comments were rated actionable and PaperMentor significantly outperformed a GPT-5.2 baseline without the skill library on both validity and actionability.

Anyone can extend or contribute to the skill library with simple text edits!

📝 Arxiv link: https://t.co/0STd6kasBw
🔗 Live demo: https://t.co/4MNrmxyE0U
💻 Code with skill library: https://t.co/iKKQ1orecq

🧵 How it works below 👇

GaotangLi retweeted

Akshat Gupta

@akshatgupta57

7 days ago

We spend a lot of time anthropomorphizing AI.

I think the more interesting question is the reverse: what do humans look like when viewed through the lens of AI?

I tried to unpack this idea in my new blog post: Un-Anthropomorphizing Humans

What do humans look like if we describe them as compute, model, learning loop, and an energy problem?

Here's the link - https://t.co/RUBJiAvvjP

GaotangLi retweeted

Dylan Zhang

@dylan_works_

9 days ago

A real scientist doesn't look up how the world works — they intervene, observe, and revise until a theory holds for a case they've never seen. CausaLab drops an LLM agent into a lab where memorized facts are useless ("Quantum Crystals on Planet X") and asks for the same. https://t.co/uAIafaVXbW

GaotangLi retweeted

Cheng Qian

@qiancheng1231

12 days ago

Call For Papers for Lifelong Agent @ COLM 2026!

🚀After our first Lifelong Agent Workshop at ICLR 2026, we’re excited to announce that the Lifelong Agent Workshop is back for its 2nd edition!

🎉This time, we’ll be at COLM 2026 this October in San Francisco!

We will continue the conversation around lifelong agents: long-term learning, continual alignment, self-evolution, and stable growth, with the goal of exploring more sustainable paths for future agent systems.

We’re also thrilled to have an amazing lineup of speakers, organizers, and advisory board members from Stanford, UC Berkeley, CMU, Microsoft Research, Google DeepMind, McGill, Oxford, UIUC, and many other institutions.

We warmly welcome submissions to the workshop. Submissions are non-archival!

Topics include, but are not limited to:
agent post-training, agent RL, user-agent alignment, self-evolving agents, embodied agents, Agent4Science, applications, benchmarks, and more.

📌 Submission deadline: July 3, 2026 AoE
🌐 Workshop homepage: https://t.co/CSSUQ5Flx8
📖 OpenReview: https://t.co/OZxc9lO9nF
📮 Contact: lifelongagents@googlegroups.com

Looking forward to your submissions and to seeing everyone at COLM 2026 in San Francisco! 🚀

#COLM2026 #AIAgents #AgenticAI #LLM #LifelongLearning #AIAlignment

GaotangLi retweeted

Cheng Qian

@qiancheng1231

18 days ago

✨ Creativity is not just recognizing what an object is — it is imagining what it could become.

🔧 A key edge can cut tape.
🛡️ A rubber pad can protect a wall.
🪮 A comb guard can clear a sink slot.

But can multimodal AI agents discover these hidden physical affordances from images?

🚀We introduce MM-CreativityBench, a benchmark designed to test whether LMMs can creatively repurpose everyday objects by interactively inspecting scenes, entities, and object parts.

🔍 Our findings show that today’s LMMs often identify the right object, but fail to ground their reasoning in the right part. They hallucinate properties, overlook physical constraints, or propose solutions that are not mechanically valid.

🧠 To move beyond plausible guesses, we propose affordance-grounded alignment: training models to explore visual evidence, reject hallucinated affordances, and reason from geometry, material, and mechanics.

📄 Paper: https://t.co/DW6J06yPHK
🌐 Project: https://t.co/KMDTLKaa0r
💻 Code: https://t.co/4L3LYObPZX
🤗 Hugging Face: https://t.co/XPmrfP0Gie

Gaotang Li

@GaotangLi

20 days ago

Memory ≈ reasoning? Low-rank weight matrices seem to be the key for generalization.

Vaishnavh Nagarajan @_vaishnavh

21 days ago

Updated our paper on the foundations of memory in sequence models (with fresh insights, clearer writing and ablations).

Our paper contrasts two distinct ways in which language models memorize and formulates the questions that arise from this.

Will be presented at #ICML. https://t.co/6VnogYZS2z

Gaotang Li

@GaotangLi

25 days ago

Chunyuan has really good research taste; don’t hesitate to take a look if you are interested in looped transformers!

Chunyuan Deng

@ChunyuanDeng

25 days ago

What is the key bottleneck to scaling looped transformers (LT)? A major challenge is their speed: the looped operation is coupled w/ full quadratic attention. More loop, more powerful, but much slower.

Introducing LT2: linear-time looped transformers that loop over linear attention and sparse attention. Linear and sparse attention give the loop speed, making it a fast loop. The loop, in turn, gives linear attention iterative control over its recurrent memory and recursively enlarges the receptive field for sparse attention. Fast attention accelerates the loop and the loop enriches attention, making LT2 a Pareto-frontier architecture compared to standard looped transformers.

This is a large paper. We did careful ablations in pretraining to find the best architecture, and we used this architecture to distill a hybrid looped transformer, Ouro-hybrid-1.4B, to deliver both industry-level performance and fast inference speed. To read more:

Paper: https://t.co/q1YeQnXjoS
Code: https://t.co/Dk7C4hBrXk
Project: https://t.co/jrM1IquIA4
Model: https://t.co/LQmcCsLP1T
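To make "looping over linear attention" concrete, here is a toy sketch of causal linear attention in its recurrent form, wrapped in a loop that re-applies the same layer with a residual connection. It is not LT2: the identity query/key/value projections, the elu(x)+1 feature map (a common choice in the linear-attention literature), the residual update, and all function names are assumptions made for illustration. The quadratic reference function shows that the running-state form computes the same outputs as explicit causal attention with the same kernel.

```python
import math
from typing import List

Vec = List[float]


def phi(x: Vec) -> Vec:
    """elu(x) + 1 feature map; keeps features positive so normalizers are > 0."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def linear_attention(xs: List[Vec]) -> List[Vec]:
    """Causal linear attention, recurrent form: O(n * d^2) time, O(d^2) state."""
    d = len(xs[0])
    state = [[0.0] * d for _ in range(d)]  # running sum of phi(k_s) v_s^T
    norm = [0.0] * d                       # running sum of phi(k_s)
    out: List[Vec] = []
    for x in xs:
        q = k = v = x                      # assumption: identity projections
        fk, fq = phi(k), phi(q)
        for i in range(d):
            norm[i] += fk[i]
            for j in range(d):
                state[i][j] += fk[i] * v[j]
        denom = sum(fq[i] * norm[i] for i in range(d))
        out.append([sum(fq[i] * state[i][j] for i in range(d)) / denom
                    for j in range(d)])
    return out


def linear_attention_quadratic(xs: List[Vec]) -> List[Vec]:
    """Reference O(n^2) form with explicit causal weights phi(q_t) . phi(k_s)."""
    out: List[Vec] = []
    for t, xt in enumerate(xs):
        fq = phi(xt)
        w = [sum(a * b for a, b in zip(fq, phi(xs[s]))) for s in range(t + 1)]
        total = sum(w)
        out.append([sum(w[s] * xs[s][j] for s in range(t + 1)) / total
                    for j in range(len(xt))])
    return out


def looped(xs: List[Vec], loops: int) -> List[Vec]:
    """Re-apply the same weight-shared layer `loops` times with a residual."""
    h = xs
    for _ in range(loops):
        a = linear_attention(h)
        h = [[hi + ai for hi, ai in zip(hr, ar)] for hr, ar in zip(h, a)]
    return h
```

The state is a fixed d × d matrix, so each pass costs time linear in sequence length; looping multiplies that cost by the number of passes rather than by sequence length.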

Gaotang Li

@GaotangLi

26 days ago

Check out this amazing survey led by Xuying! It is very timely work and offers substantial insights to the community! 💯

Xuying Ning

@krystal_ning

27 days ago

Thanks for sharing our survey! We are also maintaining an Awesome Code as Agent Harness Papers repo for recent work on code-centric agentic systems and harness engineering: https://t.co/hBX3wv1Pzo

Gaotang Li

@GaotangLi

26 days ago

@maximelabonne Feel free to take a glance at our paper, where we discuss why losses like DFT can work and when they don’t: https://t.co/QiALghsXOf

GaotangLi retweeted

Jiarui Liu

@Jiarui_Liu_

27 days ago

Excited to share our new paper 🧵MixSD: Mixed Contextual Self-Distillation for Knowledge Injection

Supervised fine-tuning is the common way to teach LLMs new knowledge, but it often catastrophically forgets existing capabilities. We introduce MixSD: a simple, external-teacher-free method to inject knowledge with far less forgetting.

📄https://t.co/qRpaTiI9EU

Why does SFT forget? Targets written by humans or external systems diverge from the model's own autoregressive distribution, forcing the optimizer to imitate low-probability tokens. That's what drags pretrained capabilities down.

MixSD: We hypothesize that keeping supervision close to the model's own distribution is key to avoiding forgetting. Instead of training on fixed, externally authored targets, at every token we mix between two conditionals of the base model itself: an expert conditional that sees the injected fact in context, and a naive conditional reflecting the model's prior. The result is supervision the model already finds high-probability, while still carrying the new factual signal. A Bernoulli rate λ controls the balance between memorization and retention.

Findings: SFT retains as little as 1% of held-out capability. MixSD retains far more, up to ~100% on larger models, with near-perfect training accuracy. It also beats on-policy self-distillation at a fraction of the compute, and holds across Qwen3 1.7B, 4B, 8B and Llama-3.2.
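The token-level mixing described in the MixSD tweet can be pictured with the toy sketch below. It is a minimal illustration under stated assumptions, not the authors' implementation: conditionals are modeled as plain Python functions returning next-token distributions, `lam` is assumed to be the probability of taking the expert conditional at a position, the prefix is extended greedily, and the names `mixsd_targets`, `soft_cross_entropy`, `toy_expert` and `toy_naive` are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List

Dist = Dict[str, float]                     # next-token distribution
Conditional = Callable[[List[str]], Dist]   # prefix -> next-token distribution


def mixsd_targets(expert: Conditional, naive: Conditional, lam: float,
                  length: int, rng: random.Random) -> List[Dist]:
    """Per-token soft targets drawn from two conditionals of the same model.

    Assumption: `lam` is the probability of using the expert conditional
    (injected fact in context); the source only says a Bernoulli rate
    balances memorization and retention.
    """
    prefix: List[str] = []
    targets: List[Dist] = []
    for _ in range(length):
        use_expert = rng.random() < lam           # one Bernoulli(lam) draw per token
        dist = expert(prefix) if use_expert else naive(prefix)
        targets.append(dist)
        prefix.append(max(dist, key=dist.get))    # greedy continuation (simplification)
    return targets


def soft_cross_entropy(student: Dist, target: Dist) -> float:
    """H(target, student); a student would be trained to minimize this."""
    return -sum(p * math.log(student.get(tok, 1e-12)) for tok, p in target.items())


def toy_expert(prefix: List[str]) -> Dist:
    # Hypothetical: the model with the new fact placed in its context.
    return {"Paris": 0.9, "London": 0.1}


def toy_naive(prefix: List[str]) -> Dist:
    # Hypothetical: the same model's prior without the fact.
    return {"London": 0.6, "Paris": 0.4}


targets = mixsd_targets(toy_expert, toy_naive, lam=0.5, length=4, rng=random.Random(0))
```

Because both conditionals come from the base model, every target stays a distribution the model already assigns reasonable probability to; under this sketch's assumption, raising `lam` pushes the targets toward the fact-bearing conditional.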

GaotangLi retweeted

Tianxin Wei

@wei_tianxin

27 days ago

🚀Code as Agent Harness: A survey work from UIUC, Stanford, and Meta.

📄https://t.co/YReL1BMIoN

Code is no longer just the output of AI.

It is becoming the executable, inspectable, and stateful substrate through which AI agents reason, act, verify, remember, and self-correct over long horizons.

In our new survey, we examine this shift through the lens of Code as Agent Harness, focusing on how code serves as:

• 🧠 Harness Interface: coding for reasoning, acting, and environment modeling
• ⚙️ Harness Mechanisms: planning, memory, tool use, feedback, and optimization
• 🤝 Multi-Agent Harnesses: collaboration through shared code, tests, and execution traces

We review applications spanning:
💻 Coding Agents
🖥️ GUI/OS Agents
🤖 Embodied Agents
🔬 Scientific Discovery
🏢 Enterprise Workflows

If you find this survey helpful, feel free to explore our resource collection below.

🤗 Hugging Face Daily: https://t.co/cfuoQfzcj3
💻 GitHub: https://t.co/ZD156rWPbJ
🌍 Website: https://t.co/OfibqKL1en

Feedback, suggestions, and community contributions are warmly welcome!

#AI #Agents #LLM #Coding #AgenticAI #SoftwareEngineering

Gaotang Li

@GaotangLi

27 days ago

A big shout-out to our amazing collaborators ❤️ @bhavana_dalvi @ZifengWang315 @jun_yannn @anmourchen @chunliang_tw Long T. Le @HanRujun George Lee @hanghangtong @chl260 @tomaspfister (14/n)

Gaotang Li

@GaotangLi

27 days ago

Takeaway: Agent and judge co-evolve through rubrics during RL.

RubricEM is not just using rubrics to score answers. It treats rubrics as the interface for a coupled evolution loop:

Stronger agents generate better self-directed rubrics and expose more informative on-policy failures.
The judge distills these rollouts into sharper stagewise criteria.
Accepted reflections return to the agent as reusable memory.

The agent evolves through policy updates and rubric-bank updates. The judge evolves through an on-policy rubric buffer.

For open-ended RL, rubrics become the scaffold for acting, the language for judging, and the memory for evolving.

📖 Paper: https://t.co/t1tksq5g30

(13/n)
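One way to picture stagewise rubric scoring and a rubric bank is the toy sketch below. It is an illustration under assumptions, not RubricEM's implementation: the `Criterion` and `RubricBank` types, the stage names, the weighted-average aggregation, and the keyword-matching `toy_judge` (standing in for an LLM judge) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Criterion:
    stage: str          # hypothetical stage label, e.g. "plan", "search", "write"
    description: str    # natural-language check the judge applies
    weight: float


# A judge maps (criterion description, stage output) to a score in [0, 1].
Judge = Callable[[str, str], float]


def stagewise_reward(outputs: Dict[str, str], rubric: List[Criterion],
                     judge: Judge) -> Dict[str, float]:
    """Weighted average of judge scores, computed separately for each stage."""
    rewards: Dict[str, float] = {}
    for stage in sorted({c.stage for c in rubric}):
        crits = [c for c in rubric if c.stage == stage]
        total = sum(c.weight for c in crits)
        score = sum(c.weight * judge(c.description, outputs.get(stage, ""))
                    for c in crits)
        rewards[stage] = score / total if total > 0 else 0.0
    return rewards


@dataclass
class RubricBank:
    """Hypothetical memory of accepted reflections, reusable in later rollouts."""
    entries: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, stage: str, reflection: str, accepted: bool) -> None:
        # Only accepted reflections are stored; duplicates are skipped.
        if accepted and reflection not in self.entries.setdefault(stage, []):
            self.entries[stage].append(reflection)


def toy_judge(description: str, output: str) -> float:
    """Stand-in for an LLM judge: 1.0 if the criterion text appears verbatim."""
    return 1.0 if description in output else 0.0
```

Per-stage rewards, rather than a single score for the final report, are one simple way to spread credit over a long trajectory; how RubricEM actually assigns credit is described in the paper.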
