Samuel (Min-Hsuan) Yeh

1 day ago

Excited to share our #ACL2026 paper: "How Retrieved Context Shapes Internal Representations in RAG" 🧵 w/ @Samuel861025 RAG is everywhere, but I am often puzzled by what happens inside the model when we stuff retrieved documents into the context. Did it integrate the evidence? Ignore it? Get confused by it? To answer these questions, we need to look at what's happening inside. We built a controlled framework to study how the hidden states of LLMs change under different retrieval conditions: relevant, distracting, and random documents. 🤔 The counterintuitive finding: Relevant documents barely move the needle on internal representations. They largely reinforce what the model already knows rather than injecting decisive new information. Meanwhile, random, irrelevant documents cause large representation drift -- far larger than relevant or even distracting ones. Why? Because LLMs internally recognize uninformative context and shift into a refusal mode. That representation drift is tightly linked to abstention behavior. The model is essentially saying: "I see this context, and I know it's useless." Other key insights: → Later transformer layers increasingly prioritize parametric knowledge over retrieved evidence, limiting how much external context can influence generation. → In multi-document settings, a single relevant document can anchor the entire representation and suppress surrounding noise. This is good news for imperfect retrieval in practice → Distracting documents (semantically similar but unhelpful) are the silent threat: they don't trigger the model's refusal mechanism the way random docs do, but they quietly degrade quality The takeaway for practitioners: understanding RAG can't stop at output accuracy. The internal dynamics tell a richer and sometimes alarming story about when your system is actually integrating evidence vs. when it's ignoring it, fighting it, or shutting down because of it. We hope our analysis offers mechanistic insights for more principled and reliable RAG system design. 📄 https://t.co/EYEYAo1CcK

SharonYixuanLi's tweet photo. Excited to share our #ACL2026 paper: "How Retrieved Context Shapes Internal Representations in RAG" 🧵
w/ @Samuel861025

RAG is everywhere, but I am often puzzled by what happens inside the model when we stuff retrieved documents into the context. Did it integrate the evidence? Ignore it? Get confused by it? To answer these questions, we need to look at what's happening inside.

We built a controlled framework to study how the hidden states of LLMs change under different retrieval conditions: relevant, distracting, and random documents.

🤔 The counterintuitive finding: Relevant documents barely move the needle on internal representations. They largely reinforce what the model already knows rather than injecting decisive new information. Meanwhile, random, irrelevant documents cause large representation drift -- far larger than relevant or even distracting ones.

Why? Because LLMs internally recognize uninformative context and shift into a refusal mode. That representation drift is tightly linked to abstention behavior. The model is essentially saying: "I see this context, and I know it's useless."

Other key insights:
→ Later transformer layers increasingly prioritize parametric knowledge over retrieved evidence, limiting how much external context can influence generation.

→ In multi-document settings, a single relevant document can anchor the entire representation and suppress surrounding noise. This is good news for imperfect retrieval in practice

→ Distracting documents (semantically similar but unhelpful) are the silent threat: they don't trigger the model's refusal mechanism the way random docs do, but they quietly degrade quality

The takeaway for practitioners: understanding RAG can't stop at output accuracy. The internal dynamics tell a richer and sometimes alarming story about when your system is actually integrating evidence vs. when it's ignoring it, fighting it, or shutting down because of it.

We hope our analysis offers mechanistic insights for more principled and reliable RAG system design.

📄 https://t.co/EYEYAo1CcK

Samuel861025 retweeted

Changdae Oh ✈️ ACL 2026

@Changdae_Oh

2 days ago

Outcome reward models: cheap, but vulnerable to spurious shortcuts 😣 Process reward models (PRMs): robust, but too expensive to build from scratch 😫 What if you could get a ready-to-use PRM right after any RL post-training? Introducing 'Progress Advantage' 🧵

134

138

19K

Samuel861025 retweeted

10 days ago

Agent RL training can be fragile and far less stable than reasoning RL. In our latest work, we identify and explain a phenomenon called Cyclical Entropy Eruption: a recurring instability unique to agent RL where entropy erupts, recovers, and erupts again throughout training. 🧵 📄 https://t.co/ww3M5RHO7s (led by @Wendi_Li_ and @shawnim00) We decompose it into three phases: Phase 1 Entropy Descent. The model first learns the basics: how to call tools, satisfy schema constraints, and use the right format tokens. Probability mass shifts rapidly from invalid outputs to valid trajectories. Entropy drops fast. Phase 2 Entropy Eruption. The core instability. Correct and incorrect agent trajectories end up extremely close in representation space, far more overlapping than in non-agent tasks. This causes gradient interference: when RL suppresses bad trajectories, it accidentally drags down the likelihood of good ones too. The policy flattens, entropy spikes, and degenerate patterns like sentence duplication and hallucination emerge. Phase 3 Entropy Subsidence. The flatter distribution leads to more diverse sampling, which reduces representation similarity and eases interference. Training recovers... but then reconverges, similarity rises again, and the next eruption begins. A self-perpetuating cycle. The damage compounds. Degenerate patterns acquired during eruption persist and accumulate across cycles. In the worst case (Llama3.2-1B on WebShop), a single eruption triggers complete training collapse. Motivated by our analysis, we propose SEAL (Separation-Enhanced Agent Learning)--a lightweight auxiliary loss that pushes correct and incorrect trajectories apart in representation space, directly targeting the root cause. SEAL stabilizes training and improves performance across AlfWorld, WebShop, and search-augmented QA, on both Qwen and Llama backbones, with GRPO and GIGPO. On a Llama run where vanilla GRPO completely collapsed (0% success), adding SEAL recovered performance to ~80%. 💡The broader takeaway: agent RL is a fundamentally different optimization problem than reasoning RL. The multi-turn structure, tool interactions, and validity constraints create training dynamics that deserve their own analysis. We hope this work helps the community build more stable agent post-training pipelines. Code is available in the paper!

SharonYixuanLi's tweet photo. Agent RL training can be fragile and far less stable than reasoning RL. In our latest work, we identify and explain a phenomenon called Cyclical Entropy Eruption: a recurring instability unique to agent RL where entropy erupts, recovers, and erupts again throughout training. 🧵

📄 https://t.co/ww3M5RHO7s (led by @Wendi_Li_ and @shawnim00)

We decompose it into three phases:

Phase 1 Entropy Descent. The model first learns the basics: how to call tools, satisfy schema constraints, and use the right format tokens. Probability mass shifts rapidly from invalid outputs to valid trajectories. Entropy drops fast.

Phase 2 Entropy Eruption. The core instability. Correct and incorrect agent trajectories end up extremely close in representation space, far more overlapping than in non-agent tasks. This causes gradient interference: when RL suppresses bad trajectories, it accidentally drags down the likelihood of good ones too. The policy flattens, entropy spikes, and degenerate patterns like sentence duplication and hallucination emerge.

Phase 3 Entropy Subsidence. The flatter distribution leads to more diverse sampling, which reduces representation similarity and eases interference. Training recovers... but then reconverges, similarity rises again, and the next eruption begins. A self-perpetuating cycle.

The damage compounds. Degenerate patterns acquired during eruption persist and accumulate across cycles. In the worst case (Llama3.2-1B on WebShop), a single eruption triggers complete training collapse.

Motivated by our analysis, we propose SEAL (Separation-Enhanced Agent Learning)--a lightweight auxiliary loss that pushes correct and incorrect trajectories apart in representation space, directly targeting the root cause.

SEAL stabilizes training and improves performance across AlfWorld, WebShop, and search-augmented QA, on both Qwen and Llama backbones, with GRPO and GIGPO. On a Llama run where vanilla GRPO completely collapsed (0% success), adding SEAL recovered performance to ~80%.

💡The broader takeaway: agent RL is a fundamentally different optimization problem than reasoning RL. The multi-turn structure, tool interactions, and validity constraints create training dynamics that deserve their own analysis. We hope this work helps the community build more stable agent post-training pipelines.

Code is available in the paper!

164

111

51K

Samuel861025 retweeted

an NLP researcher @SBIntuitions @Softbank. Views my own. Previously researcher @Sony, phd @CisLmu. #nlproc

19 days ago

Best-of-N sampling is often used to boost LLM performance, but the selection relies on external evaluators, adding cost and bias. What if you could select the best output without any external scoring at all? Introducing our #ACL2026 paper ModeX: Evaluator-Free Best-of-N Selection for Open-Ended Generation! (led by Hyeong Kyu Choi @HyeonggyuC) 💡 Our key insight: among multiple LLM generations, high-quality outputs tend to cluster together semantically. The best answer is the modal one: the generation that captures the dominant consensus. How ModeX works: 1⃣ Build a similarity graph over N candidate generations 2⃣ Recursively apply spectral clustering via the Fiedler vector to isolate the dominant semantic cluster 3⃣ Select the centroid of that cluster as the final output No reward models. No external evaluators. No auxiliary inference. Just the texts themselves. 📊 Results across text summarization (CNN/DailyMail), code generation (HumanEval), and math reasoning (Math-500) show ModeX consistently outperforms single-path and multi-path baselines, achieving state-of-the-art among evaluator-free methods. We also provide theoretical justifications connecting our graph-based mode selection to kernel density estimation, grounding the approach with principled foundations. 📄 Paper: https://t.co/QfPXgqmUHF 💻 Code: https://t.co/YdCpupdnMp Sometimes the best signal is already hiding in the samples; you just need to find the mode. 🎯

SharonYixuanLi's tweet photo. Best-of-N sampling is often used to boost LLM performance, but the selection relies on external evaluators, adding cost and bias. What if you could select the best output without any external scoring at all?

Introducing our #ACL2026 paper ModeX: Evaluator-Free Best-of-N Selection for Open-Ended Generation! (led by Hyeong Kyu Choi @HyeonggyuC)

💡 Our key insight: among multiple LLM generations, high-quality outputs tend to cluster together semantically. The best answer is the modal one: the generation that captures the dominant consensus.

How ModeX works:
1⃣ Build a similarity graph over N candidate generations
2⃣ Recursively apply spectral clustering via the Fiedler vector to isolate the dominant semantic cluster
3⃣ Select the centroid of that cluster as the final output

No reward models. No external evaluators. No auxiliary inference. Just the texts themselves.

📊 Results across text summarization (CNN/DailyMail), code generation (HumanEval), and math reasoning (Math-500) show ModeX consistently outperforms single-path and multi-path baselines, achieving state-of-the-art among evaluator-free methods.

We also provide theoretical justifications connecting our graph-based mode selection to kernel density estimation, grounding the approach with principled foundations.

📄 Paper: https://t.co/QfPXgqmUHF
💻 Code: https://t.co/YdCpupdnMp

Sometimes the best signal is already hiding in the samples; you just need to find the mode. 🎯

130

Who to follow

Mengjie Zhao

@mengjie_zhao

Adharsh Kamath

@adharshkamath

PhD student @siebelschool | Previously @MSFTResearch @surathkal_nitk | PL/FM/SE, AI4Code

Hao Peng

@haopeng_uiuc

Assistant Professor @ UIUC CS | PhD from UW | Formerly @allen_ai, @GoogleDeepMind, @Google

Samuel861025 retweeted

Seongheon Park ✈️ICML 2026 @seongheon_96

28 days ago

🤖 How do you detect VLA failures during execution with only trajectory-level labels? Excited to share our new paper: "Hide-and-Seek in Trajectories: Discovering Failure Signals for VLA Runtime Monitoring"

seongheon_96's tweet photo. 🤖 How do you detect VLA failures during execution with only trajectory-level labels?

Excited to share our new paper:
"Hide-and-Seek in Trajectories: Discovering Failure Signals for VLA Runtime Monitoring" https://t.co/RhHLYURZXw

2 months ago

I will present two papers at #ICLR this week! 1. HalluEntity: Benchmarking and Understanding Entity-Level Hallucination Detection 4/24 Fri 10:00-13:00 P4-#3901 2. LUMINA: Detecting Hallucinations in RAG System with Context-Knowledge Signals 4/25 Sat 15:15-17:45 P4-#3816

Samuel861025's tweet photo. I will present two papers at #ICLR this week!

1. HalluEntity: Benchmarking and Understanding Entity-Level Hallucination Detection
4/24 Fri 10:00-13:00 P4-#3901

2. LUMINA: Detecting Hallucinations in RAG System with Context-Knowledge Signals
4/25 Sat 15:15-17:45 P4-#3816 https://t.co/SyaRtqIxz8

Samuel861025 retweeted

2 months ago

Heading to #ICLR2026? My lab will be presenting a few papers there, from detecting LLM hallucinations (LUMINA, HalluEntity) to mechanistic interpretability, RL post-training, LLM deception, and more. Huge congrats to all students and co-authors! 🎉 @Changdae_Oh @shawnim00 @Wendi_Li_ @LeitianT @Samuel861025 @seongheon_96 @jd92wang @Abell_Zhen_Fang @xuanmingzhangai @jwaladhamala @rahul1987iit @tanwimallick @Katherine_Luo6 @YangXu_09

SharonYixuanLi's tweet photo. Heading to #ICLR2026? My lab will be presenting a few papers there, from detecting LLM hallucinations (LUMINA, HalluEntity) to mechanistic interpretability, RL post-training, LLM deception, and more.

Huge congrats to all students and co-authors! 🎉
@Changdae_Oh @shawnim00 @Wendi_Li_ @LeitianT @Samuel861025 @seongheon_96 @jd92wang @Abell_Zhen_Fang @xuanmingzhangai @jwaladhamala @rahul1987iit @tanwimallick @Katherine_Luo6 @YangXu_09

Samuel861025 retweeted

2 months ago

Your LLM agent just mass-deleted a production database because it was confident it understood the task. It didn't. Avoiding these irreversible mistakes requires uncertainty quantification, a pressing open problem in the era of LLM agents. Check out our #ACL2026 paper: "Uncertainty Quantification in LLM Agents: Foundations, Emerging Challenges, and Opportunities" 🔍 Why this matters: LLM agents now book flights, modify databases, and execute code autonomously. Yet most UQ research still measures a single-turn QA setup. In contrast, agents follow multi-turn trajectories in which they interact with users, call tools, and receive environmental feedback. The gap between how we study UQ and how agents actually operate is enormous. ⚙️ A unified formulation: We present the first unified formulation of Agent UQ. It models the full trajectory (actions, observations, states) and decomposes uncertainty per turn via the chain rule. Under this formulation, single-step LLM UQ and multi-step reasoning UQ fall out as special cases. 🚧 Challenges: We identify four core challenges: from selecting the right UQ estimator when existing methods all break down in agentic settings to handling heterogeneous uncertainty sources (user, tools, environment) to the near-total lack of fine-grained agent benchmarks (we survey 44 and find that turn-level evaluation is extremely rare). 🌍 Implications and open problems: Agent UQ is the missing safety layer for healthcare agents triaging patients, SWE agents pushing code to prod, and agents controlling cyber-physical systems. We also surface open problems around solution multiplicity, multi-agent UQ, and self-evolving systems. We release code and data to help the community build on this. 📄 Paper: https://t.co/IrvVUa3JgY 🌐 Project: https://t.co/LkDaCZCZep 💻 Code: https://t.co/6juIEPaLqJ Huge shoutout to @changdaeoh, who spearheaded this effort. When we started the work, agent UQ was a loosely defined space with scattered ideas; Changdae brought the clarity, structure, and rigor that the field needed to move forward. Also thanks to all the collaborators: @seongheon_96 , To Eun Kim, @JiatongLi0418, @Wendi_Li_ , @Samuel861025 @xuefeng_du, Hamed Hassani, Paul Bogdan, Dawn Song

SharonYixuanLi's tweet photo. Your LLM agent just mass-deleted a production database because it was confident it understood the task. It didn't. Avoiding these irreversible mistakes requires uncertainty quantification, a pressing open problem in the era of LLM agents.

Check out our #ACL2026 paper:
"Uncertainty Quantification in LLM Agents: Foundations, Emerging Challenges, and Opportunities"

🔍 Why this matters:
LLM agents now book flights, modify databases, and execute code autonomously. Yet most UQ research still measures a single-turn QA setup. In contrast, agents follow multi-turn trajectories in which they interact with users, call tools, and receive environmental feedback. The gap between how we study UQ and how agents actually operate is enormous.

⚙️ A unified formulation:
We present the first unified formulation of Agent UQ. It models the full trajectory (actions, observations, states) and decomposes uncertainty per turn via the chain rule. Under this formulation, single-step LLM UQ and multi-step reasoning UQ fall out as special cases.

🚧 Challenges:
We identify four core challenges: from selecting the right UQ estimator when existing methods all break down in agentic settings to handling heterogeneous uncertainty sources (user, tools, environment) to the near-total lack of fine-grained agent benchmarks (we survey 44 and find that turn-level evaluation is extremely rare).

🌍 Implications and open problems:
Agent UQ is the missing safety layer for healthcare agents triaging patients, SWE agents pushing code to prod, and agents controlling cyber-physical systems. We also surface open problems around solution multiplicity, multi-agent UQ, and self-evolving systems.

We release code and data to help the community build on this.

📄 Paper: https://t.co/IrvVUa3JgY
🌐 Project: https://t.co/LkDaCZCZep
💻 Code: https://t.co/6juIEPaLqJ

Huge shoutout to @changdaeoh, who spearheaded this effort. When we started the work, agent UQ was a loosely defined space with scattered ideas; Changdae brought the clarity, structure, and rigor that the field needed to move forward.

Also thanks to all the collaborators: @seongheon_96 , To Eun Kim, @JiatongLi0418, @Wendi_Li_ , @Samuel861025 @xuefeng_du, Hamed Hassani, Paul Bogdan, Dawn Song

247

169

16K

Samuel861025 retweeted

Stephan Rabanser @steverab

3 months ago

We've been in GRPO-tweaking mode for months (entropy bonuses, clipping hacks, length penalties). But what if the entire objective is wrong? Today, we're releasing LAD (Learning Advantage Distributions), the most elegant rethink of RL for LLM reasoning I've seen this year. #ACL2026 Here's the idea, how it works, and why we think it changes things. 🧵 The problem we kept hitting GRPO, DAPO, RLOO, and many other variants do the same thing at their core: maximize expected reward. And when you do that, your policy can collapse onto a single dominant reasoning path. Entrop regularization can act as a bolt onto the framework, but it doesn't fundamentally fix it from the ground up. The key insight 💡Stop maximizing. Start matching. We reframe the policy update as a distribution matching problem. Instead of pushing toward the single best response, we make the policy's output distribution match the full advantage-weighted target distribution by minimizing an f-divergence between the two (see our theory in Section 3.1). When you match the full advantage distribution, you naturally preserve probability mass across multiple valid reasoning paths. High-advantage responses get upweighted, yes, but the objective also suppresses overconfident probability growth on any single mode. Collapse prevention isn't an afterthought. What validated the theory We tested six divergence families. The result that convinced us we were on the right track: - Strict divergences (Total Variation, Hellinger, Jensen-Shannon) that enforce exact distributional matching consistently outperform weaker ones (such as KL). - The more faithfully you learn the full advantage distribution, the better the reasoning. This is exactly what the framework predicts. The results - In a controlled bandit setting. LAD recovers multiple-mode advantage distributions (see plot below). GRPO fundamentally cannot. This is the clearest demonstration that the paradigm difference is real, not just theoretical - In math and code reasoning tasks across multiple LLM backbones. LAD consistently outperforms GRPO on both accuracy AND generative diversity across benchmarks. Why this matters beyond benchmarks Pass@k scaling: If your model knows 5 valid reasoning paths instead of 1, sampling at inference becomes massively more effective. Simplicity: Instead of stacking "GRPO + entropy hack," you get one principled objective. Diversity preservation comes by design. Paper: https://t.co/Vs8TpzjiGH Code is available; link in the paper. Huge credit to my amazing student @Wendi_Li_, who drove this work, thinks boldly, and made things happen.

SharonYixuanLi's tweet photo. We've been in GRPO-tweaking mode for months (entropy bonuses, clipping hacks, length penalties). But what if the entire objective is wrong?

Today, we're releasing LAD (Learning Advantage Distributions), the most elegant rethink of RL for LLM reasoning I've seen this year. #ACL2026

Here's the idea, how it works, and why we think it changes things. 🧵

The problem we kept hitting
GRPO, DAPO, RLOO, and many other variants do the same thing at their core: maximize expected reward. And when you do that, your policy can collapse onto a single dominant reasoning path. Entrop regularization can act as a bolt onto the framework, but it doesn't fundamentally fix it from the ground up.

The key insight
💡Stop maximizing. Start matching.

We reframe the policy update as a distribution matching problem. Instead of pushing toward the single best response, we make the policy's output distribution match the full advantage-weighted target distribution by minimizing an f-divergence between the two (see our theory in Section 3.1).

When you match the full advantage distribution, you naturally preserve probability mass across multiple valid reasoning paths. High-advantage responses get upweighted, yes, but the objective also suppresses overconfident probability growth on any single mode.

Collapse prevention isn't an afterthought.

What validated the theory
We tested six divergence families. The result that convinced us we were on the right track:
- Strict divergences (Total Variation, Hellinger, Jensen-Shannon) that enforce exact distributional matching consistently outperform weaker ones (such as KL).
- The more faithfully you learn the full advantage distribution, the better the reasoning. This is exactly what the framework predicts.

The results
- In a controlled bandit setting. LAD recovers multiple-mode advantage distributions (see plot below). GRPO fundamentally cannot. This is the clearest demonstration that the paradigm difference is real, not just theoretical

- In math and code reasoning tasks across multiple LLM backbones. LAD consistently outperforms GRPO on both accuracy AND generative diversity across benchmarks.

Why this matters beyond benchmarks
Pass@k scaling: If your model knows 5 valid reasoning paths instead of 1, sampling at inference becomes massively more effective.

Simplicity: Instead of stacking "GRPO + entropy hack," you get one principled objective. Diversity preservation comes by design.

Paper: https://t.co/Vs8TpzjiGH
Code is available; link in the paper.

Huge credit to my amazing student @Wendi_Li_, who drove this work, thinks boldly, and made things happen.

375

339

32K

Samuel861025 retweeted

3 months ago

In our paper "Towards a Science of AI Agent Reliability" we put numbers on the capability-reliability gap. Now we're showing what's behind them! We conducted an extensive analysis of failures on GAIA across Claude Opus 4.5, Gemini 2.5 Pro, and GPT 5.4. Here's what we found ⬇️

steverab's tweet photo. In our paper "Towards a Science of AI Agent Reliability" we put numbers on the capability-reliability gap. Now we're showing what's behind them!

We conducted an extensive analysis of failures on GAIA across Claude Opus 4.5, Gemini 2.5 Pro, and GPT 5.4.

Here's what we found ⬇️ https://t.co/GkdAxk0wDO

150

151

35K

Samuel861025 retweeted

Kimi.ai @Kimi_Moonshot

3 months ago

Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation. Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers. 🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth. 🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale. 🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead. 🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains. 🔗Full report: https://t.co/u3EHICG05h

Kimi_Moonshot's tweet photo. Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation.

Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers.

🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth.
🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale.
🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead.
🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains.

🔗Full report:
https://t.co/u3EHICG05h

334

13K

10K

Samuel861025 retweeted

Karina

@karinanguyen

4 months ago

Excited to release PostTrainBench v1.0! This benchmark evaluates the ability of frontier AI agents to post-train language models in a simplified setting. We believe this is a first step toward tracking progress in recursive self-improvement 🧵:

730

532

179K

Samuel861025 retweeted

Zihan "Zenus" Wang

@wzenus

4 months ago

In Agent RL, models suffer from Template Collapse. They generate vast, diverse outputs (High Entropy) that lose all meaningful connection to the input prompt (Low Mutual Information). In other words, agent learn different ways to say nothing. 🚀 Introducing RAGEN-v2 -- Here's how we define and fix such silent failure modes in Agent RL. 🧵

270

216

189K

4 months ago

@pablofmorales Good question! We didn't test it, but it's worth a try!

4 months ago

🧵 Excited to share our new paper at #ICLR2026: LUMINA: Detecting Hallucinations in RAG System with Context–Knowledge Signals With @tanwimallick and @SharonYixuanLi Paper: https://t.co/wQ0xLHI1nI Code: https://t.co/by25osWkfL 1/N

Samuel861025's tweet photo. 🧵 Excited to share our new paper at #ICLR2026: LUMINA: Detecting Hallucinations in RAG System with Context–Knowledge Signals

With @tanwimallick and @SharonYixuanLi

Paper: https://t.co/wQ0xLHI1nI
Code: https://t.co/by25osWkfL

1/N https://t.co/gV5BOAE73H

134

14K

4 months ago

✅ We also statistically validate that our scores actually measure what they claim to measure — not just correlate with hallucination. All 4 hypothesis tests pass across Llama2, Llama3, and Mistral. Check out the paper & code, and let us know what you think! 🙌 7/N

135