CMU Safe AI Lab

Assistant Prof. @Tsinghua_Uni. Postdoc @StanfordSVL. Ph.D. @CarnegieMellon. Prev. @GoogleDeepMind. Learning and Robotics.

about 1 year ago

Struggling with RL fine-tuning for LLMs? Our most recent paper nails the issue: it’s not just about accuracy distribution, it’s about how coherent and influential your rollouts are! BRIDGE proposes a clever fix 👇

Zhepeng Cen @ZhepengCen

about 1 year ago

🚀 Introducing BRIDGE, a task-agnostic data augmentation strategy to prepare LLMs for RL!

🤖 Why do LLMs often fail to benefit from RL fine-tuning? We pinpoint two key factors: 1) 🔍 Rollout Accuracy 2) 🔗 Data Co-Influence.

💡 BRIDGE injects both exploration & exploitation into LLMs, boosting rollout informativeness and increasing data co-influence, both key ingredients for effective RL fine-tuning.

🔍 We also compare with a popular accuracy-filtering baseline; it still plateaus due to low data co-influence. TL;DR: medium query difficulty / accuracy ≠ RL-ready.

📄 Paper: https://t.co/98Xb9nmAOL
🌐 Website: https://t.co/YTieEjXA0G

cmusafeai retweeted

Xilun Zhang

@XilunZhangXZ

over 1 year ago

🤖 What if robots could adapt from simulation to reality on the fly, mastering tasks like scooping objects and playing table air hockey? 🥄🏓 I’m thrilled to share that our work, "Dynamics as Prompts: In-Context Learning for Sim-to-Real System Identification," has been accepted for publication in #IEEE Robotics and Automation Letters! 🎉 👇👇👇 https://t.co/LoUffR2ccB

cmusafeai retweeted

over 1 year ago

🔍 Inference-time scaling is a key focus in foundation models, but its origins trace back to model-based RL (MBRL). In MBRL, a critical challenge arises from reconciling the conflict between "world model prediction" and "task reward": the gap between next-state prediction accuracy and full-sequence (user-defined) reward optimization.

🔨 Our NeurIPS paper, BECAUSE, pinpoints the root of this mismatch: spurious correlations in empirical transition dynamics and data policy. To address this, we propose a bilinear causal representation that bridges the gap, enabling generalizable imagination rollouts, robust uncertainty quantification, and pessimistic planning.

📈 BECAUSE achieves state-of-the-art performance, outperforming existing offline RL baselines in generalizable online deployment, with good theoretical guarantees.

🌐 Learn more: https://t.co/HvtGqsYJKW (1/6) #NeurIPS2024 #NeurIPS

over 1 year ago

Check out our recent work on causal reasoning for generalizable offline RL!

over 1 year ago

🏝️ OASIS: Shaping the Future of Offline Safe Reinforcement Learning

🚀 Our recent NeurIPS 2024 paper tackles the challenges in Offline Safe Reinforcement Learning with a data-centric perspective that improves training dataset quality for safer and more effective policies.

#neurips2024 #Oasis #safety #ReinforcementLearning #Datacuration

✨ Key Contributions:

💡 Data-Centric Approach: Focuses on knowledge distillation and data augmentation to enhance dataset quality, rather than solely proposing new model-centric algorithms for sequential modeling and policy optimization.

💡 Distribution Shaping: Leverages a conditional diffusion model to curate datasets and align training data with user-defined safety preferences.

💡 Theoretical Insights: Provides a comprehensive theoretical analysis of how behavior policy and offline dataset quality affect the performance of regularization-based offline (safe) RL.

💡 "Less is More" for Offline Safe RL: Demonstrates that a small, preference-aligned, high-quality dataset can outperform a massive dataset with mixed quality and preferences.

💡 Broad Compatibility: Integrates seamlessly with model-centric algorithms.

🔗 Project website: https://t.co/Qv1S4nwwGN
📄 Paper: https://t.co/wd0BpbVdBe
🐙 Code: https://t.co/yBabEVPwyI

cmusafeai retweeted

over 1 year ago

What a pity to miss a wonderful in-person IROS conference! Don’t forget to check out our RAL paper: https://t.co/VgxgqvK4kH and our project page!
Code base: https://t.co/RhDgzKzG0s
Project page: https://t.co/t3VwusoF3M

about 2 years ago

Check out our new loco-manipulation project led by @changyi_lin1 and others!

Changyi Lin

@changyi_lin1

about 2 years ago

LocoMan = Quadrupedal Robot + 2 * Loco-Manipulator
Powered by dual lightweight 3-DoF Loco-Manipulators and a Whole-Body Controller, LocoMan achieves various challenging tasks, such as manipulation in narrow spaces and bimanual manipulation. https://t.co/EDPGUxq1sT 👇👇👇

cmusafeai retweeted

over 2 years ago

How can we enhance the reasoning capabilities of RL agents? We present FUSION, a causality-guided trustworthy reinforcement learning framework that can achieve satisfactory performance under distribution shift in safety-critical domains such as autonomous driving. #ML4AD

cmusafeai retweeted

Yaru Niu

@yaru_niu

over 2 years ago

Excited to return to Atlanta for #CoRL2023 @corl_conf after almost 2 years! I'll be presenting "COMPOSER: Scalable and Robust Modular Policies for Snake Robots" at the L4SR workshop's poster sessions. Please check it out! 📄🔗: https://t.co/ELGpBgEDJC 🌐🔗: https://t.co/3ns1C8IaWH

over 2 years ago

#CoRL23 Nearly everyone in robot learning has dealt with simulators with inaccurate system dynamics, which hinders real-world deployments. Can we automate the process of aligning the simulator with the real world? @huang_peide and others gave an answer by proposing COMPASS🧭.

Peide Huang

@peide_huang

over 2 years ago

🤖 What went wrong with my robot simulator? Why did my robot fail miserably in the real world? ☠️ You need COMPASS, a SysID method that automatically discovers the causality between the simulator parameters and the sim2real gap, and updates the parameters to reduce the gap. #CoRL

over 2 years ago

🚀 Don’t forget DSRL (Datasets for Safe RL)! 📷 With 38 datasets that include safety-constraint information and an API consistent with D4RL, DSRL is your go-to for offline safe learning exploration and research. Check it out at https://t.co/KB3Zh1JWPM now! #Dataset #SafeRL #AISafety (4/4)

over 2 years ago

🚀 Excited to announce our latest research on #SafeRL, with three high-quality packages released! We introduce a benchmarking suite tailored for offline safe learning challenges. Explore more details at https://t.co/ZhUsTyd8Hc #AI #RL #AISafety (1/4)
