NICE AI Talk

@academic_nice

NICE is a non-profit academic community focused on AI. Organized by a team of young researchers, NICE aims to foster open community learning and communication.

New York City

Joined August 2024

198 Following

94 Followers

124 Posts

NICE AI Talk

@academic_nice

9 days ago

As LLMs become increasingly powerful, one key question becomes more urgent: How can we make their behavior more controllable and predictable?

Traditional approaches such as retraining or fine-tuning can be costly. Steering offers another path: instead of updating the whole model, we can gently “nudge” its internal activations during inference to influence personality, emotion, safety behavior, and even reasoning patterns.

But two fundamental questions remain:

🔹 Why do so many different steering methods work?

🔹 How controllable is steering — and where are its limits?

In NICE Talk No.178, we are delighted to invite Ziwen Xu to share insights from two ACL 2026 main conference papers by Zhejiang University and Alibaba. The talk will explore steering from both mechanistic understanding and systematic evaluation, answering:

✨ Why steering works

✨ How far steering can go

The talk will also introduce EasyEdit2, a one-stop open-source framework for model editing and steering.

🎙️ Speaker: Ziwen Xu, Master’s Student in Artificial Intelligence, Zhejiang University

👤 Host: Mingyu Jin, PhD Student at Rutgers University

⛳️ Register on Luma: https://t.co/vjAGnRGI6C

👀 YouTube live: https://t.co/7JLHbzj2uX

📅 Time

Beijing Time: June 6, 14:00–15:00

Eastern Time: June 6, 02:00–03:00

Pacific Time: June 5, 23:00–June 6, 00:00

#LLM #ZJU #ACL2026 #AI
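
The "nudge" described in this post can be pictured with a toy example. The sketch below is only an illustration of the general idea of adding a direction to an activation at inference time; it is not the method from the talk and does not use EasyEdit2. The activation values, the steering direction, and the strength `alpha` are all made-up assumptions.

```python
# Toy illustration of activation steering; NOT the method from the talk.
# All vectors and the strength `alpha` are hypothetical values.

def unit_norm(vec):
    """Euclidean norm of a vector given as a list of floats."""
    return sum(v * v for v in vec) ** 0.5

def steer(hidden, direction, alpha):
    """Return hidden + alpha * (direction / ||direction||).

    Model weights are never touched; only the activation is shifted.
    """
    norm = unit_norm(direction)
    if norm == 0:
        raise ValueError("steering direction must be non-zero")
    return [h + alpha * d / norm for h, d in zip(hidden, direction)]

def projection(hidden, direction):
    """Scalar projection of `hidden` onto the steering direction."""
    return sum(h * d for h, d in zip(hidden, direction)) / unit_norm(direction)

hidden = [0.5, -1.0, 2.0]      # hypothetical activation at one layer and token
direction = [3.0, 0.0, 4.0]    # hypothetical "behavior" direction

steered = steer(hidden, direction, alpha=2.0)
print("before:", projection(hidden, direction))
print("after: ", projection(steered, direction))
```

Because the direction is normalized, the projection onto it grows by exactly `alpha`, while components orthogonal to the direction are left unchanged. In a real model the same addition would typically be applied to a layer's output during the forward pass.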

NICE AI Talk

@academic_nice

15 days ago

"After evaluating 300 agent tasks, we revisited two fundamental questions: what should we evaluate, and how can we make the evaluation trustworthy?"

NICE Talk NO.176: We invite Bowen Ye, PhD student at Peking University, to introduce Claw-Eval, an end-to-end evaluation framework for LLM agents.

HOST:
Lei Li, PhD Student, The University of Hong Kong

TIME:
- Beijing Time: May 30, 10:00–11:00
- Eastern Time: May 29, 22:00–23:00

YouTube Live: https://t.co/8fS79A6Azl
Register on Luma: https://t.co/7pm91sfrrZ

🤔 As LLMs evolve from chatbots into agents that can use tools and coordinate systems, evaluation is shifting from “Can it answer?” to “Can it reliably get things done?”

Claw-Eval includes 300 human-validated tasks across general workflows, multimodal settings, and multi-turn conversations, with systematic evaluation on 14 frontier models.
The talk will cover:
🔹 Why final-output-only evaluation is not enough
🔹 How full trajectory auditing reveals hidden failure modes
🔹 What safety and robustness look like inside real workflows
🔹 Key findings from evaluating 14 frontier models
🔹 Open questions on what Agent evaluation should measure next

Join us to rethink what it means to evaluate agents in the LLM era.

NICE AI Talk

@academic_nice

18 days ago

NICE Talk NO.175 invites Guanlin Dong to share a new paradigm for general agent training: environment-agent co-evolution.

Paper: Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence
arXiv: https://t.co/g5NOhmOZ3Y
Project website: https://t.co/QGV3MTAWZ9

Speaker: Guanlin Dong, PhD student, Renmin University of China
Host: Boyang Xue, PhD student, The Chinese University of Hong Kong
⛳️ Register on Luma: https://t.co/zU4KheknEe
👀 Live on YouTube: https://t.co/TAREBmTjZ8

Time
Beijing: May 26, 20:00–21:00
Eastern USA: May 26, 08:00–09:00
Pacific Time: May 26, 05:00–06:00

Introduction:

💡 Most current agent training approaches treat environments as static benchmarks. Agent-World explores a different direction: agents and environments evolving together. The framework tightly couples:
• Autonomous environment exploration from the real world
• Continuous self-evolution training through multi-environment RL
• Automatic diagnosis of capability weaknesses
• Targeted expansion of environments and tasks

🚀 Key results:
• 1,978 interactive environments
• 19,822 executable tools
• Long-horizon tasks with 15+ interaction turns on average
• Strong performance across 23 challenging benchmarks, including τ²-Bench, BFCL V4, MCP-Mark, ClawEval, and SkillsBench

#LLM #Agents #AgenticAI #ReinforcementLearning #OpenSource #AGI

NICE AI Talk

@academic_nice

28 days ago

NICE Talk 173 invites 🎙️ Heyuan Huang (Algorithm Engineer from OPPO Research Institute) to share the design and vision of TopoClaw.

Talk Time ⏰ EST: 5.16 22:00~23:00
😊 Register on Luma: https://t.co/IVkXIkKoYD
📌 Watch live on YouTube: https://t.co/4Y56D9UXJa

💻 With the rapid growth of AI, users' perception of AI assistants has shifted from "What is this?" to "What else can it do?" as they watch assistants execute tasks capably on PCs. Yet current assistants still can't operate phones, and they lack multi-dimensional perception and proactive collaboration.
🤝 A truly "productive" AI needs "fingers" to tap screens, "tentacles" to perceive events, and social skills to complete tasks for you.
🦀 TopoClaw was built for this: designed from scratch to bridge PC and mobile, drive workflows automatically, and collaborate under your identity, taking action rather than just giving advice.

Related work covered in the talk:
🌟TopoClaw Open-Source Project
🎙️Host: Wenyue Hua, Senior Researcher at Microsoft Research

#AI #LLM #AIAgents #TopoClaw #OpenSource

NICE AI Talk

@academic_nice

about 1 month ago

NICE Talk 172 invites 🎙️ Bingxiang He @HBX_hbx, PhD student at Tsinghua University @TsinghuaNLP, to share Three Frontiers of Scalable RL for LLMs.

Talk Time ⏰ EST: 5.15 22:00~23:00

📌 Watch live on YouTube: https://t.co/BRFR10esLu
😊 Register: https://t.co/hhj9awhrF1

🤠Can RL advance model capabilities without any supervised signals?
🧐Three Frontiers, One Map: Charting the Feasible Region of Scalable RL Matters More Than Inventing Another Trick

😈 Explicit length penalties and more lenient verifiers both led to significant performance degradation.
😈 Switching to a higher-scoring teacher can paradoxically shrink—or even reverse—student gains.

Related work covered in the talk:
🌟JustRL: https://t.co/IaOhCCz0px
🌟Unsupervised RLVR: https://t.co/zORhAagpEq
🌟Rethinking On-Policy Distillation: https://t.co/D0FhVDwFK5

🎙️Host: Cheng Qian, PhD at UIUC

#AI #LLM #scalinglaw #model #ReinforcementLearning

NICE AI Talk

@academic_nice

about 1 month ago

NICE Talk No. 169 invites Hengyuan Zhang (👤 homepage: https://t.co/boLmgUg2QT), first-year Ph.D. student at HKU Ngai Lab, to speak on LLM Interpretability: From Mechanism to Model Improvement.

Talk Time ⏰
PDT 05.08 20:00–21:00
EDT 05.08 23:00–24:00

📌 Watch live: YouTube: https://t.co/8pcdfxKsNC
😊 Register: https://t.co/qIxvlzUgGd

Key highlights:
🤔 Why does a model produce a certain behavior? Where are capabilities stored — in which layers, modules, or representations?
🧠 [Locate, Steer, and Improve] — A practical survey of actionable mechanistic interpretability in LLMs, organized as a Locate → Steer → Improve pipeline.
⚙️ NSDS — A data-free layer-wise mixed-precision quantization method driven by numerical and structural dual-sensitivity, guided by interpretability analysis.
🌐 ShifCon — Enhances non-dominant language capabilities via shift-based contrastive learning on multilingual representation subspaces.
💡 Core insight: Interpretability is not just about "seeing" how a model works — it can be a tool for improving it.

Paper 1: https://t.co/DoVHbXs9E5
Paper 2: https://t.co/ImAv9bimbm
Paper 3: https://t.co/ol8HdnZtFf

#AI #LLM #Interpretability #MechanisticInterpretability #PhDLife #AIResearch #HKU #NICE

NICE AI Talk

@academic_nice

about 2 months ago

NICE AI Talk No. 165 🤩 Inviting Jian Yang to explore the frontier of industrial code: Can AI truly learn to "think" like a hardware engineer? 🤔

Time: PDT 2026.04.18 (Saturday) 18:30–19:30 | EDT 21:30–22:30
Register to watch live: Luma Event 📩 https://t.co/qhgMS3uY79

The InCoder-32B series tackles modern industrial code, from chip design to GPU optimization, by introducing the first unified foundation model purpose-built for these high-stakes environments. By combining large-scale industrial code pretraining with real-world validation tools, it establishes a new open-source baseline for serious engineering tasks.

Moving beyond simple code generation, Jian's team built an Industrial Code World Model (ICWM) with 96.7% prediction accuracy, refining reasoning through Error-Driven Chains of Thought (ECoT). The result is a system that dynamically adapts its reasoning depth, from concise fixes to long-form, multi-step debugging traces (91 to 19K tokens), achieving 81.3% on LiveCodeBench.

📽️ Guest Profile: Jian Yang, Ph.D., Assistant Professor at Beihang University. He has published 100+ papers at top-tier venues including ICLR, NeurIPS, ACL, and EMNLP, and has served as a Senior Area Chair and Senior Program Committee member for NeurIPS, the Association for Computational Linguistics (ACL), and the Association for the Advancement of Artificial Intelligence (AAAI). His work bridges the gap between high-level LLM reasoning and the rigid constraints of real-world "cold code" like Verilog and CUDA.

Paper: https://t.co/kMWFZReaVv https://t.co/ZzYqepP5VV
Hugging Face: https://t.co/G6IBDfbkvV

#AI #SoftwareEngineering #IndustrialCode #LLMs #ChipDesign #Hardware #NICEAITalk #Academic

NICE AI Talk

@academic_nice

about 2 months ago

NICE PODCAST 🤩 Inviting Researcher Wenyue Hua @HuaWenyue31539 to step beyond pure tech: How does she leverage interdisciplinary thinking to tackle Agent adoption's toughest challenges? 🤔

Time: EST 2026.04.17 (Friday) 22:00-23:00
Register to watch live: https://t.co/OZFL669TV1

Finding the optimal balance between models, tools & budget. Building "insured" #trust standards for Agents, inspired by financial risk control. 🛡️

👩‍🎓From UCLA (Math + Philosophy) → Linguistics to CS PhD → Microsoft Research. Her winding path seeks universal solutions across disciplines.
🧠In an era of tech acceleration, what can a humanities lens bring to cold code?

📺 Watch live: https://t.co/RFPKqE6CX6

Related work mentioned in the podcast 🥳
🌟AgentOpt
Homepage: https://t.co/5Je9TpDf6g
Github: https://t.co/WNHahfoQIo
Blog: https://t.co/cj9VAFsXwE

🌟Quantifying Trust
Title: Quantifying Trust: Financial Risk Management for Trustworthy AI Agents
arXiv: https://t.co/DIAsiupufT
Github: https://t.co/JhGFYdBJQl

Host: Enyu Zhou, PhD at Fudan University
#AI #Agents #Interdisciplinary #podcast #industry #academic

NICE AI Talk

@academic_nice

2 months ago

NICE TALK 157 🥳 invites Dr. Xiaoxuan Wang, PhD at UCLA, to talk about a unified framework for stable agentic reinforcement learning.

Talk Time⏰EST 4.3⏰21:30~22:30
📌Watch live: https://t.co/uZ9WsGX3l5
📌Register on Luma: https://t.co/PPcHFhoBks

⭐️ They propose an analytical framework, ARLArena, and conduct an in-depth analysis across four key dimensions: Loss Aggregation, Importance Sampling (IS) Clipping, Trajectory Filtering, and Advantage Design.

🤖 They also introduce a unified RL method, SAMPO, which integrates three core mechanisms:
1⃣ sequence-level clipping to ensure baseline stability
2⃣ fine-grained advantage signals (turn-level advantages) to improve credit assignment
3⃣ dynamic trajectory filtering to further enhance training data quality

paper: https://t.co/N806UQGFH9
github: https://t.co/h1LYBNn7jd

#AI #agent #LLM #generative #RL #reasoning
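
To make "sequence-level clipping" concrete, here is a minimal sketch of a PPO-style clipped objective where the importance ratio is computed once per sequence rather than per token. This is a generic form chosen for illustration, not SAMPO's actual formulation: the length-normalized ratio, the function names, and the clip range `eps` are all assumptions, and the post does not specify them.

```python
import math

# Generic sketch of sequence-level clipping; NOT SAMPO's actual objective.
# Function names, the length normalization, and eps=0.2 are assumptions.

def sequence_ratio(new_logps, old_logps):
    """Length-normalized importance ratio for a whole sequence.

    Equals the geometric mean of the per-token ratios exp(new - old).
    """
    if not new_logps or len(new_logps) != len(old_logps):
        raise ValueError("log-prob lists must be non-empty and the same length")
    total = sum(n - o for n, o in zip(new_logps, old_logps))
    return math.exp(total / len(new_logps))

def clipped_sequence_objective(new_logps, old_logps, advantage, eps=0.2):
    """PPO-style clipped surrogate, applied once per sequence."""
    ratio = sequence_ratio(new_logps, old_logps)
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    # Taking the minimum keeps the update pessimistic, as in standard PPO.
    return min(ratio * advantage, clipped * advantage)

old = [-1.0, -1.0]   # hypothetical per-token log-probs under the old policy
new = [0.0, 0.0]     # hypothetical per-token log-probs under the new policy

print(clipped_sequence_objective(new, old, advantage=1.0))
print(clipped_sequence_objective(new, old, advantage=-1.0))
```

With a positive advantage the large ratio is capped at `1 + eps`, so one sequence cannot dominate the update; with a negative advantage the unclipped term is kept, which is the usual PPO behavior.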

NICE AI Talk

@academic_nice

2 months ago

NICE Talk 158🌟 invites Yinjie Wang (👤 homepage: https://t.co/grbOE8TPdc), Ph.D. student at the University of Chicago, to share insights on OpenClaw-RL, an agent that improves the more you use it.

Talk Time ⏰
PDT 04.04 18:30–19:30
EDT 04.04 21:30–22:30

📌 Watch live: YouTube: https://t.co/WkhsMPHePL
😊 Register: https://t.co/JmMLc7jGn5

Key highlights:
🧠 What if your model could learn and evolve from every interaction after deployment?
⚙️ OpenClaw-RL is a novel RL framework — deploy your model on it, and it automatically and continuously self-improves through real-world usage.
🚀 Combines GRPO + On-policy Distillation, turning the entire history of model-user-environment interactions into powerful RL training signals.
🤖 The result: personal agents that don't stay static — they grow smarter and more adaptive the more they are used.
🔍 Validated through creative experiments demonstrating efficient self-optimization for personal agents.

paper: https://t.co/aP04iuI7Wu

#AI #Agent #LLM #RL #ReinforcementLearning #SelfEvolving #OpenClawRL #GRPO #PhDLife #AIResearch

NICE AI Talk

@academic_nice

2 months ago

NICE Talk 156 🌟 invites Dr. Yifu Qiu @yifuqiu98, PhD candidate at the University of Edinburgh, jointly supervised at Cambridge University. 🥳 We will talk about models' self-improving world modelling via latent actions!

Talk Time⏰EST 4.3⏰9:00~10:00
📌Watch live: https://t.co/SrdKP8JeUU
📌Register on Luma: https://t.co/IsExhI1GYB

In the internal world modeling process of VLMs and LLMs, we often face these challenges:
🙃Difficulty in unified modeling across diverse modalities
🤖Limited interpretability of latent actions between states
💡Challenges in acquiring accurate action annotation data
📈Difficulty in autonomously completing learning through rollouts

The ability to internally model the world is essential: predicting next states from current states and actions.

SWIRL🍥 is a self-improving framework that treats actions as latent variables and alternates optimization between a forward world model and an inverse dynamics model, learning solely from state sequences.

SWIRL🍥 achieves cross-modal SOTA, improving AURORA-BENCH by 16%, ByteMorph by 28%, WORLD-PREDICTION-BENCH by 16%, and STABLETOOL-BENCH by 14%.

paper: https://t.co/LPvqatoa99
github: https://t.co/9OsueZ2ngR

#worldmodel #multimodal #AI #agent #LLM #generative #RL #reasoning

NICE AI Talk

@academic_nice

3 months ago

NICE Talk 154🌟 invites Wei Fu (👤 homepage: https://t.co/pcYh5DOwVd ), Ph.D. student at Tsinghua University, to share insights on RL infrastructure for next-generation AI systems.

Talk Time ⏰
PST 03.28 06:00–07:00
EST 03.28 09:00–10:00

📌 Watch live: YouTube: https://t.co/xbOmevK4Yt
😊 Register: https://t.co/wPXVOSlZrg

Key highlights:
🧠 Reinforcement Learning (RL) has become a central focus in the LLM community, and Agentic RL is rapidly shaping the 2026 landscape.
⚙️ This talk explores RL infrastructure through a deep dive into AReaL 1.0, a large-scale asynchronous RL system.
🚀 AReaL enables zero-code integration for online RL training: simply connect via base URL and API key to train and evolve agent applications.
🤖 Supports a wide range of agent use cases, including emerging frameworks like OpenClaw.
🔍 Discussion on challenges and opportunities in RL Infra, and how AReaL is evolving in the era of agent-driven AI.

paper: https://t.co/VwylslPAuj

#AI #agent #LLM #RL #ReinforcementLearning #GenerativeAI #Infrastructure

NICE AI Talk

@academic_nice

3 months ago

NICE Talk 153🌟 invites Peng Yu, a PhD student at SJTU, to discuss Structured In-context Environments (SIE) for enhancing the model's reasoning environment.

Talk Time⏰PST 3.27 20:00~21:00
📌 Watch live: https://t.co/C6R3zcxwNr
Register on Luma: https://t.co/m3TgggpTFH

🙃Traditional mathematical or coding environments rely heavily on expensive expert annotations, while the skills learned in game-like environments are difficult to generalize.
🧐 An ideal LLM Reasoning training environment must simultaneously possess three core features: Scalability, support for Generalizable Reasoning, and Verifiability.
☺️The SIE framework proposes to automatically construct an inference environment from massive structured data, such as knowledge graphs.

paper: https://t.co/Y5Jj8xcO56
#AI #agent #LLM #generative #RL #reasoning

NICE AI Talk

@academic_nice

3 months ago

NICE Talk 151🌟 invites Jundong Xu @nigualjiadapubu, a PhD student at NUS, to discuss step-level logical validation to prevent correct answers from flawed reasoning.

Talk Time⏰PST 3.22 20:00~21:00

📌 Watch live: https://t.co/dWq9HBAbjY
Register on Luma: https://t.co/2BIp8CsXSU

Key findings:
🧐LogicReward trains LLMs using step-level logical validation to prevent correct answers from flawed reasoning. It combines autoformalization with soft unification and theorem-prover checks to ensure the faithfulness of reasoning.
🤠This approach improves performance across benchmarks and generalizability to unseen tasks.

paper: https://t.co/36oaBEVQkO
#AI #agent #LLM #generative #RL #reasoning
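
The idea of not rewarding a correct answer that rests on flawed reasoning can be sketched as a reward that depends on per-step checks as well as the final answer. The function below is a hypothetical scoring rule written for illustration only; it is not LogicReward's actual reward, and the per-step booleans stand in for whatever an external checker (for example a theorem prover) would return.

```python
# Hypothetical step-aware reward; NOT LogicReward's actual scoring rule.

def step_aware_reward(step_valid, answer_correct):
    """Scale the reward for a correct answer by the fraction of valid steps.

    step_valid: list of booleans, one per reasoning step (assumed to come
                from an external checker).
    answer_correct: whether the final answer matches the reference.
    Full reward (1.0) requires a correct answer AND every step passing.
    """
    if not step_valid:
        return 0.0  # no verifiable reasoning, no reward
    validity = sum(step_valid) / len(step_valid)
    return validity if answer_correct else 0.0

print(step_aware_reward([True, True, True], True))    # sound reasoning, right answer
print(step_aware_reward([True, False, True], True))   # right answer, one flawed step
print(step_aware_reward([True, True, True], False))   # sound steps, wrong answer
```

An outcome-only reward would score the first two cases identically; here the flawed chain earns less, which is the behavior the post describes.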

NICE AI Talk

@academic_nice

3 months ago

NICE Talk 150🌟 invites Yufan Zhuang @yufan_zhuang, a PhD student at UCSD, to discuss LLM self-improvement at test time.

Talk Time⏰EST 3.22 21:00~22:00

📌 Watch live: https://t.co/S0gr0kG4tH
Register on Luma: https://t.co/CtXt3ztQm2

Key findings are counterintuitive:
🧐The Test-time Recursive Thinking (TRT) framework enables LLMs to self-improve reasoning through iterative knowledge accumulation, combining strategic rollout generation, self-verification-based solution selection, and contrastive failure analysis without external supervision.
😎TRT achieves significant accuracy gains: open-source models reach 100% on AIME benchmarks, while closed-source models improve by 10.4–14.8 percentage points on LiveCodeBench’s hardest problems through self-generated test execution and adaptive exploration strategies.

paper: https://t.co/aaxQxmmvPu
#AI #agent #LLM #generative #scaling

NICE AI Talk

@academic_nice

3 months ago

academic_nice's tweet photo. NICE Talk 148🌟 invites @emilianopp_, a PhD student at Mila-Quebec & Université de Montréal, to discuss how LLMs can learn from privileged information during training — without needing it at test time.
📖 Paper: Privileged Information Distillation for Language Models — [https://t.co/tbNaa4YxqQ]

⏰ Time:
3.20 (Fri) 9:00 PM - 10:00 PM EDT
3.20 (Fri) 6:00 PM - 7:00 PM PDT
📌 Register: https://t.co/YidJGQwy4R
📌 Watch live: https://t.co/fdQWOyb4tB

✨This talk is hosted by @Haolun_Wu0203, Ph.D. at Mila & McGill

What if your model could train with a "cheat sheet" — but still ace the test without it? Emiliano presents Privileged Information Distillation, a unified post-training framework that bridges the gap between hinted training and non-privileged inference.
⭐ Key findings:
🧐 Privileged information during training significantly boosts LLM performance — but design choices matter enormously for generalization;
🤠 A variational framework + on-policy distillation outperforms strong baselines including SFT + GRPO;
🤪 Most surprisingly, not all privileged information is equal — the right hints incentivize generalization, while the wrong ones don't.

#AI #LLM #PrivilegedInformation #Distillation #PostTraining #Reasoning #NICE #NexusForIntelligence

NICE AI Talk

@academic_nice

3 months ago

NICE Talk 145🌟 invites Xuan Liu @XuanLiu888, a PhD student at UCSD, to discuss whether AI can truly reproduce human behavior through HumanStudy-Bench.

Time
⏰PST 3.14 19:00~20:00
⏰EST 3.14 22:00~23:00

📌 Watch live:
YouTube livestream: https://t.co/g6WZ3K4Jsv
Register on Luma: https://t.co/hokwVCpjQa

Key findings are counterintuitive:
🧐Larger model scale does not mean more human-like behavior: results for the same model varied by 35% or more across different agent designs;
🤠no universal optimal template: each model has its own "best recipe";
🤪Most surprisingly, telling it "you are a human" sometimes makes it less human, not more.

Related Work:
- CoBRA: Programming Cognitive Bias in Social Agents Using Classic Social Science Experiments — ACM CHI 2026 🏆 Best Paper Award —[https://t.co/14McOUmZWK]
- CogMir: Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View — ICLR 2025 —[https://t.co/IVWhqky74W]
#AI #agent #LLM #human #benchmark #generative
