Excited to share our work:
EnactToM: An Evolving Benchmark for Functional Theory of Mind in Embodied Agents
Imagine an agent that infers what you're thinking from what you do and acts accordingly, without you having to tell it. Such an agent needs a model of what the other mind knows. That capacity is called Theory of Mind (ToM).
Current ToM benchmarks ask agents "what does the other mind think?" and grade the answer. They do not measure whether the agent uses that belief when it has to act.
EnactToM closes this gap by evaluating functional ToM rather than literal ToM. It contains multi-agent embodied tasks that cannot be completed without functional ToM.
Each of the seven frontier models we evaluated scores 0.0% reproducible success on its hard split, yet the same models correctly answer 45.0% of belief questions about the same tasks when asked directly. (1/n) 🧵
Attending @AAMASConference! I'll be at the ASI & SE Workshop on the 25th, C-MAS on the 26th, and our poster session on the afternoon of the 28th. Topics include scalable evaluation of hundreds of LLM agents, partner selection, and counterfactual effects in social dilemmas, with work led by @r_j_willis, Stefan, Yudi, and Shuqing! Come say hi!
Spring Season Finale: AI Agent Frontier Seminar
Software engineering is undergoing a radical shift as agents move toward autonomous self-improvement.
For our final talk of the season, we are thrilled to host UIUC Prof. Lingming Zhang @LingmingZhang to present: "Towards Self-Evolving Software Intelligence."
Lingming's group has pioneered LLM-based SE work adopted by Meta, Google, OpenAI, and DeepSeek. He'll unpack the evolution of software agents, from live coding to continuous self-improvement with SWE-RL.
Friday, May 15 | 12 PM ET / 9 AM PT
https://t.co/wQGQKmu8Lt
Join: https://t.co/I02pRNd0Bd
Passcode: 309194
Organizers: @yalidux @ShangdingG95714 @MingJin_AI
#AIAgents #SoftwareEngineering #MachineLearning #DeepSeek
As AI moves from answering questions to taking complex actions, our evals are hitting a wall. The truth is that the quality of our evaluations directly shapes the quality of the agents we train.
We are thrilled to host Bing Liu @vbingliu (Head of Research @ Scale AI) for the next AI Agent Frontier Seminar to present: "Eval-Driven Agentic RL."
Bing will unpack the lessons learned building major benchmarks like SWE-Bench Pro and Humanity's Last Exam (HLE), and show how rubric-based reward design translates directly into better RL training.
5/8 | 12 PM ET / 9 AM PT
https://t.co/YnLvxIKx8x
Join: https://t.co/I02pRNd0Bd
Passcode: 309194
Organizers: @yalidux @ShangdingG95714 @MingJin_AI
#AIAgents #ReinforcementLearning #MachineLearning #LLMs
The pendulum of AI is swinging back. As pure end-to-end behavior learning hits its limits, the field is re-integrating search and explicit reasoning.
We are incredibly honored to host MIT Prof. Leslie Kaelbling for the next AI Agent Frontier Seminar on May 1st to present: "RL: Rational Learning."
She was doing agentic AI way before it was cool. Join us as she revisits the rational-agent approach to building general-purpose, human-level intelligent robots.
May 1 | 12 PM ET / 9 AM PT
https://t.co/YnLvxIKx8x
Join: Agentic AI Frontier Seminar: Professor Leslie Kaelbling · MIT
Zoom link: https://t.co/I02pRNd0Bd
Organizers: @yalidux @ShangdingG95714 @MingJin_AI
#AIAgents #ReinforcementLearning #Robotics #MachineLearning
How should multi-agent learning evolve in the era of LLMs and generative agents?
I'm delighted to discuss this in my invited keynote at the @iclr_conf ICLR 2026 Workshop on Multi-Agent Learning and Generative AI (@MALGAI_ICLR2026) on April 27 at 09:15 BRT.
https://t.co/p97GqdwIUg
Introducing @NeoCognition, the agent lab for specialized intelligence.
Everyone needs experts, but human expertise does not scale.
Backed by $40M seed funding, we build self-learning agents that specialize across domains to make expertise abundant.
The Safe AI Workshop at @UncertaintyInAI 2026 is coming to Amsterdam. Please submit and spread the word!
Deadline: 28th May
Link: https://t.co/wzLerSnXtK
Co-organisers: @pechenizkiy, @dtailor17, @EmtiyazKhan, Eric Nalisnick, Christos Louizos, and Alvaro H.C. Correia
We are thrilled to announce the 2nd Workshop on Safe AI, co-located with @UncertaintyInAI in Amsterdam.
Submit your latest work in Safe AI (deadline: May 28, 2026 AoE).
We welcome both extended abstracts (4 pages) and recently accepted papers (original format).
We want to speak directly to the concern many of you have expressed, and we owe you a clear explanation of what happened, why it happened, and where we stand now. We understand this situation caused genuine alarm and we take that seriously.
In preparing the NeurIPS 2026 handbook, we included a link to a US government sanctions tool that covers a significantly broader set of restrictions than those NeurIPS is actually required to follow. This error was due to miscommunication between the NeurIPS Foundation and our legal team; there was never an intention to restrict participation beyond our mandatory compliance obligations. The responsibility for that error is ours as an organization, and we deeply apologize for the alarm and impact this miscommunication had on our community.
We have updated the link and clarified the text of our policy, which is consistent with that of ACM and IEEE, as well as other international conferences and NeurIPS in the past. As in previous years, NeurIPS welcomes submissions from all compliant institutions and individuals.
We want to reiterate that NeurIPS is a community-driven event, created by and for the community, and strives to be inclusive. The NeurIPS 2026 organizing committee was particularly saddened to learn of this institutional miscommunication. The organizing committee has taken on the responsibility of running the conference this year with the goal of fostering open communication, knowledge sharing, and global scientific discourse.
We thank the community for bringing this issue to our attention and working with us through this situation.
Vision-Language-Action (VLA) models are evolving fast. How do we move robots from following basic instructions to executing complex, multi-stage tasks with sophisticated test-time reasoning?
We are incredibly honored to host Sergey Levine @svlevine for the next AI Agent Frontier Seminar to present: "Robotic Foundation Models."
Sergey will discuss the leap from first-generation VLAs to models that handle diverse data modalities and advanced reasoning, outlining the true frontiers of the field.
Date: This Friday, 3/27 | 12 PM ET / 9 AM PT
https://t.co/ZbDRxzkaq7
Join: https://t.co/x6PIDQtKl8
Passcode: 309194
Organizers: @yalidux @ShangdingG95714 @MingJin_AI
#Robotics #AIAgents #VLA #FoundationModels
Join us this week for the AI Agent Frontier Seminar with Graham Neubig (@gneubig) presenting "Lessons from the Trenches in Building Agents for Software Development."
The talk will cover the foundational technologies behind software-based agents, including:
• Tooling for model interfaces
• Rigorous evaluation benchmarks
• Training agentic models
• Open problems in memory, task decomposition, and human-agent interaction
Friday, 3/13 | 12 PM ET
Join: https://t.co/x6PIDQtKl8
Passcode: 309194
Organizers: @yalidux @ShangdingG95714 @MingJin_AI
#AIAgents #SoftwareEngineering #LLMs #MachineLearning
Tomorrow at 12 PM ET!
We are thrilled to host @lifu_huang (UC Davis) for a talk on "Goodhart's Revenge: Reward Hacking in RL-Tuned LLMs."
Are our RLHF models truly aligned, or just hacking their proxy rewards? Join us to discuss sycophancy, code gaming, and how we can fight back with robust defenses.
Join: https://t.co/Tf9AwfqtEG
Passcode: 309194
Organizers: @yalidux @ShangdingG95714 @MingJin_AI
#RLHF #LLMs #AIAlignment #MachineLearning
Happening Tomorrow!
We are thrilled to host @pulkitology (MIT) at the AI Agent Frontier Seminar!
"Rethinking Post Training"
Pulkit will challenge the pre-training/finetuning paradigm and discuss advances in continual learning (RL Razor, Self-Distillation Learning, SEAL, and more).
Friday, Feb 27 | 12 PM ET
Zoom: https://t.co/Tf9AwfqtEG
Passcode: 309194
Organizers: @yalidux @ShangdingG95714 @MingJin_AI
#AIAgents #MachineLearning #MIT #ContinualLearning
Is AI safety too technical and Western-centric?
This Friday, we are thrilled to host @MaartenSap (CMU) at the AI Agent Frontier Seminar!
Maarten will discuss making AI safety more human-centric and culturally aware, covering tool-use safety and culturally offensive non-verbal communication.
Friday, Feb 13 | 12 PM ET / 9 AM PT
Zoom: https://t.co/Tf9AwfqtEG
Passcode: 309194
Organizers: @yalidux @ShangdingG95714 @MingJin_AI
#AIAgents #AISafety #LLM
Huge congratulations to our group: Zihao, Shuqing, Lianghao, and Richard! Big thanks to all our collaborators. We're excited to share three RL-pure (100%) projects, focusing on multi-agent social dilemma evaluation, coalition learning, and RL exploration. Stay tuned!
A huge thank you to Prof Yu Su for taking the time to share his insights with our community. Looking forward to seeing you all there! Thanks to the amazing co-organisers @ShangdingG95714 and @MingJin_AI!
Agentic AI Frontier Seminar: Excited to welcome Prof Yu Su @ysu_nlp from Ohio State University on Friday, 6 Feb.
Title: Computer Use: Modern Moravec's Paradox
Time: 2026-02-06 · 09:00–10:00 (PT) | 17:00–18:00 (GMT)
Join us via Zoom: https://t.co/wkSe2vcJFt
This talk will discuss the inherent challenges of computer use, such as idiosyncratic environments and contextual understanding, along with Yu's insights on computer use and the most immediate path toward practical, goal-directed AGI.