Can MLLMs actually track what's happening in a video?
Introducing VSTAT, our new benchmark for visual state tracking.
The tasks are simple: count cups, read typed words, count page flips. Humans solve them easily. MLLMs don't.
https://t.co/dgqhqeVuSv
🧵 [1/11]
A new RLHF vulnerability identified
RLHF can be exploited to optimize for misaligned biases, such as ideological or promotional ones.
We introduce Alignment Tampering, a vulnerability where the LLM undergoing alignment influences the preference dataset itself, causing RLHF to amplify undesired behaviors.
Paper & Code: https://t.co/nQqspXqL1V
#ICML2026 #AIAlignment
@KAIST_AI, @MIT_CSAIL
1/N 🧵
What if your retriever could speak every language your data speaks?
Your answer might live in a document, a SQL table, an RDF knowledge graph, or a property graph. OmniRetrieval reaches into all of them, querying each source in its own native query language instead of flattening everything into one lossy space.
Paper: https://t.co/dI6IvBwfWW
Excited to introduce Learn from Weaknesses (LearnWeak)!
A framework that automatically specializes small computer-use agents (CUAs) for specific domains by targeting their own failure patterns in data generation and training.
🧵 (1/7)
Releasing AXPO, an RL method to lift agentic reasoning models past their next scaling tier.
Be it math, perception, or search, AXPO fixes the structural blind spot that 'just add tools' recipes leave untouched.
Our 8B model beats a 4x larger 32B baseline on Pass@4.
From NVIDIA 🧵 (1/7)
Introducing TRQAM! Internalizing a KL trust region inside the sampling SDE stabilizes off-policy RL fine-tuning of pretrained flow policies. With TRQAM, we raise the offline RL success rate on 50 OGBench tasks from 46% to 68%. 🧵 [1/8]
https://t.co/vRYvY4GnDA
New Optimizer Paper
AMUSE: Anytime MUon with Stable gradient Evaluation
AMUSE combines Muon with Schedule-Free-style gradient evaluation for stable anytime training without LR decay.
• Stronger 124M / 720M / 1B pretraining
• Strong ImageNet / ViT fine-tuning performance.
We are looking for talented people interested in AI for Science, including ML for molecules, materials, and scientific discovery.
If you are interested, please feel free to DM or email me. I am happy to chat and answer any questions.
@KAIST_AI Our department currently consists of 25 world-class "young" faculty members (plus 43 affiliated and 8 invited/adjunct faculty) and is looking for new members to join one of the fastest-growing AI research communities in the world.
https://t.co/V0z4g8sPhy
KAIST AI is recruiting faculty members in Seoul!
Planning to attend ICML? Join us there and help shape a brighter future of AI.
https://t.co/XZEbzfzohC
New preprint out on contextual integrity (CI) and a new Product-of-Experts (PoE) view of self-distillation!
Introducing SelfCI, a novel self-distillation framework that operationalizes CI by optimizing for the intersection of task utility and minimal disclosure.
🧵
Our work shows that using reasoning models as evaluators improves evaluation quality with additional test-time compute, enabling stronger re-ranking of #languagemodel outputs and matching the gains of increased compute at generation time. Learn how: https://t.co/ecJPPjkyBf #NECLabs
Diffusion models fail at multi-object generation. But why?
In our #ICML2026 paper, we built MOSAIC, a controlled framework to diagnose these failures.
Spoiler: it's not mainly data imbalance. Scene complexity and missing compositions in training matter much more!
(1/n)
Can LLM agents build memory before seeing any user task?
Memory is usually built from human tasks or deployment interactions. New tool environments often have neither, creating a cold-start gap.
Introducing PREPING: building agent memory without tasks.
https://t.co/bTV24GP4qc
LLM memory systems can store facts.
They can't reason about what changes when one of those facts updates.
We tested 6 systems across 3 paradigms. All collapse on dependency reasoning: Cascade 3%, Absence 1%.
MEME: Multi-entity & Evolving Memory Evaluation 🧵 1/n
Diffusion-based LLM paper accepted to #ICML2026 🥳
Diffusion LLMs promise parallel & bidirectional generation, but fully non-autoregressive decoding still struggles in practice.
We analyzed why NAR decoding fails and show how minimal interventions can substantially improve it!
Happy to share our #ICML2026 paper!
Decentralized Instruction Tuning: Conflict-Aware Splitting and Weight Merging
https://t.co/9heSoWWRes
Massive SFT as one joint run? -> We split, train, merge.
No synchronization needed, and it matches or beats joint training.
See you in Seoul!
@KAIST_AI #NAVERCloud
Secure your discount to our event gathering scientists, engineers, and researchers from academia and industry, discussing embodied intelligence, physical AI and the future of intelligent machines. https://t.co/9zThnPCvOf
@KAIST_AI @NatureComms @Nature @NatureElectron
Excited to share that AgentFlow has been selected as an ICLR 2026 Oral!
https://t.co/a0KygGDYhN
Since launch, AgentFlow has also grown to 1.7K GitHub stars. Thank you so much for the support.
AgentFlow is a trainable multi-agent system where specialized agents learn to plan and use tools in the flow of a task. We are excited to present it at ICLR.
Code: https://t.co/XFvTyJt3WZ
Models: https://t.co/3IfV4rB9Be
Demo: https://t.co/6RDKYW2368
Video: https://t.co/AecTGQkpS1
Huge shoutout to the amazing team behind this work:
@zhuofengli96475, @GhxIsaac, @SeungjuHan3,
@ShengLiu_, @jianwen_xie, @yuz9yuz,
@YejinChoinka, @james_y_zou
And thank you to our supporters:
@LambdaAPI, @RenPhilanthropy, @StanfordHAI, @StanfordAILab, @kaist_ai.
See you at ICLR 2026!
#ICLR2026 #AgentFlow #AgenticAI #LLM #RL #ToolUse
Our paper "Sommelier: A Scalable Open Multi-turn Audio Pre-processing for Full-duplex Speech Language Models" has been accepted to the #ACL2026 industry track!
We introduce a pipeline for generating the real-world speech data needed to build full-duplex audio language models.