📣Thrilled that "Rethinking Dataset Distillation: Hard Truths about Soft Labels" is a Best Paper Finalist @CVPR (Top 15 out of 16,000+ submissions)! 🏆
Congrats to the authors! 🥳🙌🏻
Join us today:
🎙️ Oral: 10:15 @ Mile High Ballroom 1A-2A
📍Poster 18 @ ExHall A
We are presenting WFM-Eval at two @CVPR 2026 workshops in Denver 📍
🗓️ Jun 3, Video World Models
Poster 9:50–10:40 AM, Exhibit Hall A
🗓️ Jun 4, Foundation Models Meet Embodied Agents
Poster 3:55–4:30 PM
Come say hi 👋
Work done with @AmberZhang99@prithvijitch@judyfhoffman
Can LLMs adapt continually without losing base skills?
Fast-Slow Training (FST) pairs "slow" weights with "fast" context.
FST vs. RL:
• 3x more sample-efficient
• Higher performance ceiling
• Less KL drift (better plasticity)
• Continual learning: succeeds where RL stalls
Coding has moved up a level of abstraction. Everyone knows it. Karpathy called it in 2023: "The hottest new programming language is English." By 2025 he named it vibe coding. By 2026 it's just... coding.
When C compiled to assembly, we stopped reading assembly. We built tools at the C level — debuggers, profilers, type checkers — that let us reason at the abstraction we actually operate at. The assembly still existed. We just stopped being the ones who needed to read it.
That's exactly where we are with AI-generated code. You describe intent in natural language, the agent writes code, you approve it. The code still matters. But you're no longer the one writing it, so reading it line-by-line is the wrong use of your cognitive load. Your attention is better spent on intent, architecture, and behavior — the level you actually work at now.
I kept running into this personally. I use Cursor daily. The agent makes a bunch of edits across files and I approve them. I mostly don't check the details. It works, and that's fine. But it's uncomfortable not knowing what the code is actually doing. You're either reading raw diffs (too slow, too low-level for how you're working now) or just trusting and moving on (fine until it isn't).
I wanted something in between. Not a full code review, not blind trust — just enough to stay informed. A quick visual summary of what each file does, what changed, and why.
So I put together Glassbox — as in, the opposite of a black box. Your codebase doesn't have to be opaque just because you didn't write it. It's a small Cursor rule + hook that automatically generates a visual companion file (.vis.md) for every code file the agent touches. No text walls. Just mermaid diagrams:
Structure — classes, functions, relationships. Changed components are annotated directly on the diagram
Flow — control flow with branches and decisions. You see the logic, not the syntax
Dependencies — what this file imports and why, as a graph
Change timeline — what was added, modified, or removed, rendered chronologically
Every diagram renders in VS Code's markdown preview. 30 seconds scanning diagrams instead of 10 minutes reading code. The .vis/ directory mirrors your source tree and commits to git, so your team sees the same visual docs.
Nothing groundbreaking — it's just a rule file and a cursor hook. But it's been genuinely useful for me. The endgame is fully agentic zero-shot: you describe what you want, the agent builds and ships it, and code becomes truly invisible. We'll get there. But right now we're in the messy middle — agents are good enough to write your code but not yet reliable enough to ship unsupervised. You still need to understand what's happening. You just shouldn't have to read source code to do it.
https://t.co/4KiIP6ubTO
Sarvam Vision is built on a strong foundation of general image understanding, but sharpened with a singular focus: True Document Digitisation.
It understands the visual world, but parses documents with exceptional precision—mastering complex charts, tables, english and all 22 scheduled Indian languages.
It is finally out in the wild. Excited for the community to try it, test its limits and explore its capabilities.
🚀 New paper: https://t.co/ei1nic0fqZ
VideoLMs are bottlenecked by a simple problem: they treat video like a stack of images. That means huge token costs, slow responses, and missed temporal details.
What if we processed video the way codecs do? 🎬
Instead of dense per-frame RGB embeddings, we tokenize motion vectors + residuals and only encode sparse keyframes — turning video redundancy into a powerful inductive bias for efficient temporal reasoning.
🧵👇
Current LLMs support contexts with millions of tokens. However, we keep seeing failure modes due to poor long-context reasoning.
Our new work shows that, for long contexts, we must perform test-time training updates rather than vanilla ICL or “thinking”!
w/ @Meta & @KempnerInst
A little late but just in time post for #NeurIPS ⏰
📢 At @NeurIPSConf 2025, we will be presenting "GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer" 🎉
We introduce a training-free appearance transfer pipeline robust to strong geometric variations between 3D objects.
📝 arXiv: https://t.co/eUxv7khHHH
A thread 🧵
1/
🚀 Code release for Kontinuous Kontext!
💫 Looking for finer, fine-grained control in instruction-driven image editing?
⚙️Our method lets you smoothly adjust edit strength for more precise, intuitive edits!
Check it out - https://t.co/0WwFuSRJqH
✨ I’ll be presenting our work on depth-aware image editing at @ICCVConference in Hawaii 🌴 next week!
📅 Oct 22 | 📍 Exhibit Hall I | 🧩 Poster #82
🌍 Project: https://t.co/mtrmBgyqMK
🤝 Working on image generation or editing? I’d love to chat at ICCV!
@val_iisc
“Make it red.”
“No! More red!”
“Ughh… slightly less red.”
“Perfect!” ♥️
🎚️Kontinuous Kontext adds slider-based control over edit strength to instruction-based image editing, enabling smooth, continuous transformations!
Exploration is fundamental to RL. Yet policy gradient methods often collapse: during training they fail to explore broadly, and converge into narrow, easily exploitable behaviors. The result is poor generalization, limited gains from test-time scaling, and brittleness on tasks where strategic exploration is necessary. We introduce a framework for training a policy over sets of generations and use it to induce exploration.
Work with @ifdita_hasan (co-lead), @ellenjxu_ , @chelseabfinn and @DorsaSadigh at Stanford 🧵
Calling all digital artists 🧑🎨 Have you ever forgotten to put objects on separate layers?
Introducing InkLayer, a segmentation algorithm that makes scene sketches easy to edit.
I’m presenting at #SIGGRAPH2025 on Wednesday, 11:45–11:55 am, West Building 118–120. See you there!
New fastest shortest-path algorithm in 41 years!
Tsinghua researchers broke Dijkstra’s 1984 “sorting barrier,” achieving O(m log^(2/3) n) time. This means faster route planning, less traffic, cheaper deliveries, and more efficient networks - and a CS curriculum revamp =)
🧵 17/18
I strongly encourage the community to post more critical blogs as a more effective “open review”.
Especially now, as traditional conference reviews increasingly lose credibility, robust and transparent community feedback is crucial for advancing science—not just AI—toward healthier and more rigorous standards.