My first last-author paper, with @lakshitadodeja and @rai_inst collaborators!
When deploying Behavior Cloning (BC) policies, we encounter performance gaps due to compounding error, subtle task shifts, etc.
We want to be able to post-train BC policies with a few online interactions to quickly improve performance.
Q2RL is what we hoped for in a robot post-training algorithm:
- ✅ it starts with BC policy rollouts (no need for large, diverse datasets), and
- ✅ it's sample-efficient enough for real-world online reinforcement learning!
See the thread and linked website for more details. Paper and code are released!
I’m attending #ICLR2026 in Rio this week to present LPWM! Friday, April 24: Poster Session 3 at 10:30, Oral Session 4B at 3:15 pm. Happy to chat about self-supervised object-centric learning and world models. I’ll be on the job market soon and looking for exciting opportunities!
#ICLR @iclr_conf
I’m so tired of writing rebuttals to this kind of “lack of novelty” review: “This paper trivially combines A, B, and C, so the algorithmic novelty is limited.”
Technically, most (if not all) robotics papers are convex combinations of existing ideas.
I still deeply appreciate A+B+C papers—especially when they deliver:
- New capabilities: the “trivial combination” unlocks behaviors we simply couldn’t achieve before
- Sensible & organic design: A+B+C is clearly the right composition—not some arbitrary A′+B+C′
- Nontrivial interactions: careful analysis of the dynamics, coupling, or failure modes between A, B, C
- Rehabilitating old ideas: A was dismissed for years, but paired with modern B/C, it suddenly works—and teaches us why
- System-level & "interface" insight: the contribution is not any single piece, but how the pieces talk to each other
- Scaling laws or regimes: identifying when/why A+B+C works (and when it doesn’t)
- Engineering clarity: making something actually work robustly in the real world is not “trivial”
- New problem formulations: sometimes the real novelty is in the reformulation—only under this view does A+B+C make sense.
Maybe worth keeping these in mind when reviewing the next A+B+C paper : )
🚀 #ICLR2026 Oral 💥
How can we design world models that capture object interactions directly from pixels?
Introducing Latent Particle World Models: the first end-to-end, self-supervised, object-centric world model, trained from videos and supporting action/img/lang conditioning.
1/n
@vincesitzmann Clarification: you call a point cloud an "intermediate representation" but what about 3D sensors that directly record point clouds? Should we not develop methods that can learn from such data?
@xwang_lk @jon_barron @TuliMathieu So "natural" just means "recorded using sensors similar to the sensors that humans have"? What about sonar (bats)? And why couldn't some alien species have a lidar sensor built in, making it "natural"?
I am filling out a letter of recommendation for a fellowship application, and one question is: "Applicant's achievements reflect his or her ability:" with the options "Not at All; Well; Moderately Well; Very Well; Extremely Well".
If I select "Well" then what does that imply?