Proud to see Co-Me accepted to CVPR 2026 🎉
Co-Me now supports MapAnything 1.1, Depth Anything 3, and Pi3, runs 2× faster than the original, and reaches up to 21.5× speedup on long VGGT sequences. Congrats to the team!
All your favorite 3D models — now faster with Co-Me.
Accepted to CVPR 2026, Co-Me now supports more 3D foundation models: MapAnything 1.1, Depth Anything 3, and Pi3.
Same simple confidence-guided token merging idea — now accelerating even more 3D reasoning models. 👇
I’m excited to share that I successfully defended my Ph.D. thesis on Specification-Driven Planning for Safe Autonomy!
I’m deeply grateful to my committee Sebastian Scherer, Changliu Liu, Karen Leung and Eunsuk Kang for their time and guidance throughout this journey. More in 🧵
Meet KinDER — a stress test for robot physical reasoning. All 13 methods failed 😈
🌎 25 environments
♾️ Infinite tasks
🏋️ Gymnasium API
⚒️ Over 20 parameterized skills
🪧 Human demonstrations
📊 13 baselines (planning and learning)
From @Princeton @CMU_Robotics @ICatGT @CambridgeMLG @nvidia @MIT_CSAIL
🧵 1/n
I'm excited to share that RAVEN was accepted to ICRA 2026!
Paper: https://t.co/aAJOuO1uRc
Website: https://t.co/2XY4bEEZlz
Collaboration with @OmarAlama, Dmytro Kurdydyk, John Keller, @Nik__V__, Wenshan Wang, @ybisk, @smash0190
See you in Vienna!
#IROS2026 will convene in Pittsburgh from Sept 27 – Oct 1!
As one of the largest & most dynamic robotics conferences, IROS brings together world-leading researchers, educators, govt. leaders, startups, industry innovators, practitioners, & investors👇
https://t.co/zWIb6SZtCO
Fast & light 2D and 3D zero-shot open-vocabulary semantic segmentation is here 🚀🪶!!
Meet RADSeg:
- 6-30% mIoU improvement while being 3.95x faster and using 2.5x fewer parameters.
- Outperforms combinations of huge vision models (850-1350M parameters) with just 105M!
💡The key is building on the agglomerative model, RADIO, and improving spatial consistency.
We are honored to share that Super Odometry is now published in @ScienceRobotics and featured as a highlight article! 🚀 This work rethinks the SLAM paradigm: true resilience should not rely solely on external perception—it should begin from within.
https://t.co/wl5DApyBww #SLAM
Human peripheral vision reduces detail in out-of-focus areas. This “annoying” feature saves massive computation while preserving spatial cues. And for the most human-like artifact we build—ROBOT—that efficiency matters.
Check out our recent work:👉🔗 https://t.co/8V6QOhCNSe
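To make the "save compute, keep spatial cues" idea concrete, here is a minimal sketch of generic foveated processing, not the method in the linked work: pixels within a hypothetical fovea radius keep full detail, and the periphery is replaced by block averages. The function name `foveate`, the block-averaging scheme, and the cost count are illustrative assumptions.

```python
import numpy as np


def foveate(image, center, radius, block=4):
    """Keep full detail near `center`; replace the periphery with block averages.

    image:  (H, W) array; H and W must be divisible by `block` (simplifying assumption).
    center: (row, col) fixation point; radius: fovea radius in pixels.
    Returns the foveated image and the number of values needed to represent it
    (fovea pixels + coarse tiles that contain at least one peripheral pixel).
    """
    h, w = image.shape
    if h % block or w % block:
        raise ValueError("image size must be divisible by block")

    # Coarse periphery: average each block x block tile, then expand it back to full size
    coarse = image.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    out = np.repeat(np.repeat(coarse, block, axis=0), block, axis=1)

    # Circular fovea around the fixation point keeps the original pixel values
    rows, cols = np.mgrid[0:h, 0:w]
    fovea = (rows - center[0]) ** 2 + (cols - center[1]) ** 2 <= radius ** 2
    out[fovea] = image[fovea]

    periphery_tiles = (~fovea).reshape(h // block, block, w // block, block).any(axis=(1, 3))
    cost = int(fovea.sum() + periphery_tiles.sum())
    return out, cost


image = np.random.default_rng(0).random((64, 64))
foveated, cost = foveate(image, center=(32, 32), radius=8)
print(cost, "values instead of", image.size)
```

The saving grows with the block size and shrinks as the fovea widens; a real system would also pick the fixation point adaptively.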
🧵[3/n]
Co-Me distills a tiny confidence predictor that identifies low-confidence regions before most layers even run, letting us merge those tokens and cut redundant compute.
✨That’s it — simple and effective.
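As a rough illustration of confidence-guided token merging, the sketch below assumes a hypothetical per-token confidence score (standing in for the distilled predictor) and averages the least-confident tokens in small groups, keeping confident tokens untouched. The grouping rule (raster order among low-confidence tokens), `merge_ratio`, and `group_size` are illustrative assumptions, not Co-Me's actual merging scheme.

```python
import numpy as np


def merge_low_confidence_tokens(tokens, confidence, merge_ratio=0.5, group_size=4):
    """Average the least-confident tokens into groups; keep the rest untouched.

    tokens:     (N, D) array of token features.
    confidence: (N,) array, higher means more confident (hypothetical predictor output).
    Returns (reduced_tokens, mapping), where mapping[i] is the row of
    reduced_tokens that original token i now lives in.
    """
    n = tokens.shape[0]
    num_low = int(n * merge_ratio)
    order = np.argsort(confidence)        # least confident first
    low_idx = np.sort(order[:num_low])    # raster order, so groups stay roughly local
    high_idx = np.sort(order[num_low:])

    mapping = np.empty(n, dtype=int)
    mapping[high_idx] = np.arange(len(high_idx))
    merged = []
    for g, start in enumerate(range(0, num_low, group_size)):
        group = low_idx[start:start + group_size]
        merged.append(tokens[group].mean(axis=0))
        mapping[group] = len(high_idx) + g

    parts = [tokens[high_idx]] + ([np.stack(merged)] if merged else [])
    return np.concatenate(parts, axis=0), mapping


def unmerge(reduced_tokens, mapping):
    """Broadcast each merged token back to every original position it absorbed."""
    return reduced_tokens[mapping]


rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 8))
confidence = rng.random(16)
reduced, mapping = merge_low_confidence_tokens(tokens, confidence)
print(tokens.shape, "->", reduced.shape)  # (16, 8) -> (10, 8)
```

Attention layers then run on the reduced set, and `unmerge` restores a full-length token sequence for dense prediction heads.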
🧵[2/n]
We noticed the model burns most of its compute on uncertain regions that are later discarded by downstream tasks. Can we avoid wasting this computation?
More and more visual-geometric transformers are coming out, like VGGT and MapAnything, but deploying them on real robots is still challenging.
What if we could make them 10× faster?
👉🔗https://t.co/YWwovcyVHD
⚡Co-Me speeds up VGGT and MapAnything by up to 11.3x and 7.2x, respectively.
How? 👇🧵
Robots can plan, but rarely improvise. How do we move beyond pick-and-place to multi-object, improvisational manipulation without giving up completeness guarantees?
We introduce Shortcut Learning for Abstract Planning (SLAP), a new method that uses reinforcement learning (RL) to discover shortcuts in the planning graphs induced by task and motion planning (TAMP) skill libraries. It is a plug-and-play module that can be trained on top of existing planners to speed up execution through learned shortcuts.
(1/5)
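A minimal sketch of why shortcuts shorten plans without sacrificing completeness: plan with breadth-first search over a hypothetical abstract graph induced by a skill library, then add one extra edge standing in for a shortcut that an RL policy learned to execute. The state names, `shortest_plan`, and `add_shortcut` are illustrative assumptions, not SLAP's interface.

```python
from collections import deque


def shortest_plan(graph, start, goal):
    """Breadth-first search over an abstract planning graph (fewest edges)."""
    parents = {start: None}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if state == goal:
            plan = []
            while state is not None:
                plan.append(state)
                state = parents[state]
            return plan[::-1]
        for nxt in graph.get(state, []):
            if nxt not in parents:
                parents[nxt] = state
                frontier.append(nxt)
    return None


def add_shortcut(graph, src, dst):
    """Return a copy of the graph with one extra edge; original edges are kept."""
    new_graph = {k: list(v) for k, v in graph.items()}
    new_graph.setdefault(src, []).append(dst)
    return new_graph


# Hypothetical abstract graph induced by a skill library: s0 -> s1 -> s2 -> s3 -> s4
skill_graph = {"s0": ["s1"], "s1": ["s2"], "s2": ["s3"], "s3": ["s4"]}
print(shortest_plan(skill_graph, "s0", "s4"))    # ['s0', 's1', 's2', 's3', 's4']

# Suppose an RL policy learned to reach s3 directly from s0 (hypothetical shortcut)
with_shortcut = add_shortcut(skill_graph, "s0", "s3")
print(shortest_plan(with_shortcut, "s0", "s4"))  # ['s0', 's3', 's4']
```

Because the original skill edges stay in the graph, the planner can always fall back to them if a shortcut fails at execution time.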
⛔️Stop throwing away far-range semantics; encode them as Rays instead!
🔥Excited to present RayFronts at #IROS2025 in Hangzhou, China!
🎥Catch us in the live presentation next Tuesday 16:45-16:50 Track 9.
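One way to picture "keep far-range semantics as rays" is the sketch below: observations with reliable depth inside a hypothetical `max_range` go into a voxel map, while farther ones (or ones with no depth) are kept as (origin, unit direction, feature) rays instead of being discarded. The range threshold and data layout are assumptions for illustration, not RayFronts' implementation.

```python
import numpy as np


def split_voxels_and_rays(origin, directions, depths, features, max_range, voxel_size):
    """Route each semantic observation to a voxel map or to a ray list.

    origin:     (3,) camera position in the world frame.
    directions: (N, 3) viewing directions (need not be normalized).
    depths:     (N,) range along each direction; np.inf if no reliable depth.
    features:   (N, D) semantic feature vectors.
    """
    voxel_bins = {}   # voxel index -> list of features
    rays = []         # (origin, unit direction, feature) for far-range observations
    for direction, depth, feat in zip(directions, depths, features):
        unit = direction / np.linalg.norm(direction)
        if np.isfinite(depth) and depth <= max_range:
            point = origin + depth * unit
            key = tuple(int(i) for i in np.floor(point / voxel_size))
            voxel_bins.setdefault(key, []).append(feat)
        else:
            # Too far (or no depth): keep the semantics as a ray instead of dropping them
            rays.append((origin.copy(), unit, feat))
    voxel_map = {k: np.mean(v, axis=0) for k, v in voxel_bins.items()}
    return voxel_map, rays


origin = np.zeros(3)
directions = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]])
depths = np.array([2.5, 50.0, np.inf])
features = np.eye(3)
voxel_map, rays = split_voxels_and_rays(origin, directions, depths, features,
                                        max_range=10.0, voxel_size=1.0)
print(list(voxel_map), len(rays))  # [(2, 0, 0)] 2
```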
Last year, I came across the idea of constrained decoding (I know, late to the party) and was fascinated. The ability to enforce constraints on LLM outputs at inference time without fine-tuning is a powerful idea. It got me thinking: can we do this for robot foundation models?
1/n🧵
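The core trick of constrained decoding fits in a few lines: before choosing a token, set the logits of disallowed tokens to negative infinity so they receive zero probability. The sketch below applies this to a hypothetical discretized action head (five velocity bins with a |v| ≤ 1 m/s limit); the bins, logits, and constraint are illustrative assumptions, not the thread's method.

```python
import numpy as np


def mask_logits(logits, allowed):
    """Set logits of disallowed tokens to -inf so they can never be chosen."""
    allowed = np.asarray(allowed, dtype=bool)
    if not allowed.any():
        raise ValueError("constraint leaves no valid token")
    return np.where(allowed, logits, -np.inf)


def constrained_probs(logits, allowed):
    """Softmax over the allowed tokens only (disallowed tokens get probability 0)."""
    masked = mask_logits(logits, allowed)
    exp = np.exp(masked - masked.max())   # shift by the max for numerical stability
    return exp / exp.sum()


def constrained_greedy(logits, allowed):
    """Greedy decoding restricted to the allowed tokens."""
    return int(np.argmax(mask_logits(logits, allowed)))


# Hypothetical discretized action head: 5 bins for a velocity command (m/s)
bins = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
logits = np.array([3.0, 0.5, 1.0, 0.2, 2.5])
allowed = np.abs(bins) <= 1.0   # example constraint: |v| <= 1 m/s

print(bins[np.argmax(logits)])                      # -2.0 (violates the limit)
print(bins[constrained_greedy(logits, allowed)])    # 0.0
```

No weights change: the same model output is simply filtered at inference time, which is what makes the approach attractive for pretrained policies.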
We introduce RAVEN, a 3D open-set memory-based behavior tree framework for aerial outdoor semantic navigation. RAVEN not only navigates reliably toward detected targets, but also performs long-range semantic reasoning and LVLM-guided informed search.
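For readers new to behavior trees, here is a minimal sketch of how the three behaviors described above could be prioritized with a Fallback node: go to a detected target first, else to a candidate stored in memory, else run an informed search. The tree layout, blackboard keys, and leaf names are hypothetical and do not reproduce RAVEN's actual tree.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    RUNNING = "running"


class Condition:
    """Leaf that checks a boolean flag on the shared blackboard."""
    def __init__(self, key):
        self.key = key

    def tick(self, blackboard):
        return Status.SUCCESS if blackboard.get(self.key) else Status.FAILURE


class Action:
    """Leaf that records its name and reports RUNNING (stand-in for a real skill)."""
    def __init__(self, name):
        self.name = name

    def tick(self, blackboard):
        blackboard["log"].append(self.name)
        return Status.RUNNING


class Sequence:
    """Succeeds only if every child succeeds; stops at the first non-SUCCESS."""
    def __init__(self, *children):
        self.children = children

    def tick(self, blackboard):
        for child in self.children:
            status = child.tick(blackboard)
            if status != Status.SUCCESS:
                return status
        return Status.SUCCESS


class Fallback:
    """Tries children in priority order; stops at the first non-FAILURE."""
    def __init__(self, *children):
        self.children = children

    def tick(self, blackboard):
        for child in self.children:
            status = child.tick(blackboard)
            if status != Status.FAILURE:
                return status
        return Status.FAILURE


# Hypothetical priority order: detected target > remembered candidate > LVLM-guided search
tree = Fallback(
    Sequence(Condition("target_detected"), Action("navigate_to_target")),
    Sequence(Condition("memory_has_candidate"), Action("go_to_memory_candidate")),
    Action("lvlm_guided_search"),
)

bb = {"target_detected": False, "memory_has_candidate": True, "log": []}
print(tree.tick(bb), bb["log"])  # Status.RUNNING ['go_to_memory_candidate']
```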
Meet MapAnything – a transformer that directly regresses factored metric 3D scene geometry (from images, calibration, poses, or depth) in an end-to-end way. No pipelines, no extra stages. Just 3D geometry & cameras, straight from any type of input, delivering new state-of-the-art results 🚀
One universal model enables SoTA for:
🔥 Mono Depth Estimation
🔥 Multi-View SfM
🔥 Multi-View Stereo
🔥 Depth Completion
🔥 Registration
… and many more possibilities! – plus everything is metric 🎯
We release code for data processing, training, benchmarking & ablations – everything Apache 2.0!
Details & Links 👇
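To show what "factored" geometry can mean in practice, the sketch below assumes the factorization is per-view unit ray directions, up-to-scale depth along each ray, a camera-to-world pose, and one shared metric scale factor, and composes them into metric world points as X = s · (R (d · r) + t). The function name and argument layout are illustrative assumptions, not MapAnything's released API.

```python
import numpy as np


def compose_metric_points(ray_dirs, depth_along_ray, rotation, translation, metric_scale):
    """Turn factored per-view predictions into metric world-frame points.

    ray_dirs:        (H, W, 3) unit ray directions in the camera frame.
    depth_along_ray: (H, W) up-to-scale depth along each ray.
    rotation:        (3, 3) camera-to-world rotation R.
    translation:     (3,) camera-to-world translation t (same up-to-scale units).
    metric_scale:    scalar s that converts the shared scale to meters.
    """
    local = ray_dirs * depth_along_ray[..., None]   # camera-frame points d * r
    world = local @ rotation.T + translation        # R @ p + t for every pixel
    return metric_scale * world


# Tiny 1x2 "image": one ray along +z, one along +x
rays = np.array([[[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]])
depth = np.array([[2.0, 3.0]])
points = compose_metric_points(rays, depth, np.eye(3), np.array([1.0, 0.0, 0.0]), 0.5)
# points[0, 0] is (0.5, 0, 1) and points[0, 1] is (2, 0, 0)
```

Keeping scale as a single separate factor lets one network serve inputs with or without metric cues (calibration, poses, depth) while sharing the same geometric outputs.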
🚨CMU Vision-Language-Autonomy update: the team released a video of the task "find the refrigerator in the lounge", and they are looking for new PhD & Master's students to work on long-horizon navigation & instruction following!
Contact Ji Zhang for more information: https://t.co/U0yfBtoAU9
Thrilled that @NVIDIA_Robotics selected us among the first to test the new NV platform! 🙌 Huge thanks to NVIDIA and Jensen Huang for the generous gift of a #JetsonThor Dev Kit to @CMUAirLab.
We’ve already run #MACVO on Thor at high resolution while maintaining real-time performance.
Want to learn how to empower 🤖 with real-time scene understanding and exploration capabilities?
Catch me, @hocherie1 & @QiuYuhengQiu presenting RayFronts at the #RSS2025 SemRob Workshop (OHE 122) & Epstein Plaza at 10:00 am PST today!
Catch our team @Parvkpr @PatrikarJay @AirLabCMU presenting and demoing ViSafe at #RSS2025 tomorrow!
We'll be showing our payload demo & high speed aerial collision avoidance results 🚀