🎉 Excited to share that our paper "DiffPrompter: Differentiable Implicit Visual Prompts for Semantic-Segmentation in Adverse Conditions" has been accepted to #IROS2024 🤖
📄 Paper link: https://t.co/9Qiu2y3NxQ
🔗 Website: https://t.co/Xgxtq3p9dj
💻 Code: https://t.co/3ZOSPyEMSW
🧵👇
🚀 I built a small vibe-coded pet project that compiles natural language prompts directly into executable ELF binaries! No compiler, no manual coding—just pure prompt-to-binary.
Why is this exciting? The future implications are massive:
1️⃣ Space Exploration Robots: Imagine a rover facing unexpected hardware damage or needing a completely new function millions of miles away. It can autonomously assess the situation, generate an executable binary on the fly via LLMs, and adapt instantly to survive the mission.
2️⃣ Self-Adaptive, Hyper-User-Friendly Hardware: Consumer devices that change their entire functionality based on environmental sensors or internal wear-and-tear. Your hardware rewrites its own executable logic dynamically to optimize itself. 📱⚡
The possibilities are endless. This is just a toy project exploring what's possible when you bypass the traditional software stack and go straight from intent to execution.
Check out the live demo here: https://t.co/vlNTclJ0wy
#BuildInPublic #Robotics #EmbeddedSystems #TechVibes
I’m biased of course, but I’m particularly pleased with the depth <-> RGB bijection we came up with for monocular depth estimation, which arose from thinking about space-filling curves and the power transform work that fell out of Zip-NeRF. And how cool is this figure?
In Fall 2026, I will begin a tenure-track faculty position @JHUCompSci
Announcing the SciPhy lab, where we will study the science of physical agents (robots)
We are now recruiting our first cohort of PhD students. If this is you, see
https://t.co/heSKsCbWz7
Our #NeurIPS2025 *spotlight* paper #SegMASt3R establishes image segment matching as a benchmark task & enables high performance downstream on 3D Instance Mapping & Object-Relative Navigation
https://t.co/asDjl4w8xD
Huge effort by @_rjayanti @Swayam_421 @vanshg1729 @RRCLab_IIITH
Egocentric human videos offer a rich source to tap for advancing mobile manipulation 🚀.
Led by @LawrenceZhu22, we’re excited to introduce EMMA - our next step beyond EgoMimic, taking us from tabletop setups to real-world mobile manipulation.
More details in the thread below 👇
Thanks for featuring our work @RealityLabs @meta_aria @AIatMeta.
We show that embodied human data is a scalable data source for robot learning. Check out the full Meta AI blog post covering our story: https://t.co/FxZTFHKmFB
At the recently concluded International Conference on Intelligent Robots and Systems (IROS 2024) that was held in Abu Dhabi, IIITH’s Robotics Research Centre made a splash with 6 research papers. Here’s a brief roundup of some of that cutting-edge work. https://t.co/dT0SB9wFV5
🚀Excited to share SceneComplete: an open-world 3D scene completion system for constructing a complete, segmented 3D model of a scene from a single RGB-D image.🖼️🤖
SceneComplete enables dexterous grasping and robust robot manipulation in highly cluttered scenes - a short 🧵
Introducing DART: breaking the barriers for robotic data collection by enabling anyone, anywhere in the world to control robots without even having a robot.
Just log into https://t.co/GfzTEq5zol to contribute and control robots with much less fatigue and at higher speed than real-world teleoperation.
⚡️ No need for resets or setting up environments
⚡️ Support for multiple robots
⚡️ Robot bootstraps and autonomously collects data in simulation while you sleep!
This is the first step towards a fully crowdsourced and open-source foundation model for robotics.
Work led by @younghyo_park @JagdeepBhatia8 @larsankile
We started this moonshot project a year ago. Now we are excited to share our progress on robot learning from egocentric human data 🕶️🤲
Key idea: Egocentric human data is robot data in disguise. By bridging the kinematic, visual, and distributional gaps, we can directly leverage human data to scale up imitation learning.
My thoughts:
1. Human data is robot data: We perceive with our eyes and act with our body and hands. If we can learn across robot embodiments, why not humans?
2. Data collection needs to be passive: Just as the Internet evolved into an unintentional data repository for AI, we envision systems that effortlessly capture embodied experiences from human activities, without humans’ conscious participation.
3. The data capturing technology is ready: Project Aria glasses by @RealityLabs @meta_aria capture all the information we need to turn human data into robot data: fisheye camera, 3D hand tracks, SLAM, … We will see more ubiquitous devices entering the consumer market.
4. The next generation foundation model will learn from embodied human data: The human sensorimotor experience is largely missing from today’s AI training data. Next-gen foundation models will understand humans from a human’s perspective, driven by large-scale embodied human data.
See @simar_kareer's thread for more details about this project.
We figured out a way to solve long-horizon planning problems by composing a bunch of modular diffusion models in a factor graph!
This allows us to reuse the diffusion models in unseen new tasks and achieve zero-shot generalization to multi-robot collaborative manipulation tasks.
This is another solid step in our effort to frame Task and Motion Planning as a fully generative problem. More to come & stay tuned.