Researching at the frontier of AI on topics of Computer Vision, Computer Graphics, Robotics, Embodied AI, and Reinforcement Learning @UCSanDiego @haosu_twitr
VLA models often forget their pretrained knowledge during action training, hurting generalization.
🤖Our framework unifies action & VLM training to preserve strong pretrained representations & maintain versatility, boosting generalization & robustness.
https://t.co/Bt66OA51uk
We presented our work on studying multi-embodiment scaling at #CoRL2025 and were thrilled by the excitement around generalist cross-embodiment policies. A common question was: Will you move beyond locomotion? The answer is YES! Locomotion provides a clean starting point, but our long-term goal is to extend these ideas to more challenging domains such as manipulation, paving the way toward general cross-embodiment intelligence.
Huge thanks to the amazing team for making this happen across three time zones: project co-leads @LiuDai_DL, @NicoBohlinger, Dichen Li, together with @tongzhou_mu, @ZhanxinWu0725, K. Fay, and advisors @hiskov, @Jan_R_Peters, @haosu_twitr.
Tongxuan’s work explores using generative models (diffusion) for state estimation and world model learning in cloth manipulation, a domain with significant visual occlusion and complex dynamics. He will give a 5-minute talk at 2:00 pm in the Simulating Robot Worlds workshop today (https://t.co/jw26no1aLB), and a 1-minute spotlight presentation from 3:00–3:30 pm on Sep 28 at #CoRL.
📢 Introducing ManiSkill-HAB: A benchmark for low-level manipulation in home rearrangement tasks!
- GPU-accelerated simulation
- Extensive RL/IL baselines
- Vision-based, whole-body control robot dataset
All open-sourced: https://t.co/nwQNbQT5G8
🧵(1/5)
Learning bimanual, contact-rich robot manipulation policies that generalize over diverse objects has long been a challenge.
Excited to share our work: Planning-Guided Diffusion Policy Learning for Generalizable Contact-Rich Bimanual Manipulation!
https://t.co/df7t8bJs2m
🧵1/n
SIMPLER will be presented at #CoRL2024 at 4pm on Nov 8 (Section 4)!
While I won't be there in person due to visa constraints, @xiao_ted, @KarlPertsch, and @oier_mees will be presenting the paper and are happy to chat about it!
🚀 Thrilled to announce the release of the reproduced MeshLRM demo! 🎉 Generate textured 3D meshes from one or more unposed images in seconds. Check it out: https://t.co/u432PE0EOn
While the Segment Anything Model (SAM) greatly improves 2D segmentation annotation efficiency, is there a SAM-like foundation model that works for 3D point clouds and meshes?
Introducing Point-SAM, a 3D prompt segmentation foundation model! 👇
https://t.co/0Hf6JBGKMx
Join us at our first workshop on 3D Foundation Models @CVPR2024, June 18 in Summit 434, starting at 8:50AM!
We have fantastic speakers to discuss the progress and prospects in 3D foundation models.
Check out more details at https://t.co/ksTCxYbUP5
🥳Excited to share: Hierarchical World Models as Visual Whole-Body Humanoid Controllers
Joint work with @jyothir_s_v @vlad_is_ai @ylecun @xiaolonw @haosu_twitr
Our method, Puppeteer, learns high-dim humanoid policies that look natural, in an entirely data-driven way!
🧵👇(1/n)
#ICRA2024 @LinghaoChen97 will present our differentiable rendering-based hand-eye calibration method, EasyHec!
May 16 13:30 @ CC-313 (oral); May 16 16:30-18:… (poster)
It produces accurate calibration results in a fully automatic manner!
https://t.co/iVntWmEkGS
Don’t have a real robot/setup but want to evaluate policies trained on real-world datasets? Check out SIMPLER: fast, safe, and reliable evaluation of real robot policies in sim via ManiSkill 2. The ManiSkill 3 beta will port SIMPLER over soon, so stay tuned!
Scalable, reproducible, and reliable robotic evaluation remains an open challenge, especially in the age of generalist robot foundation models. Can *simulation* effectively predict *real-world* robot policy performance & behavior?
Presenting SIMPLER!👇
https://t.co/Z9fziHAFl6
LeRobot also features the Diffusion Policy, a powerful imitation learning algorithm, and TDMPC, a reinforcement learning method that includes a world model, continuously learning from its interactions with the environment.
https://t.co/bNrCnOnCBZ
https://t.co/zjeT0mRfRp
📢 ManiSkill 3 beta is out! Simulate everything everywhere all at once 🥯
- 18K RGBD FPS on 1 GPU, 3K on Colab!
- Diverse parallel GPU sim
- Tons of new robots/tasks
All open-sourced: https://t.co/3rCRKP2r1l
Photo: MS3 Tasks w/ scenes from AI2THOR and ReplicaCAD
🧵(1/6)
ManiSkill sneak peek 3: lots of new robots to use! Whether it's mobile manipulation, humanoids, quadrupeds, or even tactile dexterous hands (see the Shadow Hand at the bottom with red tactile sensors), we have a ton of new domains being added to try out on GPU state/visual sim
Check out DG-Mesh from @Isabella__Liu, which reconstructs time-consistent, high-quality dynamic meshes with flexible topology changes from monocular videos.
https://t.co/oxX70h0lr7
Check out @SarahWeii's latest work, which presents a transformer-based model that reconstructs a high-fidelity 3D mesh from 4 (sparse) input images in less than one second!
https://t.co/MvmVCemCuy