Zihan Wang @Z1hanW - Twitter Profile

Pinned Tweet

6 months ago

Introducing CRISP, a real-to-sim pipeline that recovers human motion and simulatable scene geometry from monocular video! CRISP builds contact-faithful 3D scenes for simulation - 8× fewer sim failures, +43% faster sim, and improved human motion! Interactive demos👉: https://t.co/locrdrxO16 Exciting collaboration w/ @JiashunWang @jefftan969 @_Tsukasane Jessica Hodgins @shubhtuls @RamananDeva

6

347

65

228

49K

Z1hanW retweeted

Junyi Zhang

@junyi42

4 days ago

Children learn from play. Can robots do the same? We propose 𝐏𝐥𝐚𝐲𝐟𝐮𝐥 𝐀𝐠𝐞𝐧𝐭𝐢𝐜 𝐑𝐨𝐛𝐨𝐭 𝐋𝐞𝐚𝐫𝐧𝐢𝐧𝐠, a paradigm that gives embodied coding agents a play stage before downstream tasks arrive, and instantiate it with 𝐑𝐀𝐓𝐬 (Robotics Agent Teams), where robots discover reusable skills through curious play. Co-led with @jiaxin_ge_

8

338

63

209

82K

Z1hanW retweeted

Jitendra MALIK

@JitendraMalikCV

5 days ago

We can convert human videos to robot hand-object interaction trajectories in 4D. Enjoy! Paper: https://t.co/AS0ecvTB9I Website: https://t.co/KP4dVqxzUb Code: https://t.co/KfOQTJN8vE Authors: @bhawna_paliwal_, @HarithejaE, @willjhliang, @pabbeel, @notmahi, @JitendraMalikCV

13

760

82

467

57K

Z1hanW retweeted

Jiashun Wang

@JiashunWang

5 days ago

Reference motions are often used as trajectories to track or teachers to distill. We explore a different way of learning from them. I am excited to share our work, Generalizing from References (GfR), to appear at RSS 2026, as a follow-up to our previous HIL work. Using a unified multi-task RL framework, we jointly train reference-guided imitation and goal-driven RL within a single end-to-end policy. No distillation. No RL fine-tuning. Just one policy, trained end-to-end, that learns from references and generalizes beyond them. Rather than treating reference motions as trajectories to track, distill, or follow, we use them to shape behavior while allowing RL to explore and adapt beyond the references. In the following example, without human joystick control, the robot can autonomously compose learned skills using only task goals. 🌐 https://t.co/pdMWBWgtCY 🤖 Things beyond locomotion coming soon.

3

126

23

68

7K

Z1hanW retweeted

Arthur Allshire

@arthurallshire

6 days ago

really excited to have got this out! here's an uncut, real-time video of our policy folding a cardboard box

7

204

16

77

23K

Zihan Wang

@Z1hanW

6 days ago

don’t miss this impressive new project from this amazing team!

Ritvik Singh

@ritvik_singh9

6 days ago

Introducing ABC: open data, training, and infrastructure for robotics. We release the largest teleop dataset to date, and extensively investigate design decisions, pretraining, and post-training techniques. @arthurallshire @Cinnabar233 @adamrasb @redstone_hong @davidrmcall

29

624

100

439

264K

0

13

0

9

3K

Z1hanW retweeted

Haozhe Jiang

@erichzjiang

7 days ago

Why aren’t Diffusion Language Models smart yet? The lack of stable post-training is a major bottleneck! Meet DiPOD: the tripod for diffusion model post-training. DiPOD boosts accuracy across reasoning tasks, with Sudoku jumping from 22% to 97%, through a one-line code change. 🧵1/5

13

565

74

459

121K

Z1hanW retweeted

Bardienus Duisterhof

@BDuisterhof

11 days ago

Introducing Modality Forcing, a recipe for post-training T2I models for SOTA RGB-Depth generation! Text-to-image (T2I) models learn rich representations of the spatial world. How do we build on this prior for high-quality depth generation? https://t.co/uJjGHNiDBu 🧵 [1/6]

4

169

34

90

62K

Z1hanW retweeted

Jiashun Wang

@JiashunWang

12 days ago

Over the past few years, motion tracking has largely taken over humanoid whole-body control. Most motion tracking methods rely on explicit phase variables or future target poses to track reference motions. But, do we actually need them? We find that task conditions and scene observations alone can already provide enough structure for reference motion tracking. Building on this observation, we introduce HIL: Hybrid Imitation Learning. Using a unified goal-conditioned observation space, we formulate motion tracking and adversarial imitation learning as a single end-to-end multi-task learning problem. This allows a single policy to simultaneously:
• track reference motions with high fidelity
• compose and adapt skills through adversarial imitation learning
By sharing the same observation representation across both tasks, behaviors learned from motion tracking naturally transfer to more general goal-conditioned control.
📄 To appear in ACM Transactions on Graphics (TOG 2026) & SIGGRAPH 2027
🌐 https://t.co/MBb9j1U6Sk
🤖 A real-world humanoid follow-up is coming soon

7

449

60

326

53K

Z1hanW retweeted

Siheng Zhao

@SihengZhao

15 days ago

🪜 What if humanoids could climb ladders and work on them straight out of simulation? Meet LadderMan: a perceptive system for zero-shot sim-to-real ladder climbing and on-ladder manipulation. Watch the humanoid climb, stabilize, and manipulate—all in one system. 🤖👇

17

316

61

120

106K

Z1hanW retweeted

Qitao Zhao

@qitao_zhao

20 days ago

I'll be presenting E-RayZer at the VGI workshop (https://t.co/KKKzNoybKJ) as an invited poster (Wed 12:20-13:30, Room 703), and at the main conference poster session as a Highlight paper (Fri 4:00-6:00, ExHall A & F 33). Come chat if you're interested! https://t.co/GMMGNSB1jf

0

11

2

1

1K

Z1hanW retweeted

Xiaoxuan Ma

@XiaoxuanMa_

25 days ago

🚀 Excited to share REST3D: REconstructing physically STable and visually consistent 3D scenes from a casual single image🤳. With REST3D, you can naturally interact with stable virtual objects through hand-based VR interactions👐. 🔗 Project page: https://t.co/1CVuGIjAVM

6

557

82

474

39K

Z1hanW retweeted

RuiningLi

@RayLi234

about 1 month ago

🚀 Introducing Articraft, a coding agent for articulated 3D asset creation. Articraft writes code, executes it, receives validation feedback, and refines the result into simulation-ready 3D assets with parts, joints, and motion. We’re also releasing Articraft-10K: 10,000+ articulated objects across 250 categories, unlocking large-scale interactive scenes for robotics simulation and physical AI. 🔗 Project page: https://t.co/FWutv61yx7 💻 Code: https://t.co/CpCYdBzMlv

22

746

108

786

187K

Z1hanW retweeted

Suning Huang

@suning_huang

about 2 months ago

🤖Low-data post-training can teach a VLA policy a new robot skill. But it also makes it too attached to the training demos. We call this lock-in🔒: the policy can execute the post-training task, yet fails to respond to seemingly obvious prompt changes. DeLock preserves steerability using only the policy’s own pretrained knowledge. No extra supervision needed!🚀🚀🚀 #Robotics #AI #EmbodiedAI #VLA

5

178

43

98

32K

Z1hanW retweeted

tingwu.wang

@TingwuWang

about 2 months ago

What is missing to bring real-time motion research into AAA games and real-world robotics? We present MotionBricks, a step toward bridging this gap with two key components:
- a single generative latent motion backbone covering 350,000+ motion skills, running at 15,000 FPS with 2 ms latency and substantially improved quality and reliability.
- a unified smart primitive interface for locomotion, object / scene interaction, with fine-grained control over generated behaviors.
Webpage: https://t.co/aJE5skUuWD Code: https://t.co/r56D3TJ8CW Paper: https://t.co/CtOHXnHZMv (ACM TOG / SIGGRAPH 2026)

26

1K

150

929

152K

Zihan Wang

@Z1hanW

about 2 months ago

guanya is very insightful and supportive

Guanya Shi

@GuanyaShi

about 2 months ago

My lab at CMU @LeCARLab is hiring a postdoc! (Vibe-made the poster via @ChatGPTapp)

1

197

24

46

25K

0

4

0

295

Z1hanW retweeted

Zhiqiu Lin

@ZhiqiuLin

about 2 months ago

Before AI can generate professional videos, it needs to see like a professional. We spent a year with 100+ content creators teaching AI to describe video like a filmmaker would. Introducing CHAI: Critique-based Human-AI Oversight for Building a Precise Video Language [CVPR'26 Highlight, Top 3%]. Try prompting a video generator for a dolly zoom, dutch angle, point of view, or camera roll. Most fall back to the same bland defaults: a push-in, a level shot, a third-person view. Why? These techniques require a language of cinema that current models rarely speak. We built that language:
1️⃣ Precise specification: 5-aspect structured captions co-designed with professional cinematographers covering subject, scene, motion, spatial, and camera dynamics
2️⃣ Scalable oversight: LLMs draft captions, humans critique what's wrong and how to fix it
3️⃣ Post-training recipes: Qwen3-VL-8B surpasses Gemini-3.1 and GPT-5
4️⃣ Video generation: fine-tuned Wan follows 400-word cinematic prompts with precise control
Here's how each works 🧵 Work led by CMU and Harvard with @chancharikm, @du_yilun, and @RamananDeva.
📄 Paper: https://t.co/wCwEtvrntM 🌐 Site: https://t.co/oAAQklGrfF

25

372

63

494

35K

Z1hanW retweeted

Khiem Vuong

@kvuongdev

2 months ago

[1/7] Video diffusion has come a long way, generating more & more realistic videos. Can we revisit sparse-view novel view synthesis through these video priors? Meet FrameCrafter: a permutation-invariant multi-view model built on video diffusion 🧵 🌐 https://t.co/ogEN4mkE92

2

150

32

99

10K

Z1hanW retweeted

Angjoo Kanazawa

@akanazawa

2 months ago

Very excited to share this work @davidrmcall did with the fantastic NVIDIA Finland team last year. We have a surprisingly simple but sample-efficient way to post-train a flow model with RL.

1

105

12

53

18K

Z1hanW retweeted

Qianqian Wang

@QianqianWang5

2 months ago

Most multi-view reconstruction models need full supervision. We show they can self-improve without any ground truth labels. Introducing SelfEvo: Self-Improving 4D Perception via Self-Distillation. Up to +36.5% in video depth, +20.1% in camera estimation, zero annotation.

4

263

36

126

25K

Zihan Wang

@Z1hanW

2 months ago

@AnneLe222 @berkeley_ai Congrats, see you in fall!

0

255
