Pengcheng Wang

Verified account

@Outsider_pc

Ph.D. student @berkeley_ai | Cur./Prev. Intern @meta @amazon | Working on Transferable and Scalable RL for Robotics. Do elegant research that is trivial in hindsight.

Berkeley, California

Joined February 2025

81 Following

96 Followers

30 Posts

Pinned Tweet

1 day ago

Real-time Chunking (RTC) is designed to enable smooth asynchronous execution of flow-matching policies. However, it has some critical limitations: its inpainting-based async execution comes from inference-time corrections rather than from the base policy, so it gains little from pre-training, needs specific fine-tuning for better performance (e.g., training-time RTC), relies on heuristic guidance, and adds extra computation that inflates latency. In this work, we observe that discrete diffusion policies, which generate actions by iterative unmasking, are natural asynchronous executors that resolve all of these limitations at once: they are simpler to implement, faster at inference, and better at execution. Paper: https://t.co/SbpNCQMBF1 Code: https://t.co/fQyT6NR2eE Website: https://t.co/jLiCogvxoa

about 8 hours ago

@TX_Leo_Wang 🥰

Outsider_pc retweeted

Chen Tang @ChenTangMark

1 day ago

RTC is a key ingredient for deploying high-latency VLA policies in real time. We show that discrete diffusion is a more natural fit for asynchronous execution: with no extra implementation or specialized fine-tuning, it achieves strong performance on dynamic manipulation tasks!

1 day ago

Huge thanks to the amazing collaborators: Kaiwen Hong, @chensheng_peng, Katherine Driggs-Campbell, Masayoshi Tomizuka, @Chenfeng_X, and @ChenTangMark!

1 day ago

(5) In real-world dynamic manipulation tasks where Sync baselines completely fail, DiscreteRTC outperforms ContinuousRTC by a large margin on Hockey Defend and Dynamic Pick, where reactiveness is critical for success. Moreover, ContinuousRTC inflates the flow-matching cost to nearly 1.7× because of ΠGDM, whereas DiscreteRTC reduces the discrete diffusion cost to around 0.7×. Even compared with training-time ContinuousRTC, which requires extra fine-tuning, DiscreteRTC still achieves higher action quality and success rates.

1 day ago

(4) DiscreteRTC requires fewer iterative steps than ContinuousRTC, as fewer tokens need to be unmasked during inpainting. Moreover, DiscreteRTC can outperform training-time ContinuousRTC, improves with more steps, and its backbone policy can be seamlessly combined with advanced inference-time methods such as VLASH for further gains.

1 day ago

(3) Under different inference delays, DiscreteRTC consistently outperforms ContinuousRTC and other variants in both solve rate and throughput, showing the advantage of the native inpainting capability of discrete diffusion policies.
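Inference delay enters through the timing of asynchronous execution. A minimal sketch, assuming the RTC-style setup in which a new chunk is requested after `exec_horizon` actions of the current chunk have run, and the robot keeps executing the old chunk for `delay` more control steps while inference is in flight. The helpers `new_chunk_layout` and `frozen_prefix` are hypothetical names, not taken from the DiscreteRTC code; they only show which positions of the new chunk are already fixed, which are covered by a not-yet-executed plan, and which are new.

```python
from typing import List, Sequence


def _check(horizon: int, exec_horizon: int, delay: int) -> None:
    # The old chunk must still contain the `delay` actions executed during inference.
    if exec_horizon < 0 or delay < 0 or exec_horizon + delay > horizon:
        raise ValueError("need exec_horizon >= 0, delay >= 0, exec_horizon + delay <= horizon")


def new_chunk_layout(horizon: int, exec_horizon: int, delay: int) -> List[str]:
    """Label each position of the new chunk (hypothetical helper).

    The new chunk is conditioned on the observation taken when inference
    starts, so its position i lines up with position exec_horizon + i of
    the previous chunk.
    """
    _check(horizon, exec_horizon, delay)
    layout = []
    for i in range(horizon):
        if i < delay:
            layout.append("frozen")   # already executed while inference was running
        elif exec_horizon + i < horizon:
            layout.append("overlap")  # planned by the previous chunk, not yet executed
        else:
            layout.append("fresh")    # beyond the previous chunk's horizon
    return layout


def frozen_prefix(prev_chunk: Sequence[int], exec_horizon: int, delay: int) -> List[int]:
    """Actions the new chunk must reproduce exactly at positions 0..delay-1."""
    _check(len(prev_chunk), exec_horizon, delay)
    return list(prev_chunk[exec_horizon:exec_horizon + delay])


if __name__ == "__main__":
    print(new_chunk_layout(horizon=8, exec_horizon=3, delay=2))
    print(frozen_prefix(list(range(8)), exec_horizon=3, delay=2))
```

With `horizon=8`, `exec_horizon=3`, and `delay=2`, the layout is two `frozen`, three `overlap`, and three `fresh` positions, and the frozen prefix is `[3, 4]`. A longer delay grows the frozen prefix, which is the region an inpainting policy must respect; in the discrete setting described in (2), those positions would enter simply as already-unmasked tokens.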

1 day ago

(2) In contrast, we show that discrete diffusion policies can naturally resolve all the aforementioned limitations at once.

(a) Inpainting as Pre-training. Discrete diffusion policies are pre-trained to inpaint randomly masked sequences. Therefore, scaling pre-training directly improves asynchronous performance, and the native forward pass is well suited to inference-time inpainting;

(b) Fine-tuning Free. As a consequence, inpainting-specific patterns are implicitly introduced during pre-training, making discrete diffusion a fine-tuning-free approach for high-quality, out-of-the-box asynchronous execution;

(c) Natural Guidance. Moreover, with discrete diffusion policies, we can exit inference early once the necessary action tokens are unmasked, leaving the remaining masking pattern as adaptive, natural guidance for the next inference;

(d) Lower Inference Cost. Finally, with committed tokens from previous chunks, the tokens to unmask per inference are reduced, leading to lower inference cost for inpainting.
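Points (a) to (d) can be illustrated with a toy masked-token sampler. This is a minimal sketch under stated assumptions, not the DiscreteRTC implementation: `toy_denoiser` is a hypothetical stand-in for a trained discrete diffusion policy, and the confidence-ordered, fixed-budget unmasking schedule (`tokens_per_step`) is one common choice for masked diffusion that is assumed here. Committed tokens from the previous chunk are placed as already-unmasked positions, sampling stops once the first `needed` positions are decoded, and the leftover masks are returned for the next inference.

```python
from typing import Callable, List, Optional, Tuple

MASK = None  # placeholder for a masked action token
Seq = List[Optional[int]]
Denoiser = Callable[[Seq], List[Tuple[int, float]]]


def toy_denoiser(seq: Seq) -> List[Tuple[int, float]]:
    """Hypothetical stand-in for a discrete diffusion policy.

    For every position it predicts (token, confidence): the token is the left
    neighbour's token + 1 (mod 256), or 1 if there is no unmasked neighbour,
    and confidence decays with position so decoding runs roughly left to right.
    """
    preds = []
    for i in range(len(seq)):
        left = seq[i - 1] if i > 0 and seq[i - 1] is not MASK else 0
        preds.append(((left + 1) % 256, 1.0 / (1 + i)))
    return preds


def inpaint_chunk(committed: List[int], horizon: int, needed: int,
                  tokens_per_step: int,
                  denoiser: Denoiser = toy_denoiser) -> Tuple[Seq, int]:
    """Decode a chunk whose prefix is fixed to `committed` tokens.

    Returns the (possibly partially masked) chunk and the number of
    denoiser forward passes used.
    """
    if len(committed) > horizon or not 0 < needed <= horizon or tokens_per_step < 1:
        raise ValueError("invalid chunk configuration")
    # Committed tokens are ordinary unmasked inputs; no guidance term is computed.
    seq: Seq = list(committed) + [MASK] * (horizon - len(committed))
    passes = 0
    # Early exit: stop once every position that will be executed is decoded.
    while any(tok is MASK for tok in seq[:needed]):
        preds = denoiser(seq)
        passes += 1
        masked = [i for i, tok in enumerate(seq) if tok is MASK]
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:tokens_per_step]:  # unmask the most confident positions
            seq[i] = preds[i][0]
    # Positions still MASK were not needed; in this sketch they are left for the next call.
    return seq, passes


if __name__ == "__main__":
    chunk, passes = inpaint_chunk([10, 11, 12], horizon=8, needed=5, tokens_per_step=1)
    print(chunk, passes)  # [10, 11, 12, 13, 14, None, None, None] 2
```

In this toy setting, decoding the whole chunk from scratch (`inpaint_chunk([], 8, 8, 1)`) takes 8 passes, filling it after 3 committed tokens takes 5, and exiting early after the first 5 positions takes 2. That is the direction of savings described in (d); real speedups depend on the actual model and unmasking schedule.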

1 day ago

(1) We identify 4 limitations of applying RTC to flow-matching policies:

(a) Pre-training w/o Inpainting. Flow-matching policies are not pre-trained with inconsistent noise for inpainting. Therefore, scaling pre-training does not directly improve asynchronous performance, and inference-time corrections are inevitable for inpainting;

(b) Fine-tuning Required. As a consequence, adequate inpainting quality demands a dedicated fine-tuning stage with techniques such as action-suffix conditioning to explicitly introduce the inpainting-specific noise pattern into training;

(c) Heuristic Guidance. Moreover, to better leverage the previous action chunks, ΠGDM relies on a heuristic schedule that stays fixed across different inference-time cases;

(d) Extra Inference Cost. Finally, the correction guidance term at every denoising step roughly doubles inference cost, ironically increasing the very latency RTC aims to hide.
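The cost argument in (d) is bookkeeping over network evaluations. A back-of-the-envelope sketch, assuming each guided flow-matching step costs one forward pass plus one vector-Jacobian product for the ΠGDM correction (`vjp_cost`, in forward-pass units), and that a discrete diffusion policy spends one full forward pass per `tokens_per_step` newly unmasked tokens. Both function names and all numbers below are hypothetical inputs, not measurements from the paper.

```python
import math


def guided_flow_relative_cost(num_steps: int, vjp_cost: float) -> float:
    """Cost of guided flow-matching sampling relative to unguided sampling.

    Assumes every Euler step runs one forward pass, and guidance adds a
    vector-Jacobian product costing `vjp_cost` forward passes.
    """
    if num_steps < 1 or vjp_cost < 0:
        raise ValueError("num_steps >= 1 and vjp_cost >= 0 required")
    unguided = num_steps * 1.0
    guided = num_steps * (1.0 + vjp_cost)
    return guided / unguided


def discrete_inpaint_relative_cost(horizon: int, committed: int, tokens_per_step: int) -> float:
    """Passes needed to fill a chunk with `committed` tokens already fixed,
    relative to decoding the whole chunk from scratch (no early exit)."""
    if not 0 <= committed < horizon or tokens_per_step < 1:
        raise ValueError("need 0 <= committed < horizon and tokens_per_step >= 1")
    from_scratch = math.ceil(horizon / tokens_per_step)
    inpaint = math.ceil((horizon - committed) / tokens_per_step)
    return inpaint / from_scratch


if __name__ == "__main__":
    print(guided_flow_relative_cost(num_steps=10, vjp_cost=1.0))  # 2.0
    print(discrete_inpaint_relative_cost(horizon=16, committed=4, tokens_per_step=2))  # 0.75
```

In this model the guided ratio is independent of the number of denoising steps, so the overhead remains however the sampler is configured, while the discrete ratio shrinks as more of the chunk is committed; an early exit as in the natural-guidance point would lower it further.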

Outsider_pc retweeted

23 days ago

Excited that our paper StreamdiffusionV2 received the Best Research Paper Award at #MLSys26! 🚀 Video generation is quickly moving from demos to production-facing workloads. It should no longer be a turn-based pipeline but a streaming pipeline that interacts with users. 📖 Our project page: https://t.co/ItuO5zc6hT and paper: https://t.co/fmz2irYIm1 👂 Come join the talk if you are interested in streaming video generation. Our talk will be at the Research Track Oral Presentation: Best Paper Session on Tue at 8:45AM at #MLSys26, where I will talk about how we attacked the efficiency and quality challenges. Hope to see you there! ❤️ Huge thanks to all authors! This work would not have been possible without the incredible effort of the entire team. Big shout-out to Tianrui Feng, Zhi Li, @Andy_ShuoYang, @HaochengXiUCB, @lmxyy1999, @lvminzhang, @xiuyu_l, Keting Yang, @ZiqiPeng, @songhan_mit, @magrawala, @KurtKeutzer, and @cumulo_autumn

24 days ago

Excited to share that three of my papers, DADP, REAR, and Mind Your Entropy, have been accepted to ICML 2026! Especially happy that DADP received reviewer scores of 5/5/5/4 😊 and that its cute idea was also recognized by the reviewers and the community. Truly grateful to all my collaborators for the insightful discussions, hard work, and support. Looking forward to continuing to work on more fun and exciting projects together. DADP: https://t.co/Q6ITqOEOtO REAR: https://t.co/yOziuGzSxP Mind Your Entropy: https://t.co/Og91Lc4awR

Outsider_pc retweeted

about 2 months ago

🤔The more I studied diffusion language models, the more I came to appreciate the simplicity of autoregressive (AR) language models. AR models are trained to agree with what they generate, and their serving stacks are built to preserve that structure. DLMs often do neither: they lack introspective consistency, and high TPF does not necessarily translate into high real-world TPS. We propose Introspective Diffusion Language Model (I-DLM), which unifies introspection and generation in a single pass: 1. 🧑‍🎓I-DLM brings introspective consistency to DLMs with only 5B training tokens, achieving AR-thinking-level quality. 2. 🚀 I-DLM carefully trades compute for higher TPF while converting that advantage into real TPS under high-concurrency serving. 📖Website: https://t.co/826V7d49mA ⌨️Code: https://t.co/MPuy6rAXbq

Outsider_pc retweeted

3 months ago

Back in Nov we developed Recap and trained π*0.6 with RL. Now we have developed a fast *online* RL method that improves π0.6 with as little as 15 min of robot data for precise tasks, using "RL tokens" exposed by our model that can be fed into a small actor-critic method.

Outsider_pc retweeted

3 months ago

🎾Introducing LATENT: Learning Athletic Humanoid Tennis Skills from Imperfect Human Motion Data Dynamic movements, agile whole-body coordination, and rapid reactions. A step toward athletic humanoid sports skills. Project: https://t.co/MFy2NIOsrn Code: https://t.co/A7B5H8PIBh

Outsider_pc retweeted

3 months ago

Imagine a single policy adapting aggressively across multiple embodiments and different domains—varying friction, mass, limb lengths. Can this be done online and zero-shot, without privileged environment parameters or retraining for each new domain? We take a step toward this goal with DADP, a diffusion-based policy for domain adaptation. DADP learns domain representations in a self-supervised manner from interaction context and integrates them into the diffusion generation process by biasing the prior distribution and re-formulating the diffusion target. Paper: https://t.co/cSsPA1Jmfc Website: https://t.co/T5AmbLRUGp Code (w/ Dataset & Checkpoints): https://t.co/RgssL0QJrR More details below.

3 months ago

Thanks to my collaborators @YixiaoWang777, Qinghang, @DarthUtopian, @Yiheng_Li_Cal, and Guojian for this interesting project!

3 months ago

7/ Future directions

DADP focuses on extracting static information and discarding transient cues. But in many settings the domain parameters themselves may vary, and those varying signals also matter. An interesting next step is to learn representations that preserve both.

At a broader level, DADP centers on a fundamental question: what should a representation capture in next-state/frame/token predictive training? We believe this perspective has implications well beyond domain-adaptive control. In particular, it may help inform the next generation of world model pretraining, where the goal is not only to model the world’s dynamical evolution, but also to recover the persistent structure that governs it.

3 months ago

6/ Representation Utilization Ablations

We also study how the learned representation is used in the policy. The conclusion is consistent: the full DADP design performs best. Visualizing the representation trajectories of a conditional policy and of the DADP policy clearly shows that the modulated diffusion greatly improves domain identification and downstream policy performance.

3 months ago

5/ Representation Quality Ablations

We evaluate the DADP representations by reconstructing the domain index and domain information. The trend is very clear: as the temporal gap increases, the representations become much more separable and informative, eventually reaching the quality of the supervised-learning results.

So the superior performance of DADP is not just a matter of architectural luck. It starts from a better domain representation.

3 months ago

4/ Experimental Results

We evaluate DADP on MuJoCo locomotion and Adroit manipulation in the zero-shot setting, where test-time adaptation relies only on online-collected context. In this challenging setting, DADP consistently outperforms strong baselines, with stronger performance and greater stability across seeds.

Moreover, unlike prior work that often considers only minor domain randomization, our benchmarks include aggressive dynamics and morphological variations.

We also open-source the domain randomization and expert training pipeline to make these results reproducible and to support future research.
