Bowei Chen @bowei_chen_19 - Twitter Profile

Pinned Tweet

9 months ago

We found that visual foundation encoders can be aligned to serve as tokenizers for latent diffusion models in image generation!

Our new paper introduces a tokenizer training paradigm that produces a semantically rich latent space, improving diffusion model performance🚀🚀. https://t.co/quD5hHaWYf

bowei_chen_19 retweeted

Hansheng Chen @HanshengCh

about 1 month ago

New paper: AsymFlow🔥

JiT x0-prediction is not enough for pixel generation. Better keep velocity in a low-rank subspace:

- 1.57 FID on ImageNet (best pixel flow model)
- Finetunes FLUX.2 klein into pixel space, beats the original on HPSv3/DPG/GenEval (#1 overall on HPSv3)

1/7 https://t.co/FSz46hrJHj

Bowei Chen @bowei_chen_19

4 months ago

@vivjay30 @sesame Amazing!

Bowei Chen @bowei_chen_19

4 months ago

Nice blog, highly recommend!

Kieran Didi @DidiKieran

4 months ago

Too many REPA / RAE / representation alignment papers lately?
I was lost too, so I wrote a blog post that organizes the space into phases and zooms in on what actually matters for general/molecular ML.
Curious what folks think - link below!

🔗 Blog: https://t.co/6aJf8DCWTa https://t.co/Jp62LpFYzb

bowei_chen_19 retweeted

Jingwei Ma @JingweiMa2

6 months ago

Excited to present UltraZoom at SIGGRAPH Asia next Tuesday (Dec. 16)! UltraZoom converts sparse phone captures of an object into a single gigapixel-resolution image that you can seamlessly explore. Thread below.

Website: https://t.co/XinzBbkEXH
Paper: https://t.co/Ed26gMUaqZ

bowei_chen_19 retweeted

Hansheng Chen @HanshengCh

8 months ago

Excited to announce a new track of accelerating Generative AI:

pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation
https://t.co/6ro55E1XGP

Distill 20B flow models now using just an L2 loss via imitation learning for SOTA diversity and teacher-aligned quality. https://t.co/kxIZs7j3vC

Bowei Chen @bowei_chen_19

8 months ago

The Representation Autoencoders (RAE) work by @sainingxie's team is fascinating: a brilliant demonstration that high-dimensional diffusion is indeed feasible.

In our latest work on semantic encoders, we align a pretrained foundation encoder (e.g., DINOv2) as a visual tokenizer, achieving better reconstruction quality while preserving semantic consistency. Instead of freezing the encoder, we introduce a semantics-preserving fine-tuning strategy that significantly improves reconstruction quality.

I can see great potential in combining RAE with our approach to build semantically rich tokenizers with a large channel dimension and strong reconstruction fidelity.

Bowei Chen @bowei_chen_19

9 months ago

We found that visual foundation encoders can be aligned to serve as tokenizers for latent diffusion models in image generation! Our new paper introduces a tokenizer training paradigm that produces a semantically rich latent space, improving diffusion model performance🚀🚀.

Bowei Chen @bowei_chen_19

8 months ago

@SwayStar123 @sainingxie Yes! I can see great potential in combining RAE with our approach to build semantically rich tokenizers with a large channel dimension and strong reconstruction fidelity (we fine-tuned the encoder for better reconstruction).

Bowei Chen @bowei_chen_19

8 months ago

@Jacoed Yes, this is shown in both our work and previous work like VA-VAE.

Bowei Chen @bowei_chen_19

9 months ago

We hope our findings inspire a rethinking of tokenizer design in generative modeling. 🙏 Huge shoutout to my amazing co-authors and collaborators @KaiZhang9546, @Sai__Bi, @HaoTan5, @zhanghesprinter, @tianyuanzhang99, @zhengqi_li, @bitxiong, @jianming_zhang_. [9/N]

Bowei Chen @bowei_chen_19

9 months ago

On the LAION 2B dataset, we train a text-to-image diffusion model on our tokenizer; it converges faster and surpasses the FLUX-VAE baseline.

Check out more details and results in our paper!

[8/N] https://t.co/d8N9PJaOvA

Bowei Chen @bowei_chen_19

almost 2 years ago

@somebobcat8327 Thanks!

Bowei Chen @bowei_chen_19

almost 2 years ago

#CVPR2024 Arm-captured selfies capture only part of your body. Instead, what if you could capture a full-body photo that someone else would take of you in the scene?

We present Total Selfie, which generates full-body selfies from photographs originally taken at arm's length. 1/n https://t.co/klA62DbHka
