Xingjian Bai

@SimulatedAnneal

Ph.D. student at @MITEECS. Previous RA at @Oxford_VGG.

Joined August 2022

526 Following

535 Followers

10 Posts

Xingjian Bai

@SimulatedAnneal

2 months ago

@TongPetersb @wenhaocha1 @sainingxie @ylecun @mengyer @YiMaTweets @LukeZettlemoyer @liuzhuang1234 Congrats Peter! ⚡️

286

Xingjian Bai

@SimulatedAnneal

3 months ago

In our formulation, image tokenization and latent generation become two sides of the same coin. One model, one stage, trained from scratch, with no pretrained encoders needed. Especially excited about applying UNITE to modalities like molecules and crystals, where a pretrained DINO simply doesn't exist. An unforgettable collaboration with @ShivamDuggal4 and our amazing team at Adobe Research!

Shivam Duggal @ShivamDuggal4

3 months ago

Tokenization & Generation power Large Models. But are they really separate? Tokenization = Generation under strong observability. UNITE: an end-to-end training framework where one shared Generative Encoder (GE) performs both tokenization and latent denoising. Paper: https://t.co/8idMdy123h

413

296

65K

Xingjian Bai

@SimulatedAnneal

4 months ago

Project page: https://t.co/b6UeTJa8V4 Code (train from scratch): https://t.co/HOifFpGDJQ This work would not have been possible without my amazing collaborators, @guande_he, @xxunhuang, @zhengqi_li, @elishechtman, @zongze_wu. I truly learned a lot from you through this exciting journey.

Xingjian Bai

@SimulatedAnneal

4 months ago

Do causal video diffusers really need dense causal attention at every layer, every denoising step? We looked inside and found: no. Causality is separable from denoising. Here are two surprising observations that hold across architectures, training objectives, and scales.

330

193

68K

Xingjian Bai

@SimulatedAnneal

4 months ago

Trained from scratch, SCD beat previous models on all metrics at 4x lower latency. We also fine-tuned from WAN 2.1, matching the VBench performance of the best frame-wise autoregressive models, with 35% lower latency than Self Forcing and more than 10x faster inference than the original WAN 2.1.
