Chao Feng @chaof1234 - Twitter Profile

14 days ago

Our UMA is a unified model for object motion and robot action that learns from heterogeneous data sources using 3D object-motion trajectories as a shared interface. Check it out: https://t.co/Nha6IiKW5O

Yunhao Cao @QuantummCookie

14 days ago

Introducing Unified Motion-Action (UMA) Model, a robot foundation model that uses 3D object motion as a shared interface for heterogeneous robot learning. UMA treats motion and action as co-evolving variables, enabling knowledge transfer across data sources and versatile inference. 🧵 1/n

chaof1234 retweeted

Xichen Pan

@xichen_pan

18 days ago

Modern text-to-image models are increasingly powered by large pretrained LLMs. But there is a curious mismatch: the LLM typically encodes the prompt only once, while the evolving noisy latent states are handled entirely by a newly trained generative backbone. Can a pretrained multimodal prior participate in the denoising process? Introducing RepFusion. (1/12) 📄 https://t.co/WbkTtg5M79 🌐 https://t.co/iDHggosNJX

chaof1234 retweeted

Dandan Shan

@DandanShan_

16 days ago

🧐A question I've long been interested in: how can we learn from human hands and transfer that directly to robots? Our new work, HUG, makes it possible in three simple steps: (1) collect human grasps at scale, (2) learn from them, and (3) retarget for deployment.

Chao Feng

@chaof1234

about 2 months ago

@SarahJabbour_ @ChicagoBooth @UChicago @UMich Congrats!!

chaof1234 retweeted

Jiawei Yang

@JiaweiYang118

2 months ago

Two months ago, I vaguely posted a number: 0.9 FID, one-step, pixel space. Now it is 0.75, and can be even lower. Many wonder how. I thought it might end as a small FID prank: simple and deliberate. It started with one question: can FID be optimized directly, and what does it reveal? Introducing FD-loss.

chaof1234 retweeted

Xiyao Wang✈️CVPR2026 @XiyaoWang10

10 months ago

Thinklite-VL has been accepted to #NeurIPS2025 as a spotlight🎉 Excited to catch up with old friends and meet new ones in San Diego!

Chao Feng

@chaof1234

10 months ago

@CzyangChen congrats

chaof1234 retweeted

Xiyao Wang✈️CVPR2026 @XiyaoWang10

10 months ago

Thanks to AK for sharing our paper!🎉 Training a generative critic model to judge responses makes it BETTER at EVERYTHING. Sometimes the best policy comes from good judgment. Your critic model has been hiding its true potential🌟 🚀Introducing LLaVA-Critic-R1, a family of VLMs that serve as both critic and policy in a single model. No policy training. No in-domain task data. Just 40k preference pairs "Is response A or B better?" for Critic RL Training! Result: +5.7% on 26 visual benchmarks including visual understanding, reasoning, even GUI agents. 71.9 7B-Scale SoTA performance on MMMU! Learn to judge, excel at everything🎭 📄 Paper: https://t.co/KhDLvWpXVn 💻 Code: https://t.co/UGDWDvCLrk

chaof1234 retweeted

AK

@_akhaliq

about 1 year ago

GPS as a Control Signal for Image Generation

chaof1234 retweeted

seunghyun lee @seunghy23235

about 1 year ago

Please join us at poster #369 tomorrow afternoon @CVPR

Chao Feng

@chaof1234

about 1 year ago

Work with @CzyangChen, @holynski_, Alexei A. Efros, and @andrewhowens. Paper: https://t.co/4J7lWETdJi Project page: https://t.co/J6XAptjr1j

Chao Feng

@chaof1234

about 1 year ago

Sharing our #CVPR2025 paper: "GPS as a Control Signal for Image Generation"! 🛰️+✍️ We turn the GPS tag stored in the EXIF metadata of photos into a control signal for diffusion models, so they know not just what you asked for but also where it should look like it was taken. Come see our poster on Friday, 13 Jun, 10:30 a.m. to 12:30 p.m. (CT) in ExHall D, Poster #250.

Chao Feng

@chaof1234

about 1 year ago

Beyond 2D, we can lift a 3D model directly from our GPS-conditioned model by score distillation sampling, which is trained per landmark.

https://t.co/UFQZ9q39Gr

chaof1234 retweeted

Ayush Shrivastava @ayshrv

about 1 year ago

Excited to share our CVPR 2025 paper on cross-modal space-time correspondence! We present a method to match pixels across different modalities (RGB-Depth, RGB-Thermal, Photo-Sketch, and cross-style images) — trained entirely using unpaired data and self-supervision. Our approach learns correspondences through contrastive random walks across visual modalities. #CVPR2025 (1/6)

chaof1234 retweeted

Jeongsoo Park @jespark0

about 1 year ago

Can AI image detectors keep up with new fakes? Mostly, no. Existing detectors are trained using a handful of models. But there are thousands in the wild! Our work, Community Forensics, uses 4800+ generators to train detectors that generalize to new fakes. #CVPR2025 🧵 (1/5)

chaof1234 retweeted

Yiming Dou @_YimingDou

about 1 year ago

Ever wondered how a scene sounds👂 when you interact👋 with it? Introducing our #CVPR2025 work "Hearing Hands: Generating Sounds from Physical Interactions in 3D Scenes" -- we make 3D scene reconstructions audibly interactive! https://t.co/tIcFGJtB7R
