Jiaqi Feng

Verified account

@FengLeader

Undergraduate @Tsinghua_Uni, now intern @UCLA, previously intern @Kling_AI. Video gen/World model, ML/CV, Generation model

Joined July 2020

240 Following

14 Followers

43 Posts

Pinned Tweet

22 days ago

Excited to share our latest work: One-Forcing! 🎉 We compressed standard 4-step autoregressive video generation into just 1-STEP, achieving a 4x theoretical speedup for real-time generation! 🚀 The crazy part? Our 1-step model outperforms strong 4-step baselines on VBench! 🔥 📄 Paper: https://t.co/Saza8LMuAx 🌐 Project: https://t.co/NasGwqo9wC ⭐ Code: https://t.co/p4OWjg39wh


5

8

0

1

36K

4 days ago

@arrakis_ai Crazy! How long did it take you to make it?

0

0

0

0

140

5 days ago

@hugothomel Great article. I have also thought that the world and the player should be two separate parts. You think much deeper!

1

2

0

0

39

7 days ago

@sun_hanchi communism or cyberpunk

0

1

0

0

14

7 days ago

@purshow04 Haha. Maybe it's too popular, so reviewers can't give it an oral.

0

2

0

0

286

7 days ago

@sean_pixel Failure can also be good or bad

0

1

0

0

54

8 days ago

@qinzytech The gap between ICL and BP is too large. The former is heavy and the latter is light. For humans, it’s a middle state. How do we design that in AI?

1

1

0

0

32

8 days ago

I wonder about the relationship between learning efficiency and generalization. In my observations, recent methods can't do both well. Learning too fast may mean overfitting. But in real intelligence they come together. So JEPA/LeWorldModel and other representation-based methods claim to have few-shot learning ability. But are we trapped in the clean world? I agree with compositional generalization. But that's not how LLMs work. Aristotle and Plato are two different paradigms, bottom-up and top-down respectively. What will be the answer for robotics?

10 days ago

VLA-JEPA just dropped in LeRobot 🤖 What makes this model special is that it does not just learn what action to take from a given observation, it also leverages a JEPA world model to learn action-relevant dynamics. During training, the VLA leverages V-JEPA2 by conditioning its predictor. This clever trick adds a world modeling objective to the training, which also allows pretraining on human videos. At inference, the world model is dropped entirely, keeping only a standard VLA architecture: Qwen backbone and action head. The demo here was only fine-tuned on 13 examples, showing great pretraining capability and running in real time on @NVIDIARobotics DGX Spark! VLA-JEPA is the first world model to be ported to LeRobot, and I feel like it won't be the last 🚀 @Thom_Wolf @ClementDelangue

31

1K

187

811

295K

0

1

0

0

123

8 days ago

@VoidAsuka Learning too fast may mean overfitting. But in real intelligence they come together

0

0

0

0

12

8 days ago

@VoidAsuka I wonder about the relationship between learning efficiency and generalization?

2

0

0

0

102

8 days ago

@Yi__Li State is action.

0

0

0

0

23

8 days ago

@NandoMetzger @CVPR Cool glasses

0

1

0

0

97

8 days ago

@andrew_n_carr The most informative thing is our world. But it’s definitely the ONLY probability. So the key is to interact with and learn from the world. Humans are an adaptation of nature

0

0

0

0

99

8 days ago

@realleonlc https://t.co/oYBfxNX4mc shares a similar idea. TTT for generation may be the future

0

0

0

1

127

8 days ago

@sarahwooders For me, Agents.md is a set of hard rules drawn from my own habits. It's about better fit rather than better performance

0

1

0

0

80

8 days ago

@peterpaohuang @Stanford_AI_Bio This reminds me of the deep theory of gauge choice in physics. Maybe we could build more complex math structures than vector fields

0

0

0

0

93

9 days ago

@vincesitzmann For AR we use embeddings; for diffusion we use encoders/decoders. Yet for hybrid AR-diffusion models like recent world models, we know too little about what makes a good encoder.

0

1

0

0

206

9 days ago

@ndsong95 Yeah. VAE is totally recon. RAE takes one step forward but it’s also limited to vision/pixels. We need an encoder with physical reality.

0

0

0

0

67

9 days ago

@chris_j_paxton Papers only display the strengths, but maybe the weaknesses are more crucial. Academia should change.

1

0

0

0

42

9 days ago

@tom_doerr Seems like StreamVGGT?

0

1

0

0

81

9 days ago

@deepfates You mean neuroscience/psychology?

0

0

0

0

38
