Jinnan Chen @jinmelo7 - Twitter Profile

jinmelo7 retweeted

@HaochengXiUCB

9 days ago

New blog post: The Forgetting Wall in Video and World Models

Long-horizon video generation is not just limited by compute. It is limited by how much of its own past the model can afford to remember.

I wrote about why long videos drift, why KV cache becomes the memory bottleneck, and why compression is a key direction for future video/world models.

https://t.co/ORp0ma4P2m

7

163

27

119

129K

jinmelo7 retweeted

Liang Zheng

@LiangZheng_06

10 days ago

Diffusion is differentiable. LLMs aren't. So why is the diffusion community copying RL methods (GRPO etc.) from LLMs? The native post-training for diffusion is gradient descent such as ReFL and LeapAlign. Paper: https://t.co/uoy9mCGJSv

3

311

28

272

36K

jinmelo7 retweeted

Vincent Sitzmann

@vincesitzmann

10 days ago

Introducing MilliVid, our new method for long-context video generation! MilliVid creates videos that are consistent over long time spans, without using retrieval heuristics or 3D maps! (1/n) https://t.co/evmf5dL5Sg

11

390

70

222

51K

jinmelo7 retweeted

Yossi Gandelsman

@YGandelsman

15 days ago

Proud of what our amazing team has accomplished. We spent the past few months pursuing one bet: that layouts are the right intermediate representation for generation and editing. [1/n] https://t.co/EKRyGOqcqJ

17

180

14

28

2M

jinmelo7 retweeted

Xun Huang

@xxunhuang

16 days ago

I'm excited to announce that the Morpheus AI team is joining Roblox!

Over the past two years, I’ve focused on developing the foundational architectures behind modern video world models, including Self Forcing and AR-DiT. This work unlocked something unprecedented: the ability to move beyond offline, pre-rendered AI video generation and instead simulate interactive worlds in real time.

Realizing the massive potential of this technology is what drove me to found Morpheus in August 2025. In the months since, our incredible team has pushed those boundaries further than we ever thought possible.

We've always believed video world models will reshape how games are created. Roblox Reality is an ambitious bet on that exact future, and it lines up perfectly with what we set out to do: bridging the gap between deterministic game engines and generative world models. Joining Roblox means our technology will help power experiences that reach millions of players every day.

To our team, to @a16z and other investors, and to the advisors, partners, and supporters who believed in this from the very beginning: thank you. We're just getting started. Excited to build this at scale.

87

564

25

91

133K

jinmelo7 retweeted

Yuyang Zhao

@yuyangzhao_

18 days ago

🚀 SANA-Streaming: Hybrid Diffusion Transformer + System Co-design = Real-Time Streaming Video Editing

💥 Key Features 🌟
🧠 Hybrid DiT Architecture -> Fixed VRAM and complexity.
🔄 Cycle-Reverse Regularization -> Enforces long-range consistency without paired long video data.
🛠️ Efficient System Co-design -> Fused GDN kernels + Mixed-Precision Quantization highly optimized for NVIDIA Blackwell.

Numbers 📊
⚡ 58 DiT FPS and 24 end-to-end FPS for real-time 1280×704 resolution editing on a single consumer RTX 5090 GPU.
📦 Flat VRAM: Uses just 5.56 GB of constant memory regardless of video length, completely avoiding OOM errors.
🔥 Up to 100× higher inference throughput than prior SOTA offline editors.

🎬 Project page: https://t.co/J4yLjLNSyf
📄 Paper: https://t.co/MrRuh3veVk

2

83

22

53

19K

jinmelo7 retweeted

Saining Xie

@sainingxie

23 days ago

📸latest in our cambrian series: cambrian-p, p for pose. i think pose is probably the minimal sufficient 3d signal (and it’s easy to get!) that we need for robust video multimodal models -- jointly modeling frames and pose turns image sequences into a globally grounded structure.

8

185

27

60

19K

jinmelo7 retweeted

Carlos Barreto

@carlosedubarret

26 days ago

CEB SAM3D Body v3 WIP

This is the first test where we have global position even with a moving camera. #b3d

8

446

42

309

49K

jinmelo7 retweeted

David Baszucki

@DavidBaszucki

27 days ago

Our vision for multiplayer photorealism is a hybrid architecture merging 3D cloud gaming with AI video upsampling on the edge. The video model and our cloud 3D engine can potentially drive each other bi-directionally, acting as both an upsampler as well as a real-time dreamer, generating parts of the 3D scene in real time. You can check out an early playable demo here from our Roblox Labs Team. Our video world model uses the Roblox Engine as a programmable harness, layering structured logic, state tracking, and multiplayer participation onto the generative power of action-conditioned world models.

503

416

55

122

243K

jinmelo7 retweeted

Zhiyang (Frank) Dou

@frankzydou

about 1 month ago

Introducing ✨RigidFormer: Learning Rigid Dynamics with Transformers - our attempt to scale learning-based physical dynamics with Transformers.

RigidFormer learns rigid dynamics with Transformers. It is a mesh-free, object-centric Transformer for multi-object rigid-body contact dynamics from point clouds.

Learning physics with purely neural simulators, without relying on traditional physics engines, is an important and widely studied problem. Prior SOTA methods often use graph neural networks for accuracy and generalization, but still struggle with efficient, high-fidelity simulation at scale.

RigidFormer uses only point inputs, matches or outperforms mesh-based baselines on standard benchmarks, runs much faster, generalizes across point resolutions and datasets, and scales to 200+ objects. We also show a preliminary extension to command-conditioned articulated bodies by treating body parts as interacting object-level components.

RigidFormer is mesh-free: it does not require mesh connectivity, SDFs, or vertex-level message passing, making it well-suited for point-cloud observations and scalable simulation. This architecture can also be adapted to learn soft-body dynamics by replacing the rigid-body module (differentiable Kabsch alignment).

🎬 See our video for more details.

Many thanks to my amazing collaborators: Minghao Guo @GuoMh14, Haixu Wu @Haixu_Wu_1998, Doug Roble, Tuur Stuyck @TuurStuyck, and Wojciech Matusik @wojmatusik.

Project page: https://t.co/6TBaRPVEYo
Paper: https://t.co/3OQUSJSND3

6

298

60

210

571K

jinmelo7 retweeted

Zach Dive

@zachdive

about 1 month ago

https://t.co/Vh8yU33jUL

25

273

18

368

27K

jinmelo7 retweeted

DailyPapers

@HuggingPapers

about 1 month ago

Mean Mode Screaming

A 1000-layer Diffusion Transformer trained with Mean-Variance Split Residuals that prevents the sudden mean-dominated collapse plaguing ultra-deep generative models. https://t.co/QJ02gQ96Bd

1

84

10

87

8K

jinmelo7 retweeted

Dilum Sanjaya

@DilumSanjaya

about 1 month ago

Fun interactive science app ideas | Part 3

Played around with generating 3D biological structures and made an app to explore them interactively

UI Design GPT Images 2
Code Gemini 3.1 Pro

More demos ↓

526

17K

2K

15K

2M

jinmelo7 retweeted

David Baszucki

@DavidBaszucki

about 1 month ago

New Roblox avatar tech is live. Including support for fingers. https://t.co/tm7cxxkAkg

427

405

40

85

82K

jinmelo7 retweeted

Alvaro L

@L42ARO

about 1 month ago

Physical AI robotics need actionable outputs like 3D coordinates, not bullet points or nice paragraphs. So decided to experiment by combining a VLM with Monocular Depth Estimation, essentially projecting 2D reasoning into 3D. Worked pretty well, figured to share, check repo👇

9

103

16

93

11K

jinmelo7 retweeted

Vishwajeet

@Bootsblac

about 1 month ago

AI just generated 20 floor plans. Not images. Not concepts. Fully editable CAD models.

Built using Codex @OpenAI and @opengeometry
Rendering using @threejs

Text → CAD for Architecture is here. Type a prompt → get real, usable floor plans. No third-party tool, no Revit, no AutoCAD!

Kernel GitHub code in comments. What do we build next?

#cad #ai #opensource @sama #architecture

6

337

35

469

32K

jinmelo7 retweeted

Weijie Wang @wjwang2003

about 2 months ago

🚀 Introducing World-R1: Video models already know 3D — they just need RL to wake it up! No arch changes. No video training data. No extra inference cost.⬇️ 🌐Website: https://t.co/WRpUVcYSTZ

13

542

71

404

61K

jinmelo7 retweeted

Yi Zhou @Papagina_Yi

about 2 months ago

Coarse2Real (C2R) transfers simple 3D renderings into realistic-style video. Check our paper and project page to learn how to hedge a small amount of synthetic paired data with real unpaired data for training the C2R model. We will release the model soon! https://t.co/tBaoEQtp8B

1

183

21

141

12K

jinmelo7 retweeted

Wildminder

@wildmindai

about 2 months ago

MoCapAnything V2. Maps motion onto whatever skeleton you give it.

- 20x faster than mesh pipelines;
- cuts angle error to ~10°;
- DINOv2 + GL-GMHA

cool thing for animators and game devs https://t.co/O8evKZDwVO

1

365

44

418

17K
