Whoa, this is such a cool technique!
It adds reference structure/style guidance to any flow matching model without slowing down inference.
Just built a demo for it on @huggingface Spaces
NVIDIA just dropped another cool world model - SANA-WM.
You give it one image and a camera path, and it spits out a full minute of 720p on a single GPU.
> handles a full 60s
> generates a 1-minute clip in just 34 seconds (on a 5090).
> follows precise 6-DoF paths, movement feels intentional and grounded.
> Based on SANA 2.6B
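For anyone wondering what "a camera path" looks like as input, here is a minimal sketch. It is not SANA-WM's actual interface: the `Pose6DoF` class, the `sample_path` helper, and the degree-based yaw/pitch/roll layout are all hypothetical, and blending Euler angles linearly is a simplification that only behaves well for small rotations. It just shows the general idea of turning a few 6-DoF keyframes into one pose per frame, plus the speed arithmetic from the numbers above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose6DoF:
    """Hypothetical 6-DoF camera pose: 3 translation + 3 rotation values."""
    x: float
    y: float
    z: float
    yaw: float    # degrees
    pitch: float  # degrees
    roll: float   # degrees


def lerp_pose(a: Pose6DoF, b: Pose6DoF, t: float) -> Pose6DoF:
    """Linearly blend two poses; t=0 returns a, t=1 returns b."""
    def mix(p: float, q: float) -> float:
        return p + (q - p) * t

    return Pose6DoF(
        mix(a.x, b.x), mix(a.y, b.y), mix(a.z, b.z),
        mix(a.yaw, b.yaw), mix(a.pitch, b.pitch), mix(a.roll, b.roll),
    )


def sample_path(keyframes: list[Pose6DoF], n_frames: int) -> list[Pose6DoF]:
    """Resample a keyframed path to exactly n_frames evenly spaced poses."""
    if len(keyframes) < 2 or n_frames < 2:
        raise ValueError("need at least 2 keyframes and 2 frames")
    segments = len(keyframes) - 1
    poses = []
    for i in range(n_frames):
        # Position along the path, measured in keyframe segments.
        u = i / (n_frames - 1) * segments
        k = min(int(u), segments - 1)  # clamp so the last frame stays in range
        poses.append(lerp_pose(keyframes[k], keyframes[k + 1], u - k))
    return poses


def realtime_factor(clip_seconds: float, generation_seconds: float) -> float:
    """How many seconds of video are produced per second of compute."""
    return clip_seconds / generation_seconds


if __name__ == "__main__":
    start = Pose6DoF(0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
    end = Pose6DoF(10.0, 0.0, 0.0, 90.0, 0.0, 0.0)
    path = sample_path([start, end], n_frames=5)
    print(path[2])
    print(round(realtime_factor(60, 34), 2))
```

With the figures quoted above (a 60 s clip in 34 s), the factor works out to roughly 1.76, meaning generation runs faster than playback.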
World generator with cinematic fly-throughs for any fantasy setting
https://t.co/N6KxXv0LL7
Wan2.2 again.
SwiftI2V: Efficient 2K I2V video gen with 21GB VRAM.
- uses 200x less GPU-time than CineScale
- exact image fidelity
- decoupled processing
no models yet.
https://t.co/UmfRrwq3IY
Imagine every pixel on your screen, streamed live directly from a model. No HTML, no layout engine, no code. Just exactly what you want to see.
@eddiejiao_obj, @drewocarr and I built a prototype to see how this could actually work, and set out to make it real. We're calling it Flipbook. (1/5)
1/
You might not need Qwen-VL anymore for image→video prompts.
I found a free trick using Gemini in Chrome that can auto-caption images and run my I2V workflows.
Works with tools like Wan2GP and ComfyUI.
And it worked better than I expected 👇
3/ Result: Gemini acts like a free vision model + workflow operator.
Great workaround if you want Qwen-VL-style captioning but don’t have enough VRAM.
Limitation: this is UI automation, so it won’t work for API pipelines — but for local workflows it’s surprisingly powerful.
ok, Video AI is finally moving from aesthetic vibes to world understanding.
VBVR: a foundational scale-up for video reasoning.
- shifts focus from visual quality to spatiotemporal intelligence: reasoning over motion, interaction, and causality.
https://t.co/53C32C2ExU