🎮 Real-time multiplayer world model
👥 Arbitrary number of players
🧠 Generated entirely by a neural network
MultiGen is a real-time multiplayer diffusion game engine that supports an arbitrary number of players through a shared memory-based world model, rather than limiting interaction to just 2 players. While single-player world models can already be entertaining, things really change once multiple people can step into the same generated world together. Here’s a 30-minute timelapse of 4-player gameplay running in real time.
Introducing MilliVid, our new method for long-context video generation! MilliVid creates videos that are consistent over long time spans, without using retrieval heuristics or 3D maps! (1/n)
https://t.co/evmf5dL5Sg
Super cool work from the Odyssey team! Great to see more momentum in this direction (and happy to see the MultiGen framework getting adopted 😉)
Multi-player/agent systems that scale to arbitrary numbers of agents might not be solvable with brute force scaling alone. Can’t wait to see where the field goes with this direction!
Introducing Agora-1, a multi-agent world model.
Multiple participants—human or AI—can now interact inside the same world simulation, all in real-time.
Try our playable research preview today, with Agora-1 simulating a multiplayer GoldenEye deathmatch!
High-fidelity generation is hitting a scaling crisis as DiT compute grows with image resolution and video length. But do we need high-resolution denoising at every step?
We introduce Spectral Progressive Diffusion, a plug-and-play framework for efficient image and video generation that directly exploits the spectral autoregression property of diffusion to grow resolution during denoising.
[1/7]
New paper: AsymFlow🔥
JiT x0-prediction is not enough for pixel generation. Better keep velocity in a low-rank subspace:
- 1.57 FID on ImageNet (best pixel flow model)
- Finetunes FLUX.2 klein into pixel space, beats the original on HPSv3/DPG/GenEval (#1 overall on HPSv3)
1/7
Had a lot of fun building this during spring break, pretty surreal to see a multiplayer generative game actually running in the browser (it even works on mobile). Go try it!
Our previous intern released an extremely impressive re-implemented demo of our paper on multiplayer diffusion game engines.
https://t.co/UHUEVfkK8h
I think this might be the first time you can play a fully-functional multiplayer generative game online with other people. 🤯
A couple of weeks ago, we introduced MultiGen, our work on real-time multiplayer world models. After spending way too many hours playing it with friends internally, we knew we had to share it.
Today, we're excited to collab with @modal to let you experience it for yourselves. Grab your squad and play the live demo here 👇
We built a real-time multiplayer game generated entirely by a neural network—and now you can actually play it.
In collaboration with @modal, we just launched the live demo for MultiGen, our diffusion-based multiplayer game engine. Grab some friends and try it here 👇
@GordonWetzstein @modal Super excited about releasing this! We've been having so much fun playing MultiGen with our friends, and now everyone can try it from their browser (and phones).
High-resolution image and video generation is hitting a wall because attention in DiTs scales quadratically with token count. But does every pixel need to be in full resolution?
Introducing Foveated Diffusion: a new approach for efficient diffusion-based generation that allocates compute where it matters most.
1/7🧵
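To make the quadratic-scaling claim concrete, here is a rough back-of-the-envelope sketch. It assumes a square patch size of 16 pixels and counts only pairwise token interactions in one self-attention layer; the function name and the patch size are illustrative assumptions, not details from the Foveated Diffusion paper.

```python
def attention_cost(height: int, width: int, patch: int = 16) -> tuple[int, int]:
    """Return (token count, pairwise attention interactions) for one image.

    Assumes one token per patch x patch block of pixels (hypothetical setup).
    """
    tokens = (height // patch) * (width // patch)
    # Full self-attention compares every token with every token.
    return tokens, tokens * tokens


base_tokens, base_pairs = attention_cost(512, 512)
big_tokens, big_pairs = attention_cost(1024, 1024)

print(base_tokens, base_pairs)
print(big_tokens, big_pairs)
print(big_pairs // base_pairs)
```

Under these assumptions, doubling the side length quadruples the token count and multiplies the attention interactions by 16, which is why spending full resolution on every pixel gets expensive quickly.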
Today, we announce our team’s progress in pursuing a different type of foundation model for robotics: the Direct Video Action Model (DVA), our best attempt to turn robotics into a generative modeling problem we can scale.
Technical blog: https://t.co/GMsxnC5wbJ
Excited to show some surprising inventions on generative multiplayer games we made at Google with Stanford. We call the work MultiGen.
I've always been inspired by early studios like id Software with Doom or Blizzard with Warcraft bringing networked video games to the next level. We are at the point in history where we can make strides like them, but for generative games. It's a strange feeling to be in the age of generative video games while still discovering how exactly to train the models and design the tools that make them useful.
All of the tools that have been invented for classic game engines need to be redesigned for generative games. For example, level and world design is not entirely possible with existing technology. We introduce editable memory to diffusion game engines, which allows new levels to be designed via a minimap. But we can easily imagine how this can be expanded with different creation tools. The end goal of this research direction is to let game designers guide the generation process of their world at the granularity they prefer.
Editable memory also allows us to add multiplayer to Generative Doom. We were amazed when we saw GameNGen some years ago, and now you can play it live with friends in real-time, on your couch or even online.
Shared representations like our editable memory seem like the future for this type of experience. Models are, in some cases, expensive and approximate encoders but great interpolators and extrapolators. Leveraging their strengths lets you have completely new experiences that can be realized now and not in the distant future.
This work was started at my previous team and continued in collaboration with Stanford. Congratulations to all for the discoveries.
It was a huge pleasure working with Nataniel and team on this project. Starting from his previous project (Unbounded), Nataniel’s vision for generative games is sure to shape the way we view entertainment in the coming years.
One nice consequence of external memory is that it turns level design into a native part of the system.
The world is defined explicitly through a top-down map layout, so users can build or modify the environment before inference starts, while the model generates first-person observations that stay aligned with that structure.
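As a minimal sketch of what an explicit, editable top-down layout could look like, the toy code below stores a grid map plus player positions in one shared structure and lets a user edit it before any frames are generated. Everything here (the `SharedMap` class, the `#`/`.` cell encoding, the method names) is a hypothetical illustration and not MultiGen's actual memory format; the neural renderer that would turn this state into first-person views is omitted.

```python
from dataclasses import dataclass, field

WALL, FLOOR = "#", "."


@dataclass
class SharedMap:
    """Hypothetical top-down layout shared by every player."""
    rows: list[list[str]]
    players: dict[str, tuple[int, int]] = field(default_factory=dict)

    @classmethod
    def from_text(cls, text: str) -> "SharedMap":
        # One character per cell, one line per map row.
        return cls([list(line) for line in text.strip().splitlines()])

    def set_cell(self, row: int, col: int, value: str) -> None:
        # Level editing: change the layout before inference starts.
        self.rows[row][col] = value

    def add_player(self, name: str, row: int, col: int) -> None:
        if self.rows[row][col] == WALL:
            raise ValueError(f"cannot place {name} inside a wall")
        self.players[name] = (row, col)

    def render(self) -> str:
        # Draw the map with each player marked by their uppercase initial.
        grid = [row[:] for row in self.rows]
        for name, (row, col) in self.players.items():
            grid[row][col] = name[0].upper()
        return "\n".join("".join(row) for row in grid)


level = SharedMap.from_text("""
#####
#...#
#####
""")
level.set_cell(1, 2, WALL)      # split the corridor with a new wall
level.add_player("ann", 1, 1)
level.add_player("bob", 1, 3)
print(level.render())
```

Because any number of players can be added to the same structure, a layout like this is one way to picture how a shared memory avoids hard-coding a two-player limit.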
Video world models today have a very limited context length.
Mode Seeking meets Mean Seeking (MMM) unlocks long-context, persistent video world models through a unified representation.
1/8 🧵
today we are releasing new research at Google. we tackle the previously unsolved task of editing motion in an existing video. it's called MotionV2V. with it you can move objects in videos, move the camera, and other unprecedented edits in user-provided video
In long video generation, the context usually keeps growing during chunk-wise or frame-wise rollout.
Since a growing context may require context selection, we bring the idea of MoE into long-context modelling and propose Mixture of Contexts. All previous context/memory is considered, while the chunks that are actually computed are chosen in a data-driven manner. You can easily enjoy 7x compute savings.
"World model" has been an overloaded term, used by different communities in various contexts.
Here's a fantastic blog from Xun, packed with great insights on what world models could look like. A very valuable read for anyone working in this space!
What exactly is a "world model"? And what limits existing video generation models from being true world models?
In my new blog post, I argue that a true video world model must be causal, interactive, persistent, real-time, and physically accurate.
https://t.co/f4fgawgWwv