Real-world models are here! Stoked to share how we're bringing real-world locations to life by integrating Street View into Genie. Try it now at https://t.co/j6c1N38tRS and read the blog for more info:
https://t.co/6ZOi9d9rah
Genie 3 🤝 @Waymo
The Waymo World Model generates photorealistic, interactive environments to train autonomous vehicles.
This helps the cars navigate rare, unpredictable events before encountering them in reality. 🧵
I got early access to Project Genie from @GoogleDeepMind ✨
It's unlike any realtime world model I've tried - you generate a scene from text or a photo, and then design the character who gets to explore it.
I tested dozens of prompts. Here are the standout features 👇
We’re hiring student researchers at Google DeepMind for 2026. Come work with our great team in SF on anything from diffusion / world models / 3D.
Send me an email if you’re interested!
@soumithchintala Congratulations on all you've accomplished so far! You did a lot to create a community around FAIR and NYU in the early days of my PhD, and it was a unique experience to watch you and the team hack on early PyTorch.
@E0M My bet is still on learning a world model from plentiful passive data before sprinkling in some actions, which is why I’m working on Genie. But I’d love to be proven wrong.
GEN-0 models exhibit strong scaling laws, in which more pretraining data and compute consistently (and predictably) improve downstream post-training performance of the model across many tasks. 🔬📈🧠
@sedielem That is more my take. Which leads to good questions! How do we increase distribution modeling capacity? Improve the generative process? Or are images / videos inescapably an e.g. 1T-parameter problem?
The future of AI is models that generate graphical interfaces. Instead of the linear, low-bandwidth metaphor of conversation, models will represent themselves to us as computers: rich visuals, direct manipulation, and instant feedback.
@AlexGDimakis It’s useful as a UI for humans too, and that’s the first place I’d expect this to show up. But say you want to train an agent in e.g. a Genie environment. It’s already common to use VLMs for success detection in robotics, where we also don’t have a ground-truth reward.
@scychan_brains I guess an interesting question is what the closest thing to a battery might be. The most general-purpose artifact you can spend your ephemeral compute on to make it widely valuable later. Large model pretraining seems like a good candidate
@scychan_brains Computation is when you bring together hardware and power at the same time in the right way and get outputs. Interestingly you can't "store" this at all, e.g. save up your datacenter for a month and then spend it all at once
@scychan_brains Yes, that's the durability row. I'm distinguishing computation from e.g. GPUs or credits:
- GPUs / datacenters are only the capital part of compute. Ongoing costs (e.g. power) are just as significant and scale differently
- Credits aren't new, they're just scrip like gift cards