Today an entire industry stopped making sense.
Some guy published a repo on GitHub that turns any photo into an explorable 3D world: meshes with physics, a splat for the background, ambient audio. Everything.
One image goes in. One world comes out. Five minutes.
People who spent ten years learning Blender have been staring at this in silence all day.
It's called image-blaster.
New paper from Apple - Sharp Monocular View Synthesis in Less than a Second
Mescheder et al. @ Apple just released a very impressive paper (congrats! 🎉🥳). You give it an image and it generates a really great-looking 3D Gaussian representation. It uses Depth Pro. It's really good. The model is about 3GB, and takes ~5-10s per image on my M1 Max. It's a single feedforward pass through the network.
The video is using "Metal Splatter" to view the .ply from ml-sharp on Apple Vision Pro. Some really wow moments today trying it on different scenes.
Releasing Echo today is incredibly exciting for me — because it is a critical step for generative AI, enabling the creation of virtual worlds.
Echo is our first world model at SpAItial AI. It turns text or images into explorable 3D environments — spaces you can move through, inspect, and build on. Seeing this work in real time still feels a bit surreal.
My fascination with this goes back a long way: video games, virtual environments, and the idea of capturing the real world in 3D. As a researcher, I spent years working on 3D reconstruction, neural rendering, and scene understanding — all driven by the same question: how do we teach machines to understand the world?
One thing became clear over time: the biggest bottleneck isn’t compute or rendering — it’s 3D worlds themselves. High-quality, consistent environments are expensive to create by hand and don’t scale to the experiences we want to build. In particular, I believe that the ability to generate virtual worlds is ultimately key towards understanding the real world.
That’s why we founded SpAItial AI. We’re building spatial world models that combine geometric understanding with creative generation — models that can generate, edit, and eventually reason about 3D environments.
Echo is just the beginning. For me, this feels like the moment when decades of research finally meet the imagination that got many of us into graphics, games, and 3D understanding in the first place. 🌍
https://t.co/L0RxOUqFMa
New paper: You can make ChatGPT 2x as creative with one sentence.
Ever notice how LLMs all sound the same?
They know 100+ jokes but only ever tell one.
Every blog intro: "In today's digital landscape..."
We figured out why – and how to unlock the rest 🔓
Copy-paste prompt: 🧵
Did you know that when they say stuff like "The A18 uses TSMC's 3nm process" or "announced the 2nm node",
the 3nm and 2nm don't actually mean anything?! They're just like version numbers. They make them up. Literally nothing on the chip measures 2nm or 3nm.
I certainly didn't know.
I reimagined my living room with World Labs. Gemini helped design it, World Labs generated the 3D environment, and VPS localized it to my space at 1:1 scale.
I can now step into a persistent redesign in mixed reality and explore it as though it exists physically. How it was built:
Generate persistent 3D worlds from a single image, bigger and better than ever!
We’re excited to share our latest results and invite you to try out our world generation model in a limited beta preview.
The Spatial Web in visionOS 26 now lets websites include their own custom environments.
Built into Safari through WebKit, this makes it possible for sites to move beyond flat pages and live inside an immersive space.
Here’s what it looks like inside Vision Pro.
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference”
We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to prompt engineering. Here we share what we are working on and connect with the research community frequently and openly.
The name Connectionism is a throwback to an earlier era of AI; it was the name of the subfield in the 1980s that studied neural networks and their similarity to biological brains.
https://t.co/lrJioBmpbT
How do we generate videos on the scale of minutes, without drifting or forgetting about the historical context?
We introduce Mixture of Contexts. Every minute-long video below is the direct output of our model in a single pass, with no post-processing, stitching, or editing.
1/4
Stanford released the videos from the Spring 2025 edition of CS231N: Deep Learning for Computer Vision - the first update in 8 years 👀
🎥 Lectures: https://t.co/uKx6sTK0Aj
📒 Slides + assignments: https://t.co/kSMULXtGjo
A bit late, but finally got around to posting the recorded and edited lecture videos for the **How to AI (Almost) Anything** course I taught at MIT in spring 2025.
Youtube playlist: https://t.co/DBZpQ0kbHk
Course website and materials: https://t.co/dFK9uyEigs
Today's AI can be applied to almost anything - from language to vision, audio, sensors, medical data, music, art, smell, and taste. This course covers the principles of AI (focusing on deep learning and foundation models), how we can apply AI to novel real-world data modalities, and multimodal AI that can process many modalities at once, such as connecting language and multimedia, music and art, sensing and actuation, and more.
Simulations are the future, & one of the main tools we’ll ultimately use to understand and predict things about the universe. This is why I’m so excited about Genie 3, our latest interactive world simulator - here are some insanely cool things you might have missed about it 🧵:
A picture is now worth more than a thousand words in genAI; it can be turned into a full 3D world! And you can stroll through this garden for as long as you like; it will still be there.
🚀 My first tweet!
(1/n) Thrilled to share our new work, Context-as-Memory (CaM), which tackles the memory problem in video world models!
Our idea: context=memory. By leveraging context, CaM preserves consistency across generations (like Genie 3).
🎥 Check out our demo video below!
🚀We are thrilled to open-source Hunyuan-GameCraft, a high-dynamic interactive game video generation framework built on HunyuanVideo.
It generates playable and physically realistic videos from a single scene image and user action signals, empowering creators and developers to "direct" games with first-person or third-person perspectives.
Key Advantages:
🔹High Dynamics: Unifies standard keyboard inputs into a shared continuous action space, enabling high-precision control over velocity and angle. This allows for the exploration of complex trajectories, overcoming the stiff, limited motion of traditional models. It can also generate dynamic environmental content like moving clouds, rain, snow, and water flow.
🔹Long-term Consistency: Uses a hybrid history condition to preserve the original scene information after significant movement.
🔹Significant Cost Reduction: No need for expensive modeling/rendering. PCM distillation compresses inference steps, boosting speed and lowering costs. This allows the quantized 13B model to run on consumer-grade GPUs like the RTX 4090.
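To make the "shared continuous action space" idea above concrete, here is a minimal sketch of how discrete key presses could be folded into one continuous action vector. This is not Hunyuan-GameCraft's actual encoding: the key bindings, the `keys_to_action` function, and the `speed` and `turn_rate_deg` parameters are all hypothetical and exist only for illustration.

```python
import math
from typing import Iterable, Tuple

# Hypothetical key bindings (an assumption, not taken from the project):
# movement keys give a unit direction in the camera's local frame
# (x = right, z = forward); arrow keys give a signed turn direction.
MOVE_KEYS = {"W": (0.0, 1.0), "S": (0.0, -1.0), "A": (-1.0, 0.0), "D": (1.0, 0.0)}
TURN_KEYS = {"LEFT": -1.0, "RIGHT": 1.0}


def keys_to_action(pressed: Iterable[str], speed: float = 1.0,
                   turn_rate_deg: float = 15.0) -> Tuple[float, float, float]:
    """Map a set of pressed keys to a continuous (vx, vz, yaw_rate) action."""
    pressed = set(pressed)
    x = sum(MOVE_KEYS[k][0] for k in pressed if k in MOVE_KEYS)
    z = sum(MOVE_KEYS[k][1] for k in pressed if k in MOVE_KEYS)
    norm = math.hypot(x, z)
    if norm > 0.0:
        # Normalize so diagonal movement is not faster than straight movement.
        x, z = x / norm * speed, z / norm * speed
    yaw = sum(TURN_KEYS[k] for k in pressed if k in TURN_KEYS) * turn_rate_deg
    return (x, z, yaw)
```

With a representation like this, speed and turn rate become continuous values that can be scaled or interpolated, rather than being limited to a fixed set of discrete moves.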
Project Page: https://t.co/uAbiu9FRzF
Code: https://t.co/WgppVz1KUq
Technical Report: https://t.co/aO8plomaTr
Hugging Face: https://t.co/2ZOUWm6KKQ
What if you could not only watch a generated video, but explore it too? 🌐
Genie 3 is our groundbreaking world model that creates interactive, playable environments from a single text prompt.
From photorealistic landscapes to fantasy realms, the possibilities are endless. 🧵
Google just took a big step towards building ChatGPT for Earth.
AlphaEarth Foundations does something clever -- instead of drowning in petabytes of Earth observation data, it creates compact summaries of every 10x10m square on Earth by fusing optical, radar, LiDAR, and climate data.
The kicker is it can see through clouds in Ecuador and reveal hidden agricultural patterns in Canada. MapBiomas and Global Ecosystems Atlas are already using it for conservation work.