Test-time scaling nailed code & math—next stop: the real 3D world. 🌍
MindJourney pairs any VLM with a video-diffusion World Model, letting it explore an imagined scene before answering.
One frame becomes a tour—and the tour leads to new SOTA in spatial reasoning. 🚀
🧵1/
Strongly agree with Prof. Fei-Fei Li on the importance of world models for spatial intelligence. Echoing that view, our NeurIPS’25 MindJourney is an early attempt to use world models for spatial reasoning—still at an early stage, but we hope it sparks discussion!
Code: https://t.co/qQ2eMPDf3O
#EmbodiedAI #SpatialIntelligence #AI
AI’s next frontier is Spatial Intelligence, a technology that will turn seeing into reasoning, perception into action, and imagination into creation. But what is it? Why does it matter? How do we build it? And how can we use it?
Today, I want to share with you my thoughts on building and using world models to unlock spatial intelligence in this essay below. 1/n
Thrilled to share MindJourney is accepted to #NeurIPS2025!
MindJourney brings controllable world models to Embodied AI reasoning—letting agents “imagine” spatial rollouts for better spatial understanding.
We updated our codebase and model weights recently: https://t.co/qQ2eMPDf3O
#EmbodiedAI #SpatialIntelligence #AI
So proud of my friend @yitong_deng and the @moonlake team — building the next paradigm of computer graphics where anyone can create and interact with their own worlds. Excited to see what comes next! 👏🏻
We raised $28M seed from Threshold Ventures, AIX Ventures, and NVentures (Nvidia's venture capital arm) —alongside 10+ unicorn founders and top AI researchers— to build reasoning models that generate real-time simulations and games.
Models are bottlenecked by practical simulations that can act as Reinforcement Learning environments. Human self-expression is bounded by tools that let us create alternate realities.
At Moonlake, we are building a future where anyone can create interactive worlds, bring their child-like wonder to life, learn within them, and most importantly, share experiences with people we care about.
More in 🧵
Just paid ¥4.99 to a site that "predicts" NeurIPS acceptance from your ratings and confidence scores.
Total scam, basically a random number generator. 🤡
I should build my own startup for this. Pretty sure I could make a fortune off researchers' anxiety these days.
#NeurIPS2025
VLMs struggle badly to interpret 3D from 2D observations, but what if they had a good mental model of the world?
Check out MindJourney, a test-time scaling approach for spatial reasoning in the 3D world. Without any specific training, MindJourney imagines (acts mentally) step by step in a diffusion World Model, gathering more "visual" information to address the problem. This new way of combining VLMs and World Models could significantly unlock the power of thinking in space!
Project: https://t.co/3FurBltgqZ
Code: https://t.co/ZetC3UKSuu
Spatial reasoning from a single image is inherently difficult, but it becomes significantly easier when leveraging a controlled world model, analogous to the mental models used by humans!
Code: https://t.co/Oz1ca9uzWS
See our project webpage, paper, and released code for more details!
Project Page: https://t.co/ZoAGrF5us7
Github: https://t.co/qQ2eMPDMTm
Thanks to all co-authors! @jiagengliu02 @zheyuanzhang99 @Siyuan_Zhou99 Reuben Tan @jw2yang4ai @du_yilun @gan_chuang
Also, thanks to @MSFTResearch for the support!
#EmbodiedAI #SpatialIntelligence #3DAware #3DVision #AI
🎬 MindJourney in action
Given a spatial reasoning question
1️⃣ Imagine – VLM and world model “walk” the scene iteratively
2️⃣ Observe – the VLM picks up the clues from the tour
3️⃣ Answer – with context, the VLM replies
The imagination loop turns one frame into insight. 💡
🧵5/
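The Imagine / Observe / Answer loop above can be sketched roughly as follows. This is only a toy illustration under stated assumptions, not the released MindJourney code: the action names, the beam-style pruning, and the `world_model`, `score_view`, and `answer` callables are all hypothetical stand-ins for a real video-diffusion world model and a real VLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical action set; the thread does not specify the real one.
ACTIONS = ("forward", "turn_left", "turn_right")


@dataclass(frozen=True)
class View:
    """An imagined view, identified by the action path that produced it."""
    path: Tuple[str, ...] = ()


def imagine_and_answer(
    question: str,
    start: View,
    world_model: Callable[[View, str], View],   # assumed: renders the next view
    score_view: Callable[[str, View], float],   # assumed: VLM rates helpfulness
    answer: Callable[[str, List[View]], str],   # assumed: VLM answers from views
    steps: int = 3,
    beam: int = 2,
) -> str:
    frontier = [start]
    evidence = [start]
    for _ in range(steps):
        # Imagine: roll every frontier view forward by each action.
        candidates = [world_model(v, a) for v in frontier for a in ACTIONS]
        # Observe: keep only the views rated most helpful for the question.
        candidates.sort(key=lambda v: score_view(question, v), reverse=True)
        frontier = candidates[:beam]
        evidence.extend(frontier)
    # Answer: reply using the original frame plus the imagined tour.
    return answer(question, evidence)


# Toy stand-ins so the sketch runs end to end without any model.
def toy_world_model(view: View, action: str) -> View:
    return View(view.path + (action,))


def toy_score(question: str, view: View) -> float:
    return float(view.path.count("turn_left"))


def toy_answer(question: str, views: List[View]) -> str:
    return f"{len(views)} views considered"


result = imagine_and_answer(
    "What is to the left of the sofa?", View(), toy_world_model, toy_score, toy_answer
)
print(result)
```

Swapping the toy callables for a real world model and VLM keeps the loop unchanged; `steps` and `beam` are the knobs that trade extra test-time compute for more imagined evidence.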
VLMs often struggle with physical reasoning tasks such as spatial reasoning.
Excited to share how we can use world models + test-time search to zero-shot improve spatial reasoning in VLMs!
Thanks @_akhaliq for sharing our work!
MindJourney fuses a world model with any VLM, so the model can first imagine walking around before it answers.
From “one snapshot” to “what if I stand over there?”—and suddenly spatial reasoning hits SOTA. 🚀
Project Page: https://t.co/ZoAGrF5us7
You can install anycoder as a Progressive Web App on your device.
Visit https://t.co/esPDyHDu94, click Settings in the footer, follow the instructions, and then click the install button in your browser's address bar.
📣 Excited to announce SpaVLE: #NeurIPS2025 Workshop on Space in Vision, Language, and Embodied AI!
👉 https://t.co/qsZ5dBTiq5
🦾Co-organized with an incredible team → @fredahshi · @maojiayuan · @DJiafei · @ManlingLi_ · David Hsu · @Kordjamshidi
🌌 Why Space & SpaVLE?
We never directly “see” space. Instead, we reconstruct it, describe it through language, and navigate its constraints to plan our actions. Let’s bring together communities to tackle spatial intelligence across 2D/3D reasoning, grounded language, and real-world robotic planning.
📝 Call for Papers
• 4-page shorts or 9-page fulls (non-archival)
• Topics: spatial representation, grounding, datasets/benchmarks, foundation models & more.
🗓️ Key Dates
• Deadline: Aug 22
• Notifications: Sep 22
• Camera-ready: Oct 25
Mark your calendars & start drafting! 🚀
🎤 Star-studded keynotes spanning CogSci, NLP, CV, Robotics.
Amir Zadeh · Barbara Landau · Dieter Fox · Joshua Tenenbaum @MITCoCoSci · Joyce Chai @SLED_AI · Ranjay Krishna @ranjaykrishna · Saining Xie @sainingxie. Can’t wait for the insights!
🏆 Best Paper = $3k cloud credits + Runner-up $1.5k. Huge thanks to our sponsors: Lambda (@LambdaAPI), Alquist Robotics (@alquistrobotics), and EdenSign (@Edensign_ai). More sponsors welcome! DM us if you’d like to support spatial-AI research.
Join us in San Diego to push the frontiers of spatial understanding and reasoning across CV, NLP, and robotics!
World Simulator, reimagined — now alive with humans, robots, and their vibrant society unfolding in 3D real-world geospatial scenes across the globe!
🚀 One day soon, humans and robots will co-exist in the same world. To prepare, we must address:
1️⃣ How can robots cooperate or compete intelligently?
2️⃣ How do humans build social bonds and communities?
3️⃣ How can both co-exist in an open, dynamic world?
Announcing Virtual Community Project — a social-physical world simulator, where human characters and robotic agents can interact, grow, and co-evolve within open-world societies, stretching from London to New York, and beyond!
Key features include:
✅ Unified multi-agent physics simulations for rich social + physical interactions of humans and robots
✅ Massive auto-generated 3D scenes grounded in real-world geospatial data
✅ Agent communities populated by robots and LLM-driven human characters with rich appearances, personalities, and social ties.
🌍 Enter our Virtual Community, an open world to study embodied AI at scale— one social-physical world model at a time!
🔗 Project: https://t.co/SItesNxOvN
💻 Code: https://t.co/clgr6rP7yJ
Paper: https://t.co/VZ67DUchRg
1/n