Claude Code can now build things in a simulated physical world!🤖🏙️
With SimWorld, coding agents can construct buildings, plan cities, or even create video games inside a realistic simulation on Unreal Engine.
Just write a prompt, and your agent will call tools, retrieve assets, plan scenes, and test physics autonomously.
Demo platform coming soon so everyone can try it. Stay tuned. 🚀
I’m not sure Gemini 3 looks that much more impressive here.🤔
For example, why is there a giant White House–like building just sitting in the middle of the street?
This feels like a real example of how even frontier coding agents can still struggle with spatial reasoning.
We asked 4 frontier coding agents to build the same Unreal 3D city scene in SimWorld Studio.
Same prompt. Different worlds 👀
Claude Code + Opus 4.7
Codex + GPT-5.5
Cursor + Composer 2.5
OpenCode + Gemini 2.5 Pro
Who wins?
Environment generation is the missing scaling axis for embodied AI.
Introducing SimWorld Studio: a self-evolving factory for endless interactive 3D env where agents act, fail & learn.
Env-agent co-evolution improves navigation success 50% → 90%.
From a prompt, our SimCoder writes code to automatically build an interactive world. Agents train inside it. And their performance shapes the next world.
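One way to picture the loop of generate, train, and regenerate is the toy sketch below. It is not SimCoder's method: `EnvSpec`, `propose_next_env`, `co_evolve`, the single difficulty knob, and the 0.7 target are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EnvSpec:
    """Hypothetical stand-in for a generated world: just a difficulty knob."""
    difficulty: float


def propose_next_env(current: EnvSpec, success_rate: float,
                     target: float = 0.7, step: float = 0.1) -> EnvSpec:
    """Make the next world harder if the agent succeeds often, easier otherwise."""
    if success_rate > target:
        new_difficulty = current.difficulty + step
    else:
        new_difficulty = current.difficulty - step
    # Clamp to [0, 1] so the knob stays in a valid range.
    return EnvSpec(difficulty=min(1.0, max(0.0, new_difficulty)))


def co_evolve(evaluate: Callable[[EnvSpec], float],
              rounds: int, start: EnvSpec) -> list[EnvSpec]:
    """Alternate between running the agent and generating the next env."""
    history = [start]
    env = start
    for _ in range(rounds):
        rate = evaluate(env)               # agent acts inside the current world
        env = propose_next_env(env, rate)  # its performance shapes the next one
        history.append(env)
    return history
```

Here `evaluate` stands in for a full train-and-measure cycle; in a real system the next world would come from a code-writing agent rather than a one-line rule.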
Natural language is a human-created representation of the world.
Is the ultimate form of the bitter lesson to bypass natural language entirely and learn a new representation from the world itself?
Happy to release NanoRollout, our infra attempt to scale digital agent rollouts without pain. Setting up and scaling parallel digital agent envs is one of the biggest headaches in agent training / deployment. The open community hasn't handled it well.
Two appealing features from NanoRollout:
🔌 Non-intrusive RL integration with frameworks such as miles, verl, tunix; validated end-to-end, e.g. outperforms DeepSWE-32B at a large 4k batch size 🚀
🧩 Unified support across agent harnesses and envs — covering SWE-Bench, Terminal-Bench, OSWorld, CocoaBench — with fast parallel eval that reproduces published scores (e.g., full SWE-Bench Verified eval from 102 min → 18 min, 5.7x faster⚡)
And the core logic is just ~900 LOC.
Hope NanoRollout helps if you're also trying to scale agent rollouts. Check out the tech blog and github for more details!
Big thanks to the fantastic co-lead @JunliWang2021
🏆 Honored to receive the Test of Time Award Honorable Mention at #AISTATS2026 for our 2016 work Deep Kernel Learning, with the amazing @andrewgwils @rsalakhu @ericxing
What a decade of AI progress! While GenAI is now driving massive real-world applications, the deepest underlying challenge remains: learning efficient representations of the world—for understanding, generation, predicting future worlds, and reasoning in the latent space.
So much fun to think about for the next decade!⏳
Come check out LaDiR — our ICLR paper about latent diffusion for text reasoning.
Instead of reasoning one token at a time in text space,
LaDiR moves reasoning into continuous latent space
and uses diffusion over blocks of thought tokens.
That means LLMs can:
-rethink whole reasoning paths
-explore multiple solutions
-and plan more flexibly
We show these gains on math, code and planning tasks.
Come and check out our ICLR work: Speculative Verdict (SV) for information-intensive visual reasoning.
Inspired by speculative decoding, instead of drafting tokens, SV asks multiple small VLMs to draft diverse reasoning and localization paths, then uses a stronger model to produce the final verdict.
The key insight is simple: no single reasoning path has to be perfect.
Even when each path is only partly correct, combining the right pieces can still recover the correct answer — giving both better accuracy and lower cost.
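The draft-then-verdict flow described above can be sketched in a few lines. This is only a structural outline under assumptions: `speculative_verdict`, `Drafter`, and `Judge` are hypothetical names, and plain callables stand in for the small VLMs and the stronger model.

```python
from typing import Callable, Sequence

# Hypothetical interfaces: a drafter maps a question to one reasoning path,
# and a judge maps the question plus all drafts to a final answer.
Drafter = Callable[[str], str]
Judge = Callable[[str, Sequence[str]], str]


def speculative_verdict(question: str,
                        drafters: Sequence[Drafter],
                        judge: Judge) -> str:
    """Collect one draft per small model, then let a stronger model decide."""
    drafts = [draft(question) for draft in drafters]
    # The judge sees every draft at once, so it can combine the correct
    # pieces of paths that are each only partly right.
    return judge(question, drafts)
```

The stronger model is called once per question over short drafts rather than doing all the reasoning itself, which is where the cost saving would come from under this reading.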
🍫 CocoaBench v1.0 is out!
CocoaBench is a benchmark for unified digital agents, built around open-world tasks that require composing 💻 coding, 👀 vision, 🌐 search.
Since our first research preview last December, we have expanded the benchmark substantially with community-contributed tasks and spent months testing and refining the tasks, evaluations, and agent runs.
Some takeaways:
• Even the best agent system reaches only 45.1% on CocoaBench v1.0.
• Coding agents like Codex are already surprisingly strong on general tasks beyond software engineering.
• Stronger agents tend to push more of the work into code.
• Open source models still lag behind leading frontier models on these general tasks.
👇More on the website and in the paper
#AI #Agents #LLM #Benchmark #CocoaBench
That’s wild — and smart! 🤣
SimWorld coding agent self-improves by autonomously creating new tools and skills
It realized BaGuaZhen(八卦阵) was too hard to build directly, so it created its own tools and skills.
Starting from only primitive operations like spawn_actor() and delete_actor(), the agent does not just brute-force the task.
It breaks the problem down and builds higher-level capabilities for itself.
A SimWorld coding agent can now create its own tools and skills on the fly.
We challenged it with BaGuaZhen (八卦阵 Eight Trigrams), an ancient Chinese formation that is difficult to build from scratch because of its precise spatial structure and multi-step coordination.
Instead of failing with brute force, the agent wrote reusable components for itself:
Tools: Bagua Wall Segment, Bagua Trigram Line
Skills: Bagua Wall Segment Skill, Bagua Trigram Line Skill
Each tool is paired with a skill that teaches the model how to use it.
Without skills: it fails.
With self-built skills: it organizes the full structure.
The exciting shift is this:
agents are starting to generate capabilities, not just outputs.
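A rough sketch of what building a higher-level tool out of primitives can look like. `ToyWorld` is an in-memory stand-in, not the SimWorld API: only the names `spawn_actor` and `delete_actor` come from the post, and their signatures here, along with `wall_segment` and `octagon_ring`, are assumptions made up for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ToyWorld:
    """In-memory stand-in for the simulator (NOT the real SimWorld API)."""
    actors: dict[int, tuple[str, float, float]] = field(default_factory=dict)
    next_id: int = 0

    def spawn_actor(self, kind: str, x: float, y: float) -> int:
        """Primitive: place one actor and return its id (assumed signature)."""
        actor_id = self.next_id
        self.actors[actor_id] = (kind, x, y)
        self.next_id += 1
        return actor_id

    def delete_actor(self, actor_id: int) -> None:
        """Primitive: remove one actor if it exists (assumed signature)."""
        self.actors.pop(actor_id, None)


def wall_segment(world: ToyWorld, x0: float, y0: float,
                 x1: float, y1: float, blocks: int) -> list[int]:
    """Higher-level tool: evenly spaced blocks from (x0, y0) toward (x1, y1).

    The start point is included and the end point is excluded, so chained
    segments do not place two blocks on a shared corner.
    """
    ids = []
    for i in range(blocks):
        t = i / blocks
        ids.append(world.spawn_actor("wall_block",
                                     x0 + t * (x1 - x0),
                                     y0 + t * (y1 - y0)))
    return ids


def octagon_ring(world: ToyWorld, radius: float, blocks_per_side: int) -> list[int]:
    """Even higher-level tool: eight wall segments forming an octagon."""
    corners = [(radius * math.cos(k * math.pi / 4),
                radius * math.sin(k * math.pi / 4)) for k in range(8)]
    ids = []
    for k in range(8):
        (x0, y0), (x1, y1) = corners[k], corners[(k + 1) % 8]
        ids += wall_segment(world, x0, y0, x1, y1, blocks_per_side)
    return ids
```

Once `wall_segment` exists, the eight-sided ring is one short loop instead of dozens of individual spawn calls, which is the kind of leverage a self-built tool gives the agent.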
It’s fun to watch a coding agent reason through spatial construction, iterating through trying, failing, revising, and trying again. Really promising, though still a long way to go.
It reminds me of a kid playing with LEGO for the first time, gradually turning trial and error into something creative, like a piece of art.
Try SimWorld Studio to build your own physical world.
🚨New Release: SimWorld Studio — Vibe Code the Physical World
Today we open source SimWorld Studio, a coding-agent platform for building interactive physical worlds.
Just chat with Claude Code to create environments, place assets, test physics, and edit everything live.
Build worlds as easily as writing a prompt.
🤖Coding agents like Claude Code are already game changers for digital tasks in 2026.
But what if they could write code to build physical worlds? 🏙️
Imagine going from a single-line prompt → a controllable, interactive simulated world.
Such environments could open new frontiers for game creation, RL training, large-scale world simulation, and studying complex social reasoning.
Our SimWorld agent coding team is working toward releasing a platform that lets anyone build their own virtual worlds. Stay tuned.