An exciting and long road lies ahead for Physical AI. I’m so excited to be living in this time. I hope that someday we'll see this robot exploring the solar system and beyond.
We're launching the microagi Research Fellowship.
Fellows get up to $2M in compute, robotics hardware, our evals, and one of the largest physical AI datasets ever assembled. You build in our lab, with our team, alongside partners like Unitree, Nvidia, and Google Cloud.
The hard part left in AI is physical. That's the part we're working on. Come build with us.
One more thing: know someone who belongs here? Reply with their name. If they get in, we'll send you $10,000.
In an interesting turn of events, we emerged as a finalist for WBCD at ICRA 2026.
We’re headed to Vienna from June 1-6 to attend ICRA and work on an autonomous policy for whole-body control.
Who else will be there? Let’s meet and catch up!
@leoperzz @autobrik @raulb4s @mbrq_13
I promise this will be the best 20 min you spend today! Robotics: Endgame is the sequel to my Sequoia AI Ascent talk from last year, "Physical Turing Test". I laid out the roadmap for solving Physical AGI as a simple parallel to the LLM success story. Be a good scientist, copy homework ;)
And stay till the end, more easter eggs and predictions for your polymarket!
00:30 DGX-1 origin story at OpenAI, I was there in 2016 signing with Jensen and Elon. Heading to the Computer History Museum!
01:42 The Great Parallel
03:31 Robotics, the Endgame
03:39 Why VLAs fall short
04:32 Video world models as the 2nd pretraining paradigm
06:09 World Action Models (WAM)
07:46 Strategies for robot data collection and the FSD equivalent to physical data flywheel for robot manipulation
11:06 EgoScale and the Dexterity Scaling Law we discovered recently
14:00 Physical RL: bridging the last mile
15:39 DreamDojo: an end-to-end neural physics engine for scaling RL in silico
17:00 Civilizational Technology Tree and my predictions for the near future. Spoiler: it's closer than you think.
Thanks to my friends at Sequoia for inviting me back to AI Ascent this year! I had a blast! Last year's talk is attached in the thread if you missed it.
If you are in robotics, you must watch @DrJimFan’s latest Sequoia video. And if not, you should too.
A compelling 20 minute overview of what has happened in robotics over the past three years and a glimpse into where the field is heading next.
Jim Fan’s talks feel to me a bit like Karpathy’s talks but for robotics: clear, good at synthesis, and focused on the bigger picture.
It feels like a summary of a class you have been following for a while, especially if you have been reading many of the papers he mentions. Maybe I am biased because of that, but this one really helped organize the map in my head.
1/ We just released π0.7 — a steerable generalist robot model with emergent capabilities.
I want to share a bit of the backstory, because π0.7 taught me something surprising about where robot learning is heading. A thread on bittersweet lessons 🧵
The obvious end state for this path is Chinese body, Chinese brain. I'm actually pretty excited to see what happens this year, it will probably result in some really amazing stuff being built and it seems a lot more useful than a bunch of Chinese Cluelys. Just going by the demos at CES this year compared to last year, the bar is moving up so rapidly and there are so many little details getting figured out.
But yea, if you have any degree of intellectual honesty you can tell that many of the best robotics software demos are coming from China, particularly for full-body control, for the same reason that the best LLMs are from America - modern AI is mostly an infrastructure problem, not a methodological problem. I am quite worried that the future of robotics in the US looks a lot like the current electric car situation, and we're stuck with expensive, worse, "premium-only" options because no one actually really wants to do the hard, boring infrastructure work. I don't have much faith that America will be able to put together any kind of coherent industrial policy to do something different when the people involved are so obviously self-motivated and interested in regulatory capture for the status quo.
The 200+ humanoid startups in China aren't trying to become the next Foxconn and I have no idea why so many smart people in Silicon Valley convinced themselves that this was the case before even talking to any of them. A good example was the Astribot - Pi "partnership". I met the Astribot CEO a few months before that got announced and it was obvious that they were an extremely ambitious full-stack team that had no intention of being the Cursor to Pi's Anthropic unless there was some exclusivity on the table. The brain companies don't have much leverage. Just look at the margins they're paying for hardware. I'm pretty sure American VCs have helped incubate dozens of Chinese companies.
It all just feels kind of depressing, watching from the outside. Shenzhen really seems like Detroit in its heyday and I'm kind of jealous of everyone that has decided to move there in the last year or two.
Anyway, all this is to say that I'm a Figure stan now and I hope they don't blow up. And Sunday and Bot Co of course. There are several former K-Scale people at Bot Co now and I am very excited for their launch.
Introducing Simile.
Simulating human behavior is one of the most consequential and technically difficult problems of our time.
We raised $100M from Index, Hanabi, A*, BCV, @karpathy @drfeifei @adamdangelo @rauchg @scottbelsky among others.
I often get questions and read comments like:
“Why create yet another language model?”
The question is valid. The world already has powerful models, with trillions of parameters and deployed at global scale. So why invest time, talent, and resources in training one from Latin America?
Because it is a common good.
Training a model like #LatamGPT means building strategic infrastructure. A language model is not just an application: it is a base layer on which educational assistants, legal tools, clinical support systems, public policy analysis, and new startups are built. In the fourth industrial revolution, this is what building infrastructure looks like.
The great infrastructures that transformed societies, such as city electrical grids, internet and fiber-optic infrastructure, highways, or public universities, were not conceived as isolated products but as shared platforms that enable productivity, improve the economy, and promote social mobility. Artificial intelligence is beginning to occupy that same place.
When we talk about #LatamGPT as a public good, we are not talking about one more model competing on a benchmark leaderboard. We are talking about building a shared capability for Latin America: a technological base that can be audited, studied, improved, and used by universities, startups, governments, and students.
A language model is not just a chatbot. It is a foundational layer on which educational systems, legal assistants, public policy tools, scientific research, and new ventures are built. If that layer is closed, the capacity to adapt is limited. If it is shared, the ecosystem grows.
Data matters, too. Context matters. The variations of Spanish and Portuguese, as well as the historical, legal, and cultural references of our region, matter. A model trained with greater Latin American representation is not a matter of identity; it is a matter of usefulness and contextual accuracy.
In a region where access to advanced technology has historically been unequal, developing our own infrastructure reduces dependence and widens the space for innovation.
#LatamGPT is not an end in itself. It is a common foundation.
Rather than asking ourselves “why one more?”, perhaps the right question is:
Do we want to remain users of technology, or enable a new industry for the next 20 years?
News: the countdown officially started for the rehearsal of our upcoming @NASAArtemis launch. Teams will fuel the rocket and run through a full range of operations to make sure everything is ready for our crewed launch around the Moon. More on the milestones ahead: https://t.co/LGkankWgfq
Are humanoid robots ready to step into our homes? 🤖🏠
Meet Click-and-Traverse: Navigate through cluttered space like Jackie Chan!
🌟 ONE policy for ALL indoor scenes
🤸‍♂️ Conquer OMNI-SPATIAL (=ground + lateral + overhead) constraints
🖱️ Easy teleoperation: Simply click a goal, and the robot smoothly traverses towards it
Check it out now! 👉 Project: https://t.co/BJaBFdh7z7
🚀 Code: https://t.co/cwcIHp0rYl
Here's a short video from our founder, Zhilin Yang.
(It's his first time speaking on camera like this, and he really wanted to share Kimi K2.5 with you!)