🧵 Deli AutoResearch SKILL is now officially open source! 🎉
https://t.co/V3lwwdyQm8
Alongside it, we’re dropping our 4th survey paper — this time on Self-play.
https://t.co/SEb2qoKCI6
Inspired by AlphaZero, we got a powerful insight: prior knowledge doesn’t always lift the ceiling.
Models can discover more globally optimal solutions just by playing against themselves.
The biggest change in this paper?
For the first time, the AutoResearch Agent autonomously planned GPU experiments — and submitted actual RL runs on the DeepSeek 285B model.
The entire RL pipeline — experiment design, code writing, running, debugging, and conclusion summarization — was 100% automated, with zero human intervention from me.
This was incredibly difficult, but an incredibly important step.
https://t.co/kuZZNux5RH
GRPO is the tool being called by the AutoResearch Agent here.
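For readers new to GRPO (Group Relative Policy Optimization), here is a minimal sketch of its core idea: several completions are sampled for the same prompt, and each one's advantage is its reward normalized against that group. The function name, the `eps` constant, and the choice of population standard deviation are assumptions for illustration, not the code the AutoResearch Agent actually ran.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other samples in its group.

    `rewards` holds one scalar reward per completion sampled for the
    same prompt. `eps` (an illustrative choice) avoids division by zero
    when every completion in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical group of four completions scored by a 0/1 reward.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that beat their group's average get a positive advantage and the rest get a negative one, so no separate value network is needed to form the baseline.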
We see this as the beginning of our Continual Learning research journey. 🚀
As always, this is my personal research project, unaffiliated with any organization. All views are my own.
#AI #ReinforcementLearning #SelfPlay #OpenSource #AutoML #ContinualLearning #DeepSeek
An inspiring work on Physical AI: PhysX-Omni.
It introduces the first unified sim-ready generation framework for rigid, deformable, and articulated objects, along with a diverse dataset and new benchmark.
- Page: https://t.co/qBd1BQJ3Bc
- Code: https://t.co/GhKg4oqGgn
- Dataset: https://t.co/OURCiYIDEQ
this AI workflow just ended interior design 🔥
what used to cost $100,000 and months of work now costs pennies and a few minutes.
here’s exactly how to do it:👇
People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way.
We share our approach, early results, and a quick look at our model in action.
https://t.co/AFJZ5kH7Ku
New Anthropic research: Natural Language Autoencoders.
Models like Claude talk in words but think in numbers. The numbers—called activations—encode Claude’s thoughts, but not in a language we can read.
Here, we train Claude to translate its activations into human-readable text.
New Anthropic research: Emotion concepts and their function in a large language model.
All LLMs sometimes act like they have emotions. But why? We found internal representations of emotion concepts that can drive Claude’s behavior, sometimes in surprising ways.
New on the Anthropic Engineering Blog:
How we use a multi-agent harness to push Claude further in frontend design and long-running autonomous software engineering.
Read more: https://t.co/HWvmXk1ykn
Today Ropedia releases Xperience-10M at #GTC day 1: the world's largest real human 4D interaction dataset, at 10M scale.
Each trajectory aligns:
• visual observations
• spatial structure
• human motion
• interaction dynamics
• task semantics
A new foundation for physical and spatial AI. Try it out @huggingface https://t.co/m28msY8JzY
Advanced Machine Intelligence (AMI) is building a new breed of AI systems that understand the world, have persistent memory, can reason and plan, and are controllable and safe.
We’ve raised a $1.03B (~€890M) round from global investors who believe in our vision of universally intelligent systems centered on world models. This round is co-led by Cathay Innovation, Greycroft, Hiro Capital, HV Capital, and Bezos Expeditions, along with other investors and angels across the world.
We are a growing team of researchers and builders, operating in Paris, New York, Montreal and Singapore from day one.
Read more: https://t.co/kyVAL7EoFx
AMI - Real world. Real intelligence.
Excited to show some surprising inventions on generative multiplayer games we made at Google with Stanford. We call the work MultiGen.
I've always been inspired by early studios like id Software with Doom or Blizzard with Warcraft bringing networked video games to the next level. We are at the point in history where we can make strides like them, but for generative games. It's a strange feeling to be in the age of generative video games while still discovering how exactly to train the models and design the tools that make them useful.
All of the tools that have been invented for classic game engines need to be redesigned for generative games. For example, level and world design is not entirely possible with existing technology. We introduce editable memory to diffusion game engines, which allows new levels to be designed via a minimap. But we can easily imagine how this can be expanded with different creation tools. The end goal of this research direction is to let game designers guide the generation process of their world, at the granularity that they prefer.
Editable memory also allows us to add multiplayer to Generative Doom. We were amazed when we saw GameNGen some years ago, and now you can play it live with friends in real-time, on your couch or even online.
Shared representations like our editable memory seem like the future for this type of experience. Models are, in some cases, expensive and approximate encoders but great interpolators and extrapolators. Leveraging their strengths lets you have completely new experiences that can be realized now and not in the distant future.
This work was started at my previous team and continued in collaboration with Stanford. Congratulations to all for the discoveries.
world modeling is never about rendering pixels.
rendering is local. world state is global. as soon as more than one agent exists, the only thing that truly matters is the shared representation beneath individual views. that shared representation is what scales into collective capability.
this is why I'm super excited to share project Solaris -- our new work focused on building a multiplayer video world model in minecraft.
This release includes three main pieces.
1⃣Solaris Engine, a fully featured multiplayer data collection system with built-in visuals. the team put a huge amount of work into this since nothing like it really exists yet.
https://t.co/dw9lTmr9Pk
2⃣Solaris Model, a multiplayer DiT with a new memory-efficient self-forcing design, trained on 12.6M frames of coordinated Minecraft gameplay.
https://t.co/cVjlGKcNUf
3⃣Solaris Eval, which uses a VLM as a judge to evaluate different multiplayer capabilities.
read the full technical breakdown by @ojmichel4, and start building with Solaris.
https://t.co/1NkdqSRZy5
Introducing a world built by Moonlake's world model. 🏙️
Most world models only allow for a limited action space.
Moonlake maintains multimodal states across physics, appearance, geometry, and causal effects, and predicts how they evolve under different actions. 👇