🤖 What if robots could discover diverse abilities, all without simulation or extensive human tuning?
With @lisa_coiffard, Oscar Pang, @maxencefaldor and @CULLYAntoine, we introduce URSA: an efficient skill discovery algorithm that can be applied directly on real hardware.
#CoRL2025 🧵
Today, we are launching our first commercial product: Sakana Marlin, Your Virtual CSO. Marlin is an autonomous research assistant for business, built around hours of long-horizon reasoning.
Try Marlin: https://t.co/ZWsYW91dQW
Blog: https://t.co/SdYFwswOQH
You provide a research topic, and that is the only input required. From there, Sakana Marlin works autonomously for up to roughly 8 hours. It forms hypotheses, gathers information, and verifies its own findings as it works through a vast body of material. It returns a structured set of summary slides and a research report dozens of pages long. It is designed to take on the kind of deep strategy work that a CSO and a small team might otherwise spend weeks on.
Unlike instant chat or general-purpose deep research, Sakana Marlin executes a long reasoning process that can unfold over as many as 8 hours. Underpinning it is the research we have pursued over the past two years into long-horizon reasoning and AB-MCTS, our method for coordinating multiple models to reason more effectively together. But Sakana Marlin did not come from the lab alone. It grew out of our work deploying AI agents across real industries in Japan, making it the direct product of what we have learned in both research and the field.
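The post does not describe how AB-MCTS coordinates models, so the following is only a hedged illustration of one plausible ingredient, under the assumption that the choice of which model to call next is made by Thompson sampling. Each model keeps a Beta posterior over how often its proposals succeed, and the next call goes to the model with the highest sampled success rate. The tree-search part (deciding whether to refine an existing answer or try a new one) is omitted entirely, and `pick_model`, `record`, `simulate` and the success rates are hypothetical.

```python
import random

def pick_model(stats, rng):
    """Thompson sampling: draw a success rate for each model from its Beta(a, b)
    posterior and return the model with the highest draw."""
    return max(stats, key=lambda m: rng.betavariate(*stats[m]))

def record(stats, model, success):
    """Update the chosen model's Beta posterior with one observed outcome."""
    a, b = stats[model]
    stats[model] = (a + 1, b) if success else (a, b + 1)

def simulate(true_rates, rounds=2000, seed=0):
    """Toy loop with hypothetical per-model success rates.
    Returns how many times each model was called."""
    rng = random.Random(seed)
    stats = {m: (1, 1) for m in true_rates}  # uniform Beta(1, 1) priors
    calls = {m: 0 for m in true_rates}
    for _ in range(rounds):
        m = pick_model(stats, rng)
        calls[m] += 1
        record(stats, m, rng.random() < true_rates[m])
    return calls

calls = simulate({"model_a": 0.2, "model_b": 0.7})
```

Because choices are drawn from the posteriors, the weaker model is still tried now and then, so the scheme keeps exploring while spending most calls on whichever model has been working best.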
Marlin is available today, offering a pay-per-use tier with no monthly fees, alongside Pro, Team, and Enterprise plans.
It is also the first of many products to come from Sakana AI. Stay tuned!
Introducing DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation
https://t.co/c9AvsRKybj
What if we didn’t have to hold an entire neural network in memory to train it?
Standard neural net training optimizes all parameters jointly. As a result, the memory required during training grows linearly with the depth of the network.
In our #ICLR2026 paper, we propose DiffusionBlocks, a principled framework to train networks one block at a time, drastically reducing memory requirements while matching end-to-end performance.
With DiffusionBlocks, we split the network into blocks and train them one at a time, so you only need memory for a single block.
How? We explicitly assign each block a role: to move the representation a little closer to the target than the block before it did. That role turns out to be precisely what a diffusion model does, step by step. Each block only needs to optimize its own objective and can be trained independently.
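To make "one block at a time" concrete, here is a toy sketch in plain Python under our own simplifying assumptions, not the paper's code: the data are scalars drawn from N(2, 0.5²), the noise scale σ from 0.01 to 10 is split geometrically so that each block owns one contiguous noise range (the noisiest range goes to the first block), and each "block" is a linear denoiser x̂ = w·y + b fitted by least squares. The names `noise_ranges`, `fit_block` and `train_blockwise` are hypothetical. What carries over is the structure: each block's fit uses only samples from its own noise range and never touches another block's parameters.

```python
import random

def noise_ranges(num_blocks, sigma_min=0.01, sigma_max=10.0):
    """Split [sigma_min, sigma_max] into geometric intervals, returned as (low, high)
    pairs ordered from the noisiest block (first) to the cleanest block (last)."""
    ratio = (sigma_max / sigma_min) ** (1.0 / num_blocks)
    edges = [sigma_max / ratio ** i for i in range(num_blocks + 1)]
    return [(edges[i + 1], edges[i]) for i in range(num_blocks)]

def fit_block(sigma_lo, sigma_hi, n=20000, mu=2.0, s=0.5, rng=None):
    """Toy block: fit x_hat = w * y + b by least squares on noisy samples
    y = x0 + sigma * eps, with sigma drawn only from this block's own range."""
    rng = rng or random.Random(0)
    ys, xs = [], []
    for _ in range(n):
        x0 = rng.gauss(mu, s)                    # clean target
        sigma = rng.uniform(sigma_lo, sigma_hi)  # noise level owned by this block
        ys.append(x0 + sigma * rng.gauss(0.0, 1.0))
        xs.append(x0)
    y_mean, x_mean = sum(ys) / n, sum(xs) / n
    cov = sum((y - y_mean) * (x - x_mean) for y, x in zip(ys, xs))
    var = sum((y - y_mean) ** 2 for y in ys)
    w = cov / var
    return w, x_mean - w * y_mean

def train_blockwise(num_blocks=4, seed=0):
    """Fit blocks one after another; only one block is being trained at any time."""
    rng = random.Random(seed)
    return [fit_block(lo, hi, rng=rng) for lo, hi in noise_ranges(num_blocks)]

blocks = train_blockwise()
```

With these settings the first (noisiest) block learns w near 0, so it mostly predicts the data mean, while the last block learns w near 1 and mostly keeps its input: each block only has to move the estimate a little closer to the target within its own noise range. Real blocks would be deep networks trained with gradient descent, typically conditioned on σ.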
We validated this across five different architectures:
• ViT
• DiT
• Masked diffusion
• Autoregressive transformers
• Recurrent-depth transformers
In each case, performance is competitive with end-to-end training while using a fraction of the memory.
This perspective also extends naturally to recurrent-depth (Looped) transformers, which apply the same network iteratively and normally require expensive backpropagation through time (BPTT). Viewed through DiffusionBlocks, we can replace those multiple iterations with a single forward pass during training.
Read our paper and code to learn more.
Paper: https://t.co/CRj96VGYQn
GitHub: https://t.co/eNW0K9Xh8E
🐟
The full Transformer vs Post-Transformer debate is live.
80 minutes. Seven rounds. No slides. Real disagreement.
@lukaszkaiser came to defend the Transformer. @adrian_pathway, @YesThisIsLion, and @mlech26l made the case for what comes next.
00:00 Contenders enter the ring
06:30 Lukasz Kaiser defends the Transformer
10:08 Adrian Kosowski on BDH and the PageRank Moment for AI
17:35 Llion Jones: Why Transformers aren't the final architecture
29:50 Mathias Lechner on Liquid AI’s approach, Fast Weights, and Self-Replacing AI
40:28 Reasoning Beyond Language
44:15 Scaling Laws: Transformer vs Post-Transformer
50:31 Benchmarks, Coding Models, and Perplexity
1:04:00 Continual Learning and Dynamic Weights
This is the ultimate source of truth on the subject.
Excited to show our latest work on resilient robotics! 🚀🤖 Our new Nature Communications paper introduces FLAIR, a method that learns online, on-device, in 225ms, how to compensate for unseen perturbations affecting a robot.
📄https://t.co/p2osVEQsjM
🎥 https://t.co/ozX2p10vSM
We’re excited to introduce KAME: Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI, accepted at #ICASSP2026! 🐢
Blog https://t.co/arVz1TGpJJ
Paper https://t.co/0EwpyRXeCs
Can a speech AI think deeply without pausing to process?
In real conversation, we don’t wait until we’ve fully worked out what we want to say—we start talking, and our thoughts catch up as the sentence unfolds.
Fast speech-to-speech models achieve this, but their reasoning tends to stay shallow. Cascaded pipelines that route through a knowledgeable LLM are smarter, but the added latency breaks the flow—they fall back to "think, then speak."
In our new paper, we propose a way to break this trade-off. We call it KAME (Japanese for "turtle").
A speech-to-speech model handles the fast response loop and starts replying immediately. In parallel, a backend LLM runs asynchronously, generating response candidates that are continuously injected as "oracle" signals in real time.
This shifts the AI paradigm from "think, then speak" to "speak while thinking."
The backend LLM is completely swappable. You can plug in GPT-4.1, Claude Opus, or Gemini 2.5 Flash depending on the task without changing the frontend. In our experiments, Claude tended to score higher on reasoning, while GPT did better on humanities questions.
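The "speak while thinking" loop can be sketched with Python's `asyncio`. This is a toy under our own assumptions, not KAME's implementation: `frontend_s2s` stands in for the speech-to-speech model and emits one chunk per step without waiting, `backend_llm` stands in for the slower LLM and overwrites a shared `oracle` dict whenever it has a better candidate, and all names and delays are hypothetical. Because the backend is passed to `converse` as an ordinary coroutine argument, swapping it requires no change to the frontend.

```python
import asyncio

async def backend_llm(question, oracle, delay=0.03):
    """Hypothetical slow backend: publishes progressively better candidates
    into the shared `oracle` dict as it produces them."""
    for draft in (f"draft answer to {question!r}", f"refined answer to {question!r}"):
        await asyncio.sleep(delay)    # stands in for LLM latency
        oracle["candidate"] = draft   # inject the latest candidate as an oracle signal

async def frontend_s2s(num_steps, oracle, step=0.03):
    """Hypothetical fast frontend: emits one chunk per step immediately,
    conditioned on whatever candidate is available at that moment."""
    spoken = []
    for _ in range(num_steps):
        spoken.append(oracle.get("candidate"))  # None until the backend catches up
        await asyncio.sleep(step)
    return spoken

async def converse(question, backend=backend_llm, num_steps=8):
    oracle = {}
    backend_task = asyncio.create_task(backend(question, oracle))  # runs in parallel
    spoken = await frontend_s2s(num_steps, oracle)
    await backend_task
    return spoken

spoken = asyncio.run(converse("capital of France?"))
```

The first chunk is emitted before the backend has produced anything (`None`), and later chunks pick up the refined candidate: the frontend never blocks on the backend.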
Try the model yourself here: https://t.co/uDA0nvvjhS
Happy to share our paper "Getting robots back on track by reconstituting control in unexpected situations with online learning" is now out in Nature Communications.
Joint work with @allardmaxime079, @bryanlimwt, and @CULLYAntoine.
Super happy to share that our work on robot adaptation is now out in @NatureComms!
Imagine a robot partially paralysed by unknown failures, such as a faulty or damaged wheel or motor. Operating and driving such a vehicle becomes nearly impossible.
🧵1/3
🤖Thrilled to share that robotics work from my PhD is out in @NatureComms 🎉
"Getting robots back on track by reconstituting control in unexpected situations with online learning"
With @MFlageat, @bryanlimwt, @CULLYAntoine, @imperialcollege
🧵below
Introducing our new work: “Learning to Orchestrate Agents in Natural Language with the Conductor” accepted at #ICLR2026
https://t.co/Wnh9ZACmLm
What if we trained an AI not to solve problems directly, but to act as a manager that delegates tasks to a diverse team of other AIs?
To solve complex tasks, humans rarely work alone; we form teams, delegate, and communicate. Yet, multi-agent AI systems currently rely heavily on rigid, human-designed workflows or simple routers that just pick a single model. We wanted an AI that could dynamically build its own team.
We trained a 7B Conductor model using Reinforcement Learning to orchestrate a pool of frontier models (including GPT-5, Gemini, Claude, and open-source models available during the period leading up to ICLR 2026).
Instead of executing code, the Conductor outputs a collaborative workflow in natural language. For any given question, the Conductor specifies:
1/ Which agent to call
2/ What specific subtask to give them (acting as an expert prompt engineer)
3/ What previous messages they can see in their context window
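One way to picture such a workflow is as a list of steps, each naming an agent, a subtask, and which earlier messages that agent may see. The sketch below is hypothetical: the `Step` fields, the prompt template and the toy workers are our assumptions, and the real Conductor writes its plan in natural language rather than as Python objects.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    agent: str                                   # 1/ which worker to call
    subtask: str                                 # 2/ the instruction written for that worker
    visible: list = field(default_factory=list)  # 3/ indices of earlier messages it may see

def run_workflow(question, steps, workers):
    """Run steps in order. `workers` maps agent names to callables that take a prompt
    string and return a reply; each worker sees only the messages listed in `visible`."""
    messages = []
    for step in steps:
        context = "\n".join(messages[i] for i in step.visible)
        prompt = f"Question: {question}\nContext:\n{context}\nTask: {step.subtask}"
        messages.append(workers[step.agent](prompt))
    return messages[-1] if messages else None

# Toy workers standing in for real models (hypothetical).
workers = {
    "planner": lambda prompt: "PLAN: split into helper functions",
    "coder": lambda prompt: "CODE written from " + ("PLAN" if "PLAN" in prompt else "nothing"),
    "verifier": lambda prompt: "PASS" if "CODE" in prompt else "FAIL: no code visible",
}
hard = [  # planner-executor-verifier pipeline for a hard problem
    Step("planner", "Outline an approach."),
    Step("coder", "Implement the plan.", visible=[0]),
    Step("verifier", "Check the implementation.", visible=[1]),
]
easy = [Step("coder", "Answer directly.")]  # one-shot plan for a simple question
```

Controlling `visible` matters: the same verifier returns a failure if its step is given no access to the coder's message.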
Through pure end-to-end reward maximization, amazing behaviors emerged. The Conductor learned to adapt to task difficulty: it 1-shots simple factual questions, but autonomously spins up complex planner-executor-verifier pipelines for hard coding problems.
The results are very promising: The 7B Conductor surpasses the performance of every individual worker model in its pool, setting new records on LiveCodeBench (83.9%) and GPQA-Diamond (87.5%) at the time of publication. It also significantly outperforms expensive multi-agent baselines like Mixture-of-Agents at a fraction of the cost.
One of our favorite features: Recursive Test-Time Scaling! When the Conductor is allowed to select itself as a worker, it reads its own team's prior output, recognizes when that output has failed, and spins up a corrective workflow on the fly. This opens a new axis for scaling compute during inference.
This research shows that language models can become elite meta-prompt engineers, dynamically harnessing collective intelligence.
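The self-call can be sketched as plain recursion with a depth cap. In this hypothetical version, `conductor_round` stands in for one full Conductor-planned round: it receives the team's previous answer (or `None`) and returns a new answer plus a flag saying whether it wants to put itself on the team again. Each extra round is extra inference compute, and `max_depth` (our assumption) bounds how much of it is spent.

```python
def conduct(question, conductor_round, prior=None, depth=0, max_depth=3):
    """Run one Conductor round; if it selects itself as a worker, recurse with
    this round's output as the prior. Returns (answer, rounds_used)."""
    answer, call_self = conductor_round(question, prior)
    if call_self and depth + 1 < max_depth:
        return conduct(question, conductor_round, answer, depth + 1, max_depth)
    return answer, depth + 1

def toy_round(question, prior):
    """Hypothetical round: pretend the first two attempts fail verification."""
    attempt = 0 if prior is None else prior["attempt"] + 1
    failed = attempt < 2
    return {"attempt": attempt}, failed  # a failed round asks to call itself again
```

With the default cap, `conduct("q", toy_round)` stops after the third round, once an attempt no longer fails; lowering `max_depth` trades answer quality for less compute.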
Alongside our TRINITY research, which we announced a few days earlier, this foundational research powers our new multi-agent system: Sakana Fugu! (https://t.co/36Ud311KCP) 🐡
OpenReview: https://t.co/e5WqTleQNL (ICLR 2026)
What happens when you put competing neural networks in a Petri Dish and start changing the rules while they adapt?
Last year we released Petri Dish NCA, where neural nets are the organisms that learn during simulation. Today we're releasing Digital Ecosystems: a browser-based platform for interactive artificial life research.
The setup: several small CNNs share a 2D grid, each seeing only a 3x3 neighborhood. No global plan. They compete for territory by attacking neighbours and defending against incoming attacks, learning via gradient descent online while the simulation runs.
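The locality constraint (each cell sees only its 3x3 neighborhood, with no global plan) can be illustrated with a toy territory update in plain Python. This is not the platform's code: real species are small CNNs that learn online by gradient descent, whereas here each species is reduced to a fixed, hypothetical `strength`, the grid is assumed to wrap around at the edges, and a cell goes to whichever species exerts the most total pressure within its 3x3 patch.

```python
def neighborhood(grid, r, c):
    """Return the 3x3 patch around (r, c) on a toroidal grid, row by row."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + dr) % h][(c + dc) % w] for dc in (-1, 0, 1)] for dr in (-1, 0, 1)]

def step(owner, strength):
    """One synchronous update. Each cell looks only at its 3x3 patch: every species
    present there pushes with its (hypothetical) strength per cell it holds, and the
    cell goes to the species with the largest total. The current owner keeps ties."""
    h, w = len(owner), len(owner[0])
    new = [row[:] for row in owner]
    for r in range(h):
        for c in range(w):
            pressure = {}
            for row in neighborhood(owner, r, c):
                for s in row:
                    pressure[s] = pressure.get(s, 0.0) + strength[s]
            if pressure[owner[r][c]] < max(pressure.values()):
                new[r][c] = max(pressure, key=pressure.get)
    return new

owner = [[0] * 5 for _ in range(5)]
owner[2][2] = 1  # one cell of a stronger species in a field of species 0
```

Seeded this way, a species with strength 10 against 1 spreads one ring of cells per update, because information can only travel one cell per step; with strength 5 it is absorbed immediately.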
What we didn't expect was the role of the learning itself. Gradient descent isn't just optimising each species' strategy. Instead, it acts to stabilize the whole system during simulation. Species that overextend get pushed back by the loss. Species that stagnate get nudged to grow. This means you can push parameters toward edge-of-chaos regimes: a zone characterised by emergent complexity. Letting the neural networks learn acts to hold the complex system together while you explore and interact.
The platform lets you steer all of this interactively. You can draw walls to create niches, erase parts of the system online, and tune 40+ system parameters to explore the most interesting configurations. We find it mesmerizing to watch species carve out territories and reorganise when you perturb them.
Everything runs client-side in your browser, no install needed.
Blog: https://t.co/qOuelxmd6l
Code: https://t.co/pz7ktDCRZS
Following our recent defense announcements, our team just completed a major project with Japan’s Ministry of Internal Affairs and Communications (@MIC_JAPAN). 🇯🇵
We built an end-to-end intelligence system to visualize and counter disinformation on social media.
Blog (Japanese): https://t.co/RkqVaMax6z
Tackling disinformation at a national scale is incredibly complex. It requires understanding shifting social narratives, not just flagging individual posts. To do this, our team deployed autonomous AI agents running novelty searches to uncover hidden narratives. To catch sophisticated disinformation strategies, they combined frontier foundation models with our proprietary small models to cover each other’s blind spots. We adapted our Shachi simulation framework (https://t.co/nHSe8wFF9i) to model how counter messaging spreads across different network topologies before deployment.
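As a rough illustration of why network topology matters for how a message spreads, here is a standard independent cascade simulation in plain Python. It is a generic textbook diffusion model, not Shachi's API or the model used in this project; the graphs, the spread probability `p` and all helper names are hypothetical.

```python
import random

def independent_cascade(adj, seeds, p, rng):
    """Each newly reached node gets one chance to pass the message to each of its
    neighbors, succeeding independently with probability p. Returns reached nodes."""
    reached, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in reached and rng.random() < p:
                    reached.add(v)
                    nxt.append(v)
        frontier = nxt
    return reached

def ring(n, k=2):
    """Ring lattice: each node links to its k nearest neighbors on each side."""
    return {i: [(i + d) % n for d in range(-k, k + 1) if d != 0] for i in range(n)}

def star(n):
    """Star: node 0 is a hub linked to every other node."""
    adj = {0: list(range(1, n))}
    adj.update({i: [0] for i in range(1, n)})
    return adj

def mean_reach(adj, seeds, p=0.3, trials=2000, seed=0):
    """Average number of nodes reached over many simulated cascades."""
    rng = random.Random(seed)
    return sum(len(independent_cascade(adj, seeds, p, rng)) for _ in range(trials)) / trials
```

Under these toy settings, seeding the hub of a 50-node star reaches more nodes on average than seeding one node of a 50-node ring with the same `p`, the kind of difference worth knowing before deciding where counter-messaging should start.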
This is another milestone for @SakanaAILabs’ Defense and Intelligence team, as we build critical infrastructure to help strengthen Japan.
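Today, Clara Chappaz, France's Ambassador for AI, visited Sakana AI and signed, on behalf of the French side, an MoU between Sakana AI and France's Current AI. 🇯🇵🇫🇷
The MoU covers cooperation on the AI stack and in the international AI field, including contributions to the Global South. Going forward, we will continue to work with companies from international partner countries, including France, toward establishing a sovereign AI ecosystem.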