Introducing SOAR 🚀, a self-improving framework for program synthesis that alternates between search and learning (accepted to #ICML!)
It brings LLMs from just a few percent on ARC-AGI-1 up to 52%
We’re releasing the finetuned LLMs, a dataset of 5M generated programs and the code.
🧵
Excited to launch Gemma 4: the best open models in the world for their respective sizes. Available in 4 sizes that can be fine-tuned for your specific task: 31B dense for great raw performance, 26B MoE for low latency, and effective 2B & 4B for edge device use - happy building!
🚀 Introducing the Qwen 3.5 Small Model Series
Qwen3.5-0.8B · Qwen3.5-2B · Qwen3.5-4B · Qwen3.5-9B
✨ More intelligence, less compute.
These small models are built on the same Qwen3.5 foundation — native multimodal, improved architecture, scaled RL:
• 0.8B / 2B → tiny, fast, great for edge devices
• 4B → a surprisingly strong multimodal base for lightweight agents
• 9B → compact, but already closing the gap with much larger models
And yes, we’re also releasing the Base models.
We hope this better supports research, experimentation, and real-world industrial innovation.
Hugging Face: https://t.co/wFMdX5pDjU
ModelScope: https://t.co/9NGXcIdCWI
ARC Prize 2025 Winners Interviews
Paper Award 2nd Place
@PourcelJulien, @cedcolas, @pyoudeyer discuss SOAR - a self-improving evolutionary program synthesis framework that fine-tunes an LLM on its own search traces - without human-engineered DSLs or solution datasets.
Our self-improving genetic algorithm received the 2nd place paper award for the @arcprize!
Congrats in particular to @PourcelJulien, the experiments wizard!
We proposed a simple, general algorithm ⬇️
Congrats to the ARC Prize 2025 winners!
The Grand Prize remains unclaimed, but nevertheless 2025 saw remarkable progress on LLM-driven refinement loops, both with "local" models and with commercial frontier models.
We also saw the rise of zero-pretraining DL approaches like HRM and TRM. Lots of new learnings!
ARC Prize 2025 Paper Award Winners
1st / "Less is More: Recursive Reasoning with Tiny Networks" (TRM) / A. Jolicoeur-Martineau / $50k
2nd / "Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC-AGI" (SOAR) / J. Pourcel et al. / $20k
3rd / "ARC-AGI Without Pretraining" / I. Liao et al. / $5k
ARC Prize 2025 competition concluded today - The year of Refinements
Our goal is to bring meaningful open source research into the community and today we awarded $137K to 14 teams
Benchmarks matter, but their true value comes from the progress they catalyze
ARC Prize 2025 was designed to inspire the community to publish research aimed at building more generalized systems
The grand prize remains unclaimed, but the leaderboard reflects strong advances, and all submissions and solutions are now open sourced.
Here is a recap of the winners. For more, check out the great recap by @mikeknoop (link below)
** Paper Prizes **
1/ Alexia Jolicoeur-Martineau (@jm_alexia) - TRM
Tiny Recursive Model (TRM) is a tiny 2-layer network that does recursive reasoning: it keeps a latent state z and a current answer y, repeatedly updates z using the puzzle and y, then refines y from z over many “deep supervision” steps, so it can gradually fix its own mistakes without needing a huge model. It simplifies Hierarchical Reasoning Model (HRM).
2/ Julien Pourcel (@PourcelJulien) - Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC-AGI
SOAR is a self-improving evolutionary program synthesis system: it uses an LLM to sample and refine Python programs for ARC tasks (Sample & Refine phase), then turns all those attempts (both successes and failures) into new problem–solution pairs via hindsight relabeling, and fine-tunes the same LLM so it gets better at both sampling and refinement next time.
3/ Isaac Liao (@LiaoIsaac91893) - ARC-AGI Without Pretraining
CompressARC shows that lossless information compression alone can produce intelligent behavior on ARC-AGI: for each puzzle, it builds a randomly initialized neural network and uses gradient descent at inference time to find a compact representation (like a VAE-style loss: cross-entropy + KL) that best “compresses” all the given example grids.
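To make the hindsight relabeling step in the SOAR entry above more concrete, here is a minimal Python sketch. It is not the authors' code: the function `hindsight_relabel`, the dictionary layout, and the toy grids are all assumptions made for illustration. The idea it shows is that a sampled program which fails the original task is still, by construction, a correct solution to the task defined by its own outputs, so it can be kept as a training pair.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]


def hindsight_relabel(program_src: str,
                      program: Callable[[Grid], Grid],
                      task_inputs: List[Grid]) -> Dict[str, object]:
    """Build a (task, solution) pair from any sampled program (hypothetical helper).

    The outputs are whatever the program actually produced, so the program
    solves the relabeled task even if it failed the original one.
    """
    examples = [(grid, program(grid)) for grid in task_inputs]
    return {"examples": examples, "solution": program_src}


def flip_horizontal(grid: Grid) -> Grid:
    """A sampled candidate program: mirror each row."""
    return [list(reversed(row)) for row in grid]


# Toy task (assumed data): the intended transformation is a transpose.
inputs = [[[1, 2], [3, 4]], [[0, 5], [6, 0]]]
targets = [[[1, 3], [2, 4]], [[0, 6], [5, 0]]]

# The candidate fails the original task...
solved_original = all(flip_horizontal(x) == y for x, y in zip(inputs, targets))

# ...but still yields a valid training pair after relabeling.
pair = hindsight_relabel(
    "def f(g): return [row[::-1] for row in g]", flip_horizontal, inputs
)
```

In a full system, pairs like `pair` would be added to the fine-tuning set alongside the programs that solved their original tasks.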
** Top Scores **
1/ NVARC (@JFPuget, Ivan Sorokin)
The NVIDIA team built a huge synthetic dataset of ARC-AGI puzzles, then turned those summaries into Python programs that produce consistent input/output grid pairs. They then used test-time fine-tuning (TTFT) plus a fast depth-first search decoding process to adapt each model to the hidden test puzzles.
2/ the ARChitects (@dvhrtm, Daniel Franzen, @JDisselh)
The ARChitects fine-tune an LLM on ARC-style grids and then use it at test time in two roles: 1) as a generator that, via depth-first search (DFS) over token probabilities, systematically explores the space of high-probability candidate solutions (not just random samples), and 2) as a scorer that evaluates how likely each complete solution is.
3/ MindsAI @ Tufa Labs (@MindsAI_Jack, @DriesSmit1, @MohamedOsmanML, @bayesilicon)
MindsAI trained a trimmed CodeT5 encoder–decoder model for years on the massive ARC-AGI Mega dataset (100M+ examples) using span corruption, reversals, and BPE dropout so it learned structure, not surface patterns. At inference, they ran large-scale test-time training (TTT) on thousands of permuted and augmented versions of the test set, then applied AIRV.
4/ Lonnie
Lonnie reused the 2024 ARChitects pipeline but treated the random seed as a hyperparameter, systematically exploring seeds to exploit variance on the small 240-task evaluation set, which pushed an otherwise baseline-style system up to 4th place on the private leaderboard.
5/ Guillermo Barbadillo @ Veridas (@guille_bar)
Guillermo believes that ARC will ultimately be solved by a search-and-learn approach that combines program synthesis with test-time training (TTT) and hindsight relabeling, so the system can search over code, learn from failed attempts, and steadily refine its solutions.
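The "DFS over token probabilities" idea in the ARChitects entry above can be sketched roughly as follows. This is a toy illustration, not the team's implementation: `toy_next_token_probs` is a hypothetical stand-in for an LLM's next-token distribution, and `min_prob` is an assumed pruning threshold. The search enumerates every complete sequence whose cumulative probability stays above the threshold, rather than drawing random samples.

```python
from typing import Callable, Dict, List, Tuple

EOS = "<eos>"
Sequence = Tuple[str, ...]


def toy_next_token_probs(prefix: Sequence) -> Dict[str, float]:
    """Hypothetical stand-in for a model's next-token distribution."""
    if len(prefix) >= 2:
        return {EOS: 1.0}
    if not prefix:
        return {"1": 0.6, "2": 0.3, "3": 0.1}
    return {"1": 0.5, "2": 0.5}


def dfs_decode(next_probs: Callable[[Sequence], Dict[str, float]],
               min_prob: float,
               prefix: Sequence = (),
               prob: float = 1.0) -> List[Tuple[Sequence, float]]:
    """Return all complete sequences with cumulative probability >= min_prob."""
    results: List[Tuple[Sequence, float]] = []
    for token, p in next_probs(prefix).items():
        new_prob = prob * p
        if new_prob < min_prob:
            continue  # prune: no extension of this branch can recover
        if token == EOS:
            results.append((prefix, new_prob))
        else:
            results.extend(
                dfs_decode(next_probs, min_prob, prefix + (token,), new_prob)
            )
    return results


candidates = dfs_decode(toy_next_token_probs, min_prob=0.12)
# Ranking by sequence probability is a simple stand-in for the scorer role.
ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
```

Pruning is safe here because a sequence's probability can only shrink as tokens are appended, so any branch already below `min_prob` cannot produce a qualifying candidate.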
We're going bigger in 2026! Let's go!!
Announcing the ARC Prize 2025 Top Score & Paper Award winners
The Grand Prize remains unclaimed
Our analysis on AGI progress marking 2025 the year of the refinement loop
In San Diego for #NeurIPS
Happy to chat about open-endedness, self goal-generation, intrinsic motivations, self-improvement, human-machine collective intelligence
Open to hearing about research scientist opportunities too
Don't hesitate to reach out!
Big news: I’m officially a 2025 Google PhD Fellow! 🎓✨
I’m also heading to #NeurIPS2025 in SD! Happy to chat about LLM, code gen, evolutionary algo, open-endedness, self-improvement, enhancing LLM diversity, ARC-AGI, and other subjects.
Open to hearing about summer internships. ☀️
A huge thanks to @cedcolas @lae_teo @pyoudeyer for their guidance on my application.
If you are interested in solving ARC-AGI with evolutionary algorithms (SOAR) or generating diverse coding problems (ACES), check out my work here: https://t.co/O8xsBvNuBX