Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for full-parameter fine-tuning using Evolution Strategies (ES).
By skipping gradients and optimizing directly in parameter space, ES achieves more accurate, efficient, and stable fine-tuning.
Paper: https://t.co/Es44ZqfcJ6
Code: https://t.co/eduztHwrLS
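To make "exploring in parameter space" concrete, here is a minimal sketch of a generic Evolution Strategies update on a toy problem. It is not the paper's implementation: the reward function, the target vector, and every hyperparameter (`population`, `sigma`, `lr`) are assumptions chosen for illustration. The point is only that the update uses reward scores of randomly perturbed parameters, with no backpropagation.

```python
import random


def reward(params, target):
    """Toy reward (an assumption): negative squared distance to a target vector."""
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def es_step(params, target, rng, population=50, sigma=0.1, lr=0.05):
    """One ES update: perturb the parameters, score each perturbation,
    then move along the reward-weighted average of the noise."""
    noises, rewards = [], []
    for _ in range(population):
        eps = [rng.gauss(0.0, 1.0) for _ in params]
        candidate = [p + sigma * e for p, e in zip(params, eps)]
        noises.append(eps)
        rewards.append(reward(candidate, target))

    # Subtract the mean reward as a baseline to reduce variance.
    baseline = sum(rewards) / population
    advantages = [r - baseline for r in rewards]

    new_params = []
    for i, p in enumerate(params):
        # Gradient estimate built from reward values only.
        grad_est = sum(a * eps[i] for a, eps in zip(advantages, noises))
        grad_est /= population * sigma
        new_params.append(p + lr * grad_est)
    return new_params


def run_es(steps=300, seed=0):
    """Run the toy optimization and return (final_params, target)."""
    rng = random.Random(seed)
    target = [1.0, -2.0, 0.5]   # hypothetical optimum
    params = [0.0, 0.0, 0.0]    # stands in for the model weights
    for _ in range(steps):
        params = es_step(params, target, rng)
    return params, target


if __name__ == "__main__":
    final_params, target = run_es()
    print("final:", [round(p, 3) for p in final_params], "target:", target)
```

In an LLM setting the three-element list would be the full weight vector and `reward` would score sampled model outputs, so each candidate costs forward passes only.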
My brain broke when I read this paper.
A tiny 7 million parameter model just beat DeepSeek-R1, Gemini 2.5 Pro, and o3-mini at reasoning on both ARC-AGI 1 and ARC-AGI 2.
It's called Tiny Recursive Model (TRM) from Samsung.
How can a model 10,000x smaller be smarter?
Here's how it works:
1. Draft an Initial Answer: Unlike an LLM that writes word-by-word, TRM first generates a quick, complete "draft" of the solution. Think of this as its first rough guess.
2. Create a "Scratchpad": It then creates a separate space for its internal thoughts, a latent reasoning "scratchpad." This is where the real magic happens.
3. Intensely Self-Critique: The model enters an intense inner loop. It compares its draft answer to the original problem and refines its reasoning on the scratchpad over and over (6 times in a row), asking itself, "Does my logic hold up? Where are the errors?"
4. Revise the Answer: After this focused "thinking," it uses the improved logic from its scratchpad to create a brand new, much better draft of the final answer.
5. Repeat until Confident: The entire process, draft, think, revise, is repeated up to 16 times. Each cycle pushes the model closer to a correct, logically sound solution.
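The five steps above boil down to a nested loop, sketched here on a toy problem. This is only an illustration of the control flow, not Samsung's code: the two update functions are hand-written stand-ins for the learned network, the "problem" is just a number the answer must match, and the confidence check peeks at the true error, which a real model would have to estimate. The counts (6 inner refinements, up to 16 cycles) come from the description above; everything else is an assumption.

```python
def refine_scratchpad(problem, draft, scratchpad):
    """Stand-in for the learned latent update (hypothetical rule):
    pull the scratchpad halfway toward the draft's remaining error."""
    residual = problem - draft
    return scratchpad + 0.5 * (residual - scratchpad)


def revise_answer(draft, scratchpad):
    """Stand-in for the learned answer update (hypothetical rule):
    apply the correction stored in the scratchpad."""
    return draft + scratchpad


def recursive_solve(problem, inner_steps=6, max_cycles=16, tolerance=1e-3):
    """Draft, think, revise; repeat until confident or out of cycles."""
    draft = 0.0        # step 1: rough initial answer
    scratchpad = 0.0   # step 2: latent reasoning state, carried across cycles
    cycles_used = 0
    for cycle in range(1, max_cycles + 1):
        cycles_used = cycle
        # Step 3: refine the scratchpad several times against problem and draft.
        for _ in range(inner_steps):
            scratchpad = refine_scratchpad(problem, draft, scratchpad)
        # Step 4: produce a new draft from the improved scratchpad.
        draft = revise_answer(draft, scratchpad)
        # Step 5: stop early once "confident" (toy check using the true error).
        if abs(problem - draft) < tolerance:
            break
    return draft, cycles_used


if __name__ == "__main__":
    answer, cycles = recursive_solve(10.0)
    print(f"answer={answer:.4f} after {cycles} cycles")
```

The same small update functions are reused on every pass, which is how a recursive design can spend more compute on a problem without adding parameters.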
Why this matters:
Business Leaders: This is what algorithmic advantage looks like. While competitors are paying massive inference costs for brute-force scale, a smarter, more efficient model can deliver superior performance for a tiny fraction of the cost.
Researchers: This is a major validation for neuro-symbolic ideas. The model's ability to recursively "think" before "acting" demonstrates that architecture, not just scale, can be a primary driver of reasoning ability.
Practitioners: SOTA reasoning is no longer gated behind billion-dollar GPU clusters. This paper provides a highly efficient, parameter-light blueprint for building specialized reasoners that can run anywhere.
This isn't just scaling down; it's a completely different, more deliberate way of solving problems.
My 17-year-old nephew saved my business 1000s of hours a year by showing me the tools he’s using to laze and cheat his way through school (like @gammaapp).
Meanwhile friends complain that Bain is charging them $Ms to learn AI on their dime, with ‘pilots’ that never ship.
Go figure.
@KatPaton13 Lots of people are telling you not to be too hasty etc… This was a non-issue for me because I KNEW I would never go back. If you’re in the same state of mind and you know what’s next, there’s no need to delay. If not, then I agree: finish up, then test the waters as you train in parallel.
@abhi_agarwal4 @Accel @fabrichq_ai Anyway you seem like a nice person and a strong founder. I’m sure you will raise and build a great start-up. Was just sharing my two cents on approach. Put me in the ‘I’ll prove him wrong’ bucket. I genuinely wish you the best of luck - we need strong founders like you building.
@abhi_agarwal4 @Accel @fabrichq_ai As someone who might be DDing you for a fund I now know you’ve been rejected for a prestigious investment. That on the surface may mean nothing but it’s a data point I have that doesn’t exactly go in your favour…
@abhi_agarwal4 @Accel @fabrichq_ai You shared the fact that you specifically didn’t get funded by them. Each to their own, but if I was DDing you I wouldn’t take the risk. I’m all for building in public, but this achieved very little other than antagonising those who rejected you.
@abhi_agarwal4 @Accel @fabrichq_ai Publicly sharing private VC communications is a bafflingly bad decision. Understand having conviction but this is grandstanding that doesn’t help you build better product or build trust from potential other investors who have seen you share private comms without permission.
I wrote an article about DAOs, why they're broken, and how to fix them (DAOs-2.0), but it's a 17 min read 😅. Do people read stuff that long on Medium? What's the best place for me to publish? I thought about releasing an arXiv paper, but I don't think that's where this belongs...
My thoughts on how founders and investors should be thinking about making an impact in a post-LLM world. In short: focus on platform functionality and declarative UX.
https://t.co/BOl86iS9mt
Happy Friday Everyone. #TrackOfTheWeek is now live for its third week. Head to our Cub3 community and have your say as to who should win.
https://t.co/zy9hHTeCFS
#ProofOfBehavior