If you give a frontier model the complete ruleset for a strategy game, can it derive a winning strategy from first principles?
I wanted to test @claudeai Sonnet 4.6's ability to play the 2009 strategy game Small World. Three identical instances with the same instructions and compute budget played against each other. The games surfaced a reasoning pattern around action bias and locality that I think applies broadly to long-horizon software engineering and knowledge work beyond just strategy games.
Full blogpost: https://t.co/drpMM9p6NG
One interesting point: a fixed KV cache is an MLP.
Collectively, the keys form an up projection, and values form a down projection. The softmax is a nonlinearity.
Therefore, we can view KV compression as a new way of producing ‘weights’. Instead of using back propagation to refine our MLPs, we can learn to produce them directly from context. This is perhaps more analogous to human learning and has the potential to be far more sample efficient.
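A minimal sketch of that equivalence, assuming a single head, a single query, and the usual 1/sqrt(d) scaling (the function names and toy numbers are illustrative, not from the post). Attention over a frozen cache gives the same output as a one-hidden-layer MLP whose up-projection rows are the scaled keys and whose down-projection rows are the values:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Single-head attention of one query over a fixed KV cache."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]

def mlp(x, w_up, w_down):
    """One-hidden-layer MLP: up projection, softmax nonlinearity, down projection."""
    hidden = [sum(xi * w for xi, w in zip(x, row)) for row in w_up]  # one unit per cached token
    act = softmax(hidden)
    return [sum(a * row[j] for a, row in zip(act, w_down)) for j in range(len(w_down[0]))]

# Toy cache with 3 cached tokens, key dim 2, value dim 2 (made-up numbers).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
query = [0.5, -1.0]

# "Weights" read straight off the cache: scaled keys go up, values go down.
scale = 1.0 / math.sqrt(len(query))
w_up = [[k * scale for k in key] for key in keys]
w_down = values

attn_out = attend(query, keys, values)
mlp_out = mlp(query, w_up, w_down)
```

The hidden width equals the number of cached tokens, so compressing the cache amounts to shrinking the hidden layer of this MLP.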
So basically this trades 8 separate KV caches and decode latency for param efficiency. And two 16-layer transformers loop over each other (L for 3x, H for 1x, repeat 2 cycles) before decode. Curious how this scales.
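A back-of-the-envelope sketch of that schedule, assuming each cycle runs the L module three times and then the H module once, and that every pass keeps its own KV cache (my reading of the description above, not a confirmed detail of the architecture; `loop_schedule` is a hypothetical helper):

```python
def loop_schedule(cycles=2, l_steps=3, h_steps=1):
    """Return the order of module passes before decode, e.g. L, L, L, H, ..."""
    schedule = []
    for _ in range(cycles):
        schedule += ["L"] * l_steps + ["H"] * h_steps
    return schedule

LAYERS_PER_MODULE = 16  # two 16-layer transformers, per the description above

schedule = loop_schedule()
passes = len(schedule)                          # one KV cache per pass, under the assumption above
unique_layers = 2 * LAYERS_PER_MODULE           # parameters actually stored
effective_depth = passes * LAYERS_PER_MODULE    # layer applications per forward pass

print(schedule)         # ['L', 'L', 'L', 'H', 'L', 'L', 'L', 'H']
print(passes)           # 8
print(unique_layers)    # 32
print(effective_depth)  # 128
```

Under these assumptions the 8 passes line up with the 8 KV caches: 32 layers of weights are applied 128 times, which is where both the parameter efficiency and the extra latency come from.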
Introducing HRM-Text.
An ultra-lean 1B-parameter reasoning language model designed to deliver strong general performance with a fraction of the data, compute, and infrastructure.
Trained on just 40B structured tokens, HRM-Text achieves competitive performance while using ~1/1000 of the training data of comparable models.
The kicker? The full model trains in roughly one day on a $1,000 budget.
This opens the door to a new generation of AI that is powerful, accessible, and radically easier to adapt. Theories and research concepts once deemed too expensive to test are officially back in the game.
Sapient Intelligence invites you to help us shape a new paradigm for general intelligence.
People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way.
We share our approach, early results, and a quick look at our model in action.
https://t.co/AFJZ5kH7Ku
Given two branches with concrete projections, it reliably picks the better one. Its weakness is option generation: left to its own devices, it generates one option (the action-forward one) and never surfaces the alternative. The template's entire contribution is making option generation mandatory, which turns out to be enough to close most of the gap.
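A hypothetical sketch of what "making option generation mandatory" can look like in a prompt builder. This is not the template from the post; the function name, wording, and example game state are all assumptions made for illustration:

```python
def build_decision_prompt(state, default_action, min_options=2):
    """Build a prompt that forces several options to be listed before choosing."""
    if min_options < 2:
        raise ValueError("need at least two options to compare")
    lines = [
        f"Current state: {state}",
        f"Default action under consideration: {default_action}",
        f"Before deciding, list at least {min_options} distinct options.",
        "At least one option must NOT be the default action.",
        "For each option, write a concrete projection of its outcome.",
        "Only then pick the option with the best projection.",
    ]
    return "\n".join(lines)

# Illustrative state and action, not taken from the actual games.
prompt = build_decision_prompt("turn 4", "conquer two more regions")
print(prompt)
```

The design choice is that the prompt never tells the model which option is better. It only guarantees that an alternative with a projection is on the table, and leaves the comparison to the model.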
Full post with many more details: https://t.co/drpMM9p6NG
Link to code: https://t.co/YtpReieMgH
The general finding is about what I'm calling strategic attention. It is the reflex to pull the right reasoning framework into active context at the right moment. The model has the knowledge: if you ask it "when should you decline in Small World?" it gives a correct answer. It just doesn't activate that knowledge unprompted at the decision point. The template interrupts the default action-first reasoning loop long enough for the model's own strategic thinking to engage.
This maps directly to a pattern @FrontierSWE found in software engineering: Opus 4.6 solved a Pyright optimization in 11 minutes, then kept iterating for seven more hours across 95 builds, at one point losing the fix entirely before rediscovering it. If it had stopped at minute 11, it would have scored the same.
In one episode, the model needed to edit files it lacked permissions for. After searching for workarounds, it found a way to inject code into a config file that would run with elevated privileges, and designed the exploit to delete itself after running. (4/14)
An underrated aspect of language models is that their skills show practically zero degradation over time (absent inference-time quantization and assuming stable compute). Humans, by contrast, have to actively practice a skill just to stay on the capability frontier.
The team at @cursor_ai posed the problem of character prefix conditioning at the beginning of the year - today I'm releasing a short blog post and some code walking through my attempt. It was fun to learn some creative ways of sampling from language models.
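For readers new to the problem, here is a minimal sketch of the naive masking baseline, not necessarily the approach taken in the blog post. At each step it keeps only tokens consistent with the characters still to be matched, renormalizes, and samples. The toy vocabulary, `toy_lm`, and all function names are assumptions, and per-step renormalization like this does not in general reproduce the exact conditional distribution:

```python
import random

def compatible(token, remaining):
    """A token fits if it covers the rest of the prefix or is itself a piece of it."""
    return token.startswith(remaining) or remaining.startswith(token)

def sample_with_prefix(next_token_probs, prefix, rng, max_tokens=16):
    """Sample tokens whose concatenation starts with the character prefix."""
    tokens, remaining = [], prefix
    while remaining and len(tokens) < max_tokens:
        probs = next_token_probs(tokens)
        allowed = {t: p for t, p in probs.items() if t and compatible(t, remaining)}
        if not allowed:
            raise ValueError("no token is consistent with the prefix")
        choices, weights = zip(*allowed.items())
        tok = rng.choices(choices, weights=weights, k=1)[0]  # renormalizes implicitly
        tokens.append(tok)
        remaining = remaining[len(tok):]  # empty once a token overshoots the prefix
    return tokens

def toy_lm(tokens_so_far):
    """Stand-in for a real model: uniform over a tiny made-up vocabulary."""
    vocab = ["d", "de", "def", "ef", "f", "define", " ", "x"]
    return {t: 1.0 / len(vocab) for t in vocab}

rng = random.Random(0)
sampled = sample_with_prefix(toy_lm, "def", rng)
text = "".join(sampled)
```

Note that a token such as "define" is allowed even though it runs past the prefix, which is what lets the prefix end in the middle of a token.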