2/ A core issue with parameter-only RL is that it forces task-specific learning into the model weights. Traditional RL can improve model performance on the current task, but it also tends to shift behavior away from the base model, increasing forgetting and reducing plasticity. Prompt optimization alone has the opposite limitation: it is fast and cheap, but usually not enough to match the gains from weight updates.
The paper introduces Fast-Slow Training (FST). FST splits adaptation into two co-evolving channels:
- Slow weights (θ): the model parameters, updated by RL
- Fast weights (Φ): a population of prompts, evolved by GEPA
In FST, context is updated from rich textual feedback, while RL updates the model more gradually. Each round interleaves a GEPA reflection cycle — a reflection model rewrites prompts from failure traces — with a few RL steps sampled across that prompt population. Both channels optimize the same reward, concurrently. No parameter freeze, no sequential hand-off.
This lets task-specific lessons move quickly through the fast channel, while preserving more of the base model’s general behavior in the slow channel.
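To make the interleaving concrete, here is a minimal toy sketch of one possible reading of the loop described above. It is not the paper's implementation: the weights and prompts are plain floats, `reward`, `reflect`, `rl_step`, `fst_round`, and `run_toy` are hypothetical names, and replacing the worst prompt is a simplification standing in for a GEPA reflection cycle. The only things it tries to mirror are that both channels optimize the same reward, that the prompt rewrite is a large step while the weight update is a small one, and that RL steps sample prompts from the population.

```python
import random

TARGET = 1.0  # toy optimum: reward peaks when theta + prompt == TARGET


def reward(theta, prompt):
    """Shared reward for both channels (toy stand-in for a task reward)."""
    return -abs(TARGET - (theta + prompt))


def reflect(theta, prompt):
    """Fast channel: a large corrective rewrite (stand-in for a reflection model)."""
    return prompt + 0.5 * (TARGET - (theta + prompt))


def rl_step(theta, prompt):
    """Slow channel: a small weight update under the sampled prompt."""
    return theta + 0.05 * (TARGET - (theta + prompt))


def fst_round(theta, prompts, rng, rl_steps=2):
    """One round: a reflection cycle on the prompts, then a few RL steps."""
    # Fast channel: try to rewrite the lowest-reward prompt, keep it if better.
    worst = min(range(len(prompts)), key=lambda i: reward(theta, prompts[i]))
    candidate = reflect(theta, prompts[worst])
    if reward(theta, candidate) > reward(theta, prompts[worst]):
        prompts = prompts[:worst] + [candidate] + prompts[worst + 1:]
    # Slow channel: each RL step uses a prompt sampled from the population.
    for _ in range(rl_steps):
        theta = rl_step(theta, rng.choice(prompts))
    return theta, prompts


def run_toy(rounds=10, seed=0):
    """Run a few rounds and report the best reward before and after."""
    rng = random.Random(seed)
    theta, prompts = 0.0, [-0.5, 0.0, 0.2]
    best_before = max(reward(theta, p) for p in prompts)
    for _ in range(rounds):
        theta, prompts = fst_round(theta, prompts, rng)
    best_after = max(reward(theta, p) for p in prompts)
    return theta, prompts, best_before, best_after
```

Calling `run_toy()` returns the final toy weight, the evolved prompt population, and the best reward before and after training. In this toy, most of the gap to the optimum is closed by the prompt rewrites, while the weight moves only a little each round.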
4/ This reframes post-training. The default view treats adaptation as one channel — push every improvement into the weights — and pays for it with forgetting, eroded generality, and lost plasticity. FST splits that into two channels that co-evolve: task-specific nuance lives in fast weights (prompts), durable capability in slow weights (parameters). And it's a blueprint, not a single algorithm.
At Eragon, we are interested in AI systems that keep getting better at new things without getting worse at everything else.
If you are an engineer working in Applied AI and/or Machine Learning, join Eragon to build the future underlying layer of the next form of human organization.
https://t.co/FgLpKu8cCZ
3/ FST beats RL-only across four axes:
- Data efficiency: FST reaches RL's running peak in substantially fewer optimizer steps — 3.0× fewer on CodeIO, 1.4× on Math (Polaris), and 3.0× on HoVer-hard — and continuing past the crossover, FST's running peak also exceeds RL's on all three tasks.
- Higher performance asymptote: FST reaches a higher performance asymptote than RL on all three tasks: +4.4pp on CodeIO, +2.9pp on Math, +7.7pp on HoVer-hard
- Preserved plasticity: at matched reward, FST models have up to 70% lower KL to the base policy than RL-only baselines. Starting from a Math or Physics checkpoint trained with either method, a fresh 400-step RL pass on HoVer-hard shows that FST-init preserves more capacity for the new task than RL-init on both arms; on the Math arm, prior RL collapses HoVer-hard learnability to near zero.
- Continual learning: in a 3-task stream, FST gained ~20pp in a stage where RL gained ~2.5pp (~8× the acquisition rate)
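For readers who want to see how a "fewer steps to reach RL's running peak" number could be computed, here is a small sketch under one assumed reading of the data-efficiency metric: take the best reward RL has reached over its run, then compare the first step at which each method's running best reaches that level. The function names are hypothetical and the curves in the usage note are made-up illustrative numbers, not results from the paper.

```python
from itertools import accumulate


def running_peak(rewards):
    """Best reward seen so far at each step, e.g. [0.1, 0.3, 0.2] -> [0.1, 0.3, 0.3]."""
    return list(accumulate(rewards, max))


def first_step_reaching(rewards, threshold):
    """1-indexed step at which the running peak first reaches threshold, else None."""
    for step, best in enumerate(running_peak(rewards), start=1):
        if best >= threshold:
            return step
    return None


def step_speedup(rl_rewards, fst_rewards):
    """How many times fewer steps FST needs to reach RL's peak (assumed definition)."""
    rl_peak = max(rl_rewards)
    rl_step = first_step_reaching(rl_rewards, rl_peak)
    fst_step = first_step_reaching(fst_rewards, rl_peak)
    if fst_step is None:
        return None  # FST never reaches RL's peak within the logged steps
    return rl_step / fst_step
```

With the illustrative curves `rl = [0.1, 0.2, 0.3, 0.3, 0.4, 0.5]` and `fst = [0.2, 0.5, 0.55, 0.6]`, RL first hits its peak of 0.5 at step 6 and FST reaches 0.5 at step 2, so `step_speedup(rl, fst)` gives 3.0. The continual-learning ratio quoted above follows the same kind of arithmetic: 20pp / 2.5pp = 8.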
How FST Works: To leverage the strong in-context learning of current LLMs, we treat the context as "fast weights" and model parameters as "slow weights", drawing on a rich literature in classic ML.
Announcing Fast-Slow Training (FST), which pairs "slow" weights with "fast" context.
We try to answer the question: can LLMs adapt continually without losing base skills?
FST vs RL:
- Up to 3x more sample-efficient
- Higher performance ceiling
- Less KL drift
- Continual learning: succeeds where RL stalls
Excited to share our first research paper, Learning, Fast and Slow: Towards LLMs That Adapt Continually.
Fast-Slow Training (FST) combines optimized context with model weight updates.
Read more here: https://t.co/E7CoNGp7Rz
1/ At Eragon, we’re building an AI operating system that connects a company’s entire tech stack into a single interface for work, powered by a model post-trained on the customer’s own data so it understands the company’s unique context.
We believe that AI system post-training shouldn’t have to choose between adapting quickly and learning durably. The future of adaptive AI is fast learning + slow learning:
- fast enough to absorb task-specific lessons
- slow enough to improve without forgetting
Our recent research paper, Learning, Fast and Slow, makes that case.
https://t.co/RG4wKWUk6i