Eragon @EragonAI - Twitter Profile

3 days ago

2/ A core issue with parameter-only RL is that it forces task-specific learning into the model weights. Traditional RL can improve model performance on the current task, but it also tends to shift behavior away from the base model, increase forgetting, and reduce plasticity. Prompt optimization alone has the opposite limitation: it is fast and cheap, but usually not enough to match the gains from weight updates.

The paper introduces Fast-Slow Training (FST). FST splits adaptation into two co-evolving channels:

- Slow weights (θ): the model parameters, updated by RL
- Fast weights (Φ): a population of prompts, evolved by GEPA

In FST, context is updated from rich textual feedback, while RL updates the model more gradually. Each round interleaves a GEPA reflection cycle, in which a reflection model rewrites prompts from failure traces, with a few RL steps sampled across that prompt population. Both channels optimize the same reward, concurrently. No parameter freeze, no sequential hand-off.

This lets task-specific lessons move quickly through the fast channel, while preserving more of the base model's general behavior in the slow channel.
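The interleaving described above can be illustrated with a toy sketch. This is not the paper's implementation: the "model" is a single float `theta`, each "prompt" is a float offset, `reflect` is a hand-written stand-in for a GEPA reflection cycle, and `rl_steps` is plain gradient ascent standing in for RL. `TARGET`, the step sizes, and all function names are hypothetical. The only thing it tries to show is the control flow: one shared reward, a fast channel that rewrites the prompt population from failure feedback, and a slow channel that takes a few small steps sampled across that population each round.

```python
import random

TARGET = 3.0  # hypothetical task optimum, used only by this toy reward


def reward(theta: float, phi: float) -> float:
    """Shared reward for both channels; maximum 0 when theta + phi == TARGET."""
    return -((theta + phi - TARGET) ** 2)


def reflect(theta, prompts, rng, step=0.5):
    """Stand-in for a GEPA reflection cycle (NOT the real GEPA algorithm).

    Uses the worst prompt's "failure trace" (here just its signed error) to
    propose a rewritten prompt, then keeps the best len(prompts) candidates.
    """
    worst = min(prompts, key=lambda p: reward(theta, p))
    error = TARGET - (theta + worst)
    proposal = worst + step * error + rng.gauss(0.0, 0.05)
    candidates = sorted(prompts + [proposal],
                        key=lambda p: reward(theta, p), reverse=True)
    return candidates[: len(prompts)]


def rl_steps(theta, prompts, rng, n_steps=3, lr=0.02):
    """Stand-in for RL: a few small gradient-ascent steps on theta,
    each conditioned on a prompt sampled from the current population."""
    for _ in range(n_steps):
        phi = rng.choice(prompts)
        grad = -2.0 * (theta + phi - TARGET)  # d(reward)/d(theta)
        theta += lr * grad
    return theta


def fast_slow_training(rounds=30, seed=0):
    rng = random.Random(seed)
    theta = 0.0                      # slow weights (toy: one parameter)
    prompts = [0.0, 0.1, -0.1, 0.2]  # fast weights (toy: prompt population)
    history = []
    for _ in range(rounds):
        prompts = reflect(theta, prompts, rng)  # fast channel update
        theta = rl_steps(theta, prompts, rng)   # slow channel update
        history.append(max(reward(theta, p) for p in prompts))
    return theta, prompts, history


if __name__ == "__main__":
    theta, prompts, history = fast_slow_training()
    print(f"theta={theta:.3f} best_reward={history[-1]:.4f}")
```

In this toy, the prompt population closes most of the gap to `TARGET` while `theta` moves only part of the way, which loosely mirrors the claim that the fast channel absorbs task-specific adaptation and the slow channel drifts less. Nothing here says anything about how real prompts or real policies behave.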

0

2

1

0

101

Eragon @EragonAI

4 days ago

4/ This reframes post-training. The default view treats adaptation as one channel — push every improvement into the weights — and pays for it with forgetting, eroded generality, and lost plasticity. FST splits that into two channels that co-evolve: task-specific nuance lives in fast weights (prompts), durable capability in slow weights (parameters). And it's a blueprint, not a single algorithm. At Eragon, we are interested in AI systems that keep getting better at new things without getting worse at everything else. If you are an engineer working in Applied AI and/or Machine Learning, join Eragon to build the future underlying layer of the next form of human organization. https://t.co/FgLpKu8cCZ

0

2

0

94

Eragon @EragonAI

4 days ago

3/ FST beats RL-only across four axes:

- Data efficiency: FST reaches RL's running peak in substantially fewer optimizer steps: 3.0× fewer on CodeIO, 1.4× on Math (Polaris), and 3.0× on HoVer-hard. Continuing past the crossover, FST's running peak also exceeds RL's on all three tasks.

- Higher performance asymptote: FST scores higher than RL on all three tasks: +4.4pp on CodeIO, +2.9pp on Math, +7.7pp on HoVer-hard.

- Preserved plasticity: at matched reward, FST models have up to 70% lower KL to the base policy than RL-only baselines. Starting from a Math or Physics checkpoint trained with either method, a fresh RL pass is run on HoVer-hard for 400 steps. FST-init preserves more capacity for the new task than RL-init on both arms, and on the Math arm prior RL collapses HoVer-hard learnability to near-zero.

- Continual learning: in a 3-task stream, FST gained ~20pp in a stage where RL gained ~2.5pp (~8× the acquisition rate).
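For readers who want to check the arithmetic behind two of these figures, here is a small sketch. The ~8× acquisition rate is the ratio of the two quoted gains (20pp / 2.5pp). "Lower KL to the base policy" is illustrated with a discrete KL divergence computed as KL(policy || base); the direction of the KL and the three toy distributions below are assumptions made up for illustration, not values or definitions from the paper.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def relative_reduction(baseline: float, value: float) -> float:
    """Fraction by which `value` is lower than `baseline` (0.7 means 70% lower)."""
    return (baseline - value) / baseline


def acquisition_ratio(gain_a_pp: float, gain_b_pp: float) -> float:
    """How many times larger gain A is than gain B (both in percentage points)."""
    return gain_a_pp / gain_b_pp


# Figures quoted in the thread: ~20pp (FST) vs ~2.5pp (RL) in one stage.
print(acquisition_ratio(20.0, 2.5))  # 8.0

# Hypothetical next-token distributions, invented purely for illustration.
base = [0.50, 0.30, 0.20]
rl_policy = [0.80, 0.15, 0.05]   # assumed to have drifted further from base
fst_policy = [0.60, 0.27, 0.13]  # assumed to have stayed closer to base

kl_rl = kl_divergence(rl_policy, base)
kl_fst = kl_divergence(fst_policy, base)
print(f"KL(rl||base)={kl_rl:.4f} KL(fst||base)={kl_fst:.4f} "
      f"reduction={relative_reduction(kl_rl, kl_fst):.0%}")
```

In practice the KL for a language model would be estimated over sampled sequences rather than a three-way toy distribution; the sketch only shows what "X% lower KL" means as a calculation.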

0

1

0

69

EragonAI retweeted

Josh Sirota

@joshua_sirota

6 days ago

How FST Works: To leverage the strong in-context learning of current LLMs, we treat the context as "fast weights" and model parameters as "slow weights", drawing from a rich literature in classic ML https://t.co/QdSf3Nob7v

0

6

2

3

662

EragonAI retweeted

Josh Sirota

@joshua_sirota

6 days ago

Announcing Fast-Slow Training (FST), pairing "slow" weights with "fast" context.

We try to answer the question: can LLMs adapt continually without losing base skills?

FST vs RL:
- 3x more sample-efficient
- Higher performance ceiling
- Less KL drift
- Continual learning: succeeds where RL stalls

2

11

3

8

1K

EragonAI retweeted

Dave Anderson @MrDaveAllen

6 days ago

Excited to share our first research paper, "Learning, Fast and Slow: Towards LLMs That Adapt Continually." Fast-Slow Training (FST) combines optimized context with model weight updates. Read more here: https://t.co/E7CoNGp7Rz

0

2

662

Eragon @EragonAI

6 days ago

Shoutout to researchers from @UCBerkeley @Mila_Quebec @UTAustin @periodiclabs and Mirendil on this collaboration!

0

1

0

454

Eragon @EragonAI

6 days ago

1/ At Eragon, we're building an AI operating system that connects a company's entire tech stack into a single interface for work, powered by a model post-trained on the customer's own data so it understands the company's unique context.

We believe that AI system post-training shouldn't have to choose between adapting quickly and learning durably. The future of adaptive AI is fast learning + slow learning:

- fast enough to absorb task-specific lessons
- slow enough to improve without forgetting

Our recent research paper, "Learning, Fast and Slow", makes that case. https://t.co/RG4wKWUk6i

13

60

9

20

5K

Eragon @EragonAI

6 days ago

4/ This reframes post-training. The default view treats adaptation as one channel — push every improvement into the weights — and pays for it with forgetting, eroded generality, and lost plasticity. FST splits that into two channels that co-evolve: task-specific nuance lives in fast weights (prompts), durable capability in slow weights (parameters). And it's a blueprint, not a single algorithm. At Eragon, we are interested in AI systems that keep getting better at new things without getting worse at everything else. If you are an engineer working in Applied AI and Machine Learning, join Eragon to build the future underlying layer of the next form of human organization. https://t.co/FgLpKu8Ksx

0

3

1

0

382

Eragon @EragonAI

6 days ago

3/ FST beats RL-only across four axes:

- Data efficiency: FST reaches RL's running peak in substantially fewer optimizer steps: 3.0× fewer on CodeIO, 1.4× on Math (Polaris), and 3.0× on HoVer-hard. Continuing past the crossover, FST's running peak also exceeds RL's on all three tasks.

- Higher performance asymptote: FST scores higher than RL on all three tasks: +4.4pp on CodeIO, +2.9pp on Math, +7.7pp on HoVer-hard.

- Preserved plasticity: at matched reward, FST models have up to 70% lower KL to the base policy than RL-only baselines. Starting from a Math or Physics checkpoint trained with either method, a fresh RL pass is run on HoVer-hard for 400 steps. FST-init preserves more capacity for the new task than RL-init on both arms, and on the Math arm prior RL collapses HoVer-hard learnability to near-zero.

- Continual learning: in a 3-task stream, FST gained ~20pp in a stage where RL gained ~2.5pp (~8× the acquisition rate).

1

4

1

0

474

Eragon @EragonAI

12 days ago

https://t.co/aJfp0MWfBe

0

9

1

298

Eragon @EragonAI

3 months ago

@davj @joshua_sirota 🚀

0

1

0

45

Eragon @EragonAI

8 months ago

@rhobusiness @joshua_sirota @TarlonKhoubyari 🤝

0

1

0

61

Eragon @EragonAI

8 months ago

The biggest. Don't miss it. Live and Direct.

Cathy Di

@itsCathyDi

8 months ago

THIS FRIDAY we are hosting the biggest hackathon at sf @Techweek_. win $25K+ in prizes and network with sf’s top talent. link to rsvp is in the comments! supported by @OpenAI, @elevenlabs, @windsurf, @convex, @FireworksAI_HQ, @EragonAI, @vibekanban, and more.