Anton Baumann @_antonbaumann - Twitter Profile

2 days ago

On-Policy Distillation is the most active new research direction being explored in RL for LLMs. Had the chance to discuss how it works with Dwarkesh and why it fits so nicely into large-scale pipelines.

21

1K

124

1K

128K

_antonbaumann retweeted

Ronak Malde

@rronak_

10 days ago

We have been exploring new algorithmic frontiers and are excited to share our contributions to Self Distillation Policy Optimization (SDPO) for agentic continual learning. Check out our blog post here: https://t.co/5xjL02jtUz

3

70

6

22

38K

_antonbaumann retweeted

Sasha Rush

@srush_nlp

18 days ago

Been working on text feedback / OPSD in Composer. Really interesting space, and much more to be explored.

11

277

28

132

40K

_antonbaumann retweeted

Jonas Hübotter

@jonashubotter

18 days ago

Self-distillation for long-horizon training at scale!

1

68

5

9

5K

_antonbaumann retweeted

Jonas Hübotter

@jonashubotter

about 1 month ago

Today and tomorrow we’ll be presenting self-distillation with orals at ICLR in Rio 🇧🇷

1. “Self-Distillation enables Continual Learning” at lifelong agents workshop (Sun 11:30am)
2. “Reinforcement Learning via Self-Distillation” at scaling post-training workshop (Mon 2:40pm)
3. “Test-Time Self-Distillation” at test-time updates workshop (Mon 4:15pm)

10

430

48

276

102K

_antonbaumann retweeted

Jonas Hübotter

@jonashubotter

4 months ago

Just came across this great discussion of self-distillation on @latentspacepod! Really good rundown by Ted Kyi, and we’re every bit as excited about what’s next as he is! https://t.co/G5LrWlOT8B

0

21

4

7

3K

_antonbaumann retweeted

Explainable Machine Learning @ExplainableML

4 months ago

3/ Post-hoc Probabilistic Vision-Language Models
@_antonbaumann, @ruili_pml, Marcus Klasson, Santeri Mentu, @ShyamgopalKart1, @zeynepakata, @arnosolin, Martin Trapp
[Paper]: https://t.co/kYxKZSe1x9
[Project]: https://t.co/p7TzhvZQrn
[Code]: https://t.co/UPnWQZmlvw

1

0

165

_antonbaumann retweeted

Jonas Hübotter

@jonashubotter

4 months ago

Training LLMs with verifiable rewards uses a 1-bit signal per generated response. This hides why the model failed.

Today, we introduce a simple algorithm that enables the model to learn from any rich feedback and turns that feedback into dense supervision. (1/n) https://t.co/AR0yWgaKnL

22

1K

139

1K

211K

Anton Baumann @_antonbaumann

over 9 years ago

@OverwatchEU Have you knocked on Junkenstein's door yet? There's an Overwatch PS4 to be won! https://t.co/8bgTYj5kuM #OWHalloween3

0
