Thomas Rupf @th_rupf - Twitter Profile

about 2 months ago

If you're at ICLR 🇧🇷, come check out OpTI-BFM at poster P4-4504 tomorrow morning or as an oral at 4:03 pm in the Amphitheater.

Thomas Rupf @th_rupf

4 months ago

Excited to share that our paper "Optimistic Task Inference for Behavior Foundation Models" was accepted for ICLR 2026. BFMs are great at zero-shot RL, but task inference requires a dataset with reward labels. Our method OpTI-BFM offers an online alternative. (1/5)

https://t.co/yaBLV7bw1L

th_rupf retweeted

Marco Bagatella @mar_baga

about 2 months ago

Which representations are meaningful for control? We're presenting TD-JEPA as an oral at ICLR🇧🇷: a zero-shot reinforcement learning algorithm using self-prediction (JEPA) to learn representations that are predictive of long-term, policy-dependent behavior. It works pretty well!🧵

Thomas Rupf @th_rupf

4 months ago

Huge thanks to my co-authors @mar_baga, @vlastelicap, @arkrause! Paper: https://t.co/daIR9WFRPO (5/5)

Thomas Rupf @th_rupf

4 months ago

OpTI-BFM bears similarities to LinUCB for bandits, which we use to prove sublinear regret in episodic settings under mild assumptions. Because it's online, OpTI-BFM can also adapt to time-varying (non-stationary) rewards by decaying the weight on older observations. (4/5)

https://t.co/GVy4FrxVj0
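
To make the LinUCB analogy and the decay idea concrete, here is a minimal sketch of a generic discounted linear estimator with an optimism bonus. It is not the OpTI-BFM algorithm from the paper: the class name `DiscountedLinearEstimator`, the parameters `decay`, `reg`, and `beta`, and the assumption that rewards are linear in some feature vector `phi` are all illustrative choices made here.

```python
import math


def solve(A, b):
    """Solve A x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


class DiscountedLinearEstimator:
    """Hypothetical helper: discounted ridge regression with a LinUCB-style bonus."""

    def __init__(self, dim, decay=0.9, reg=1.0, beta=1.0):
        self.dim, self.decay, self.reg, self.beta = dim, decay, reg, beta
        self.S = [[0.0] * dim for _ in range(dim)]  # discounted sum of phi phi^T
        self.b = [0.0] * dim                        # discounted sum of reward * phi

    def update(self, phi, reward):
        # Older observations are down-weighted by `decay` on every new sample.
        for i in range(self.dim):
            self.b[i] = self.decay * self.b[i] + reward * phi[i]
            for j in range(self.dim):
                self.S[i][j] = self.decay * self.S[i][j] + phi[i] * phi[j]

    def _A(self):
        # Regularized design matrix; the ridge term itself is not decayed.
        return [[self.S[i][j] + (self.reg if i == j else 0.0)
                 for j in range(self.dim)] for i in range(self.dim)]

    def estimate(self):
        """Ridge estimate of the reward weights."""
        return solve(self._A(), self.b)

    def optimistic_value(self, phi):
        """Predicted reward plus beta * sqrt(phi^T A^{-1} phi)."""
        theta = self.estimate()
        mean = sum(t * p for t, p in zip(theta, phi))
        a_inv_phi = solve(self._A(), phi)
        width = math.sqrt(sum(p * q for p, q in zip(phi, a_inv_phi)))
        return mean + self.beta * width
```

With `decay` below 1, the estimate follows a reward that changes sign partway through a run, while the bonus stays largest in feature directions that have not been observed yet. Setting `decay=1.0` recovers ordinary ridge regression over the full history.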

th_rupf retweeted

Núria Armengol @NriaArmengol2

10 months ago

Last week I presented our latest work, 🐝“Epistemically-guided forward backward exploration (FBEE)”🐝, at the @RL_Conference. TL;DR: active learning for unsupervised RL.

th_rupf retweeted

Marco Bagatella @mar_baga

11 months ago

When multiple tasks need improvements, fine-tuning a generalist policy becomes tricky. How do we allocate a demonstration budget across a set of tasks of varied difficulty and familiarity? We are presenting a possible solution at ICML on Wednesday! (1/3)

https://t.co/Fpln2oKm3Q

Thomas Rupf @th_rupf

11 months ago

If this sounds interesting, come by on Tuesday, 4:30 pm – 7:00 pm, West Exhibition Hall B2–B3 #W-618. Collaborators: @mar_baga, @nicoguertler, @JonasFrey96, @GMartius https://t.co/Cv4ZP6UYew (3/3)

Thomas Rupf @th_rupf

11 months ago

Zero-shot imitation from just a single sparse demonstration is hard. Goal-conditioned methods tend to “greedily” move from one state to the next and lose the big picture. We're presenting an alternative approach on Tuesday at #ICML2025. (1/3)

Thomas Rupf @th_rupf

11 months ago

Our method tackles the occupancy matching objective directly at test-time by estimating the agent's occupancy with samples from a learned world model and matching it to the expert occupancy using Optimal Transport. (2/3)

https://t.co/J0C9iFlrIR
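
As a rough illustration of comparing two sets of state samples with Optimal Transport, the sketch below computes an entropy-regularized transport cost using Sinkhorn iterations. The tweet does not say which OT solver the method uses, so Sinkhorn is a stand-in chosen here. The function name `sinkhorn_cost`, the 1-D states, the squared-distance cost, and the `epsilon` and `iters` parameters are all assumptions for the example.

```python
import math


def sinkhorn_cost(xs, ys, epsilon=0.5, iters=200):
    """Entropy-regularized OT cost between two uniform empirical distributions.

    xs, ys: lists of 1-D samples, e.g. states visited by the agent (xs)
    and states from an expert demonstration (ys). Hypothetical helper.
    """
    n, m = len(xs), len(ys)
    cost = [[(x - y) ** 2 for y in ys] for x in xs]              # squared distance
    K = [[math.exp(-c / epsilon) for c in row] for row in cost]  # Gibbs kernel
    a, b = [1.0 / n] * n, [1.0 / m] * m                          # uniform marginals
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan is P[i][j] = u[i] * K[i][j] * v[j]; return its total cost.
    return sum(u[i] * K[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))


expert_states = [0.0, 0.5, 1.0]   # made-up expert samples
close_rollout = [0.0, 0.5, 1.0]   # made-up rollout that overlaps the expert
far_rollout = [2.0, 2.5, 3.0]     # made-up rollout far from the expert

cost_close = sinkhorn_cost(close_rollout, expert_states)
cost_far = sinkhorn_cost(far_rollout, expert_states)
```

A rollout whose samples overlap the expert's gets a lower cost than one that is shifted away, so a planner could use this cost to rank candidate behaviors. Very small `epsilon` values can underflow in this naive form; log-domain implementations avoid that.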
