Barna Pásztor @pasztorb - Twitter Profile

Pinned Tweet

2 months ago

🚀 Two new papers from our team are now available on ArXiv, both tackling core bottlenecks in RL post-training
1. Annotating human preference datasets without spending a fortune
2. Quantifying uncertainty for reward models
🔗https://t.co/sEYx618oIc

1

73

14

56

6K

Barna Pásztor @pasztorb

26 days ago

@agarwl_ Great work! I often think of the weights the other way around. Model-weights govern the immediate prompt-response connection (System 1) while prompt-weights (or the harness) define the slow-thinking process through reasoning, tool-calls, self-reflection,... (System 2).

0

65

Barna Pásztor @pasztorb

about 2 months ago

If you're at ICLR 2026, come by 👇 🗓️ Saturday, April 25, 10.30 to 13.00 📍 Poster Session 5, Pavilion 4, #4808 📄 https://t.co/CpOxTtdICV 💻 https://t.co/rvg6KeR6w5 Joint work w/ @thomasklbg and @arkrause.

0

5

1

0

242

Barna Pásztor @pasztorb

about 2 months ago

What do you do when reward models fail in RLHF? Scalar rewards flatten messy, context-dependent human preferences into a single number. The reward model learns a distortion, and the policy optimizes it faithfully. 🧵

1

21

2

7

2K

Barna Pásztor @pasztorb

about 2 months ago

A Leader commits to an action, and a Follower refines it. This asymmetry captures richer preferences than scalar rewards and provides stable training. As a bonus, it offers inference-time refinement: two-turn rollouts deliver ~60% gains over single-turn.

1

4

1

0

158

Barna Pásztor @pasztorb

2 months ago

Huge thanks to all contributing to these papers! @lenalibon @jessicalamjh Daniel Yang @Davit_Melikidze Florian Redhardt @Marian_Schn @Martin_Wertich Samuel Stante @pkassraie_ @idohakimi @arkrause

0

3

0

366

Barna Pásztor @pasztorb

2 months ago

📄 RewardUQ (https://t.co/nZl8WnkTtN) We rigorously compare UQ methods for reward models and draw practical insights for active learning and robust RL post-training. The results were immediately applied in ActiveUltraFeedback!

1

3

0

2

345

Barna Pásztor @pasztorb

3 months ago

📄 ActiveUltraFeedback (https://t.co/QETqRMA2dn) How much preference data do you really need? We show that active learning can match or beat static baselines using as little as 1/6 of the annotations across datasets and algorithms!

0

2

0

85

pasztorb retweeted

Thomas Kleine Buening

@thomasklbg

4 months ago

Deployed LLMs and users generate millions of conversations every day. These are full of useful learning signals, yet we don't use them for training. We introduce self-distillation for learning directly from user conversations – no rewards, no labels, no extra models.

9

254

36

226

55K

pasztorb retweeted

ZurichAI @zurichnlp

5 months ago

ZurichNLP #19 is next Monday at @ETH_AI_Center! Sina Ahmadi (@sina_ahm, @UZH_en) on language for low-resource varieties, and Barna Pasztor (@pasztorb, @ETH_AI_Center) on sample-efficient dataset collection for RLHF. RSVP below! Spots limited as always.

1

7

3

0

682

Barna Pásztor @pasztorb

6 months ago

I am attending @NeurIPSConf 2025 next week in San Diego, CA! Reach out to chat about RLHF and preference optimisation! I am happy to discuss future collaborations and open positions in 2026. #NeurIPS2025

0

9

0

353

pasztorb retweeted

ETH AI Center @ETH_AI_Center

9 months ago · Zurich

Great to have @eldsjal visit with @shak & @piammichel, yesterday! Many nice demo day interactions with our cutting-edge AI research projects & ventures. Their concluding message: now’s the time to build with massive impact - and ETH AI Center is one of the best places to start 🚀

0

15

4

0

2K

Barna Pásztor @pasztorb

9 months ago

Amazing experience to be part of this project and work on post-training at scale with an exceptional team! More great things to come to push the open-source LLM community!

CSCS Lugano @cscsch

9 months ago

@EPFL , @ETH_en and #CSCS today released Apertus, Switzerland's first large-scale, multilingual language model (LLM). As a fully open LLM, it serves as a building block for developers and organizations to create their own applications: https://t.co/7bJlINiIdn #Apertus #AI

17

163

44

38

66K

0

22

2

1

2K

pasztorb retweeted

Paul Friedrich @pa_friedrich

about 1 year ago

At #AAMAS25 in Detroit this week and presenting my work with @pasztorb & @gio_ramponi Thursday afternoon - if you're here, let's connect and chat about learned algorithmic collusion, or go for a morning run!

0

5

1

0

2K

Barna Pásztor @pasztorb

over 1 year ago

I am presenting two papers this week at #NeurIPS2024 focusing on preference-based RL!
1. Contextual Bilevel Reinforcement Learning for Incentive Alignment: #6505 West, 11AM, Thursday
2. Bandits with Preference Feedback: A Stackelberg Game Perspective: #5807 West, 11AM, Friday

0

22

5

3

2K

pasztorb retweeted

Giorgia Ramponi @gio_ramponi

over 1 year ago

I am not attending #NeurIPS this year, but Vinzenz Thoma and @pasztorb are :) Come chat about our recent work on "Contextual Bilevel Reinforcement Learning for Incentive Alignment" 🗓️ Thu 12 Dec, 11 a.m.

0

18

1

0

2K

pasztorb retweeted

ETH AI Center @ETH_AI_Center

over 1 year ago

🔬 Advance the frontiers of AI: @ETH_AI_Center Fellowship Programs – #PhD & #Postdoc Opportunities 🔬 💫Push the boundaries of Reinforcement Learning and Data-driven Control💫 ✍️ Apply by November 19, 2024: https://ai.ethz.ch/apply

1

8

5

6

2K

pasztorb retweeted

Gergely Neu @neu_rips

over 1 year ago

PLS SHARE:
I'm hiring a PhD student to work on ML theory, to begin in Fall 2025.
Topics include: generalization bounds & statistical inference via online prediction, representation learning via optimal transport, sequential decision making...
More info: https://t.co/QFLEqWZORj

5

292

82

122

58K
