Raphael Avalos @raphael_avalos - Twitter Profile

Pinned Tweet

about 1 year ago

Last week, I wrapped up my internship @cohere, where I had the chance to work with fantastic people on RL for LLMs. It was an amazing 6 months, and I'm excited to share one of the outcomes: ShiQ, a Q-value based RL algorithm for fine-tuning LLMs 🚀 🧵Details in @irombie's post!

Irem Ergün

@irombie

about 1 year ago

I'm excited to share our new pre-print ShiQ: Bringing back Bellman to LLMs! https://t.co/yWMT6M0nuT In this work, we propose a new, Q-learning inspired RL algorithm for finetuning LLMs 🎉 (1/n)

Raphael Avalos @raphael_avalos

about 1 year ago

🚀 Excited to share the 3rd outcome of my internship at @CohereAI: a new RL algo for agentic LLMs that combines policy learning and world modeling, letting agents verify actions before executing them. Check out the 🧵 and 📄! Big thanks to my co-authors and Cohere’s RL team 🙏

Shangmin Guo

@ShangminGuo

about 1 year ago

📢After months of work, I can finally share our latest research, couldn’t be more thrilled and excited. 🎉 We unify a policy 🤖 and a world model 🌍 into a single LLM, thus no external dynamics model needed! Why does this matter? Because now, the policy can plan based on its internal world model! And this planning boosts tool-use success rates to >90%, on top of SFT + RL. 📄: https://t.co/5z72BwWnGT 🧵[1/8]

raphael_avalos retweeted

Andrew Zhao

@_AndrewZhao

about 1 year ago

Okay, I was definitely not vague posting

Raphael Avalos @raphael_avalos

about 1 year ago

Excited to share the technical report on Command R7B (7B) and Command A (111B), our flagship model! These models are the result of incredible teamwork at @cohere, and it was an honor to be part of it. Report: https://t.co/0pOyajfQbe

Seraphina Goldfarb-Tarrant @ICLR🇧🇷 @seraphinagt

about 1 year ago

Today (two weeks after model launch 🔥) we're releasing a technical report of how we made Command A and R7B 🚀! It has detailed breakdowns of our training process, and evaluations per capability (tools, multilingual, code, reasoning, safety, enterprise, long context)🧵 1/3.

raphael_avalos retweeted

Adaptive and Learning Agents (ALA) Workshop @ALA_workshop

over 1 year ago

📢 Deadline Extended! 📢 Due to multiple requests and the overlap with @RL_Conference and @RealAAAI, we’re extending the Adaptive Learning Agent workshop @AAMASconf submission deadline to March 1st (AOE)! 🚀 🔗 More details: https://t.co/Qz1XDw0TdU

raphael_avalos retweeted

Adaptive and Learning Agents (ALA) Workshop @ALA_workshop

over 1 year ago

🚨 Less than 48 hours left to submit to the 17th Adaptive Learning Agent workshop at @AAMASconf! 🚨 We welcome full papers, work in progress, and 2-page abstracts of recent journal papers. Don't miss the deadline! 🔗 More details: https://t.co/Qz1XDw0TdU

raphael_avalos retweeted

Willem Röpke @willem_ropke

over 1 year ago

Exciting news! My paper on multi-objective reinforcement learning was accepted at AAMAS 2025! We introduce IPRO (Iterated Pareto Referent Optimisation)—a principled approach to solving multi-objective problems. 🔗 Paper: https://t.co/U8Sx6B0q5A 💻 Code: https://t.co/Umf6oQXJBH

raphael_avalos retweeted

Adaptive and Learning Agents (ALA) Workshop @ALA_workshop

over 1 year ago

Missed the deadline? No worries! We've extended the submission deadline to Feb 25! Find all the details on our website: https://t.co/v672Oi1Deb

Raphael Avalos @raphael_avalos

over 1 year ago

Don't miss the opportunity to submit your (Multi-Agent) RL work to the ALA workshop!

Adaptive and Learning Agents (ALA) Workshop @ALA_workshop

over 1 year ago

Still 8 days to submit your work to the ALA workshop at AAMAS! We welcome full papers, work in progress, and 2-page abstracts of recently published journal papers. All the info is available at https://t.co/wVu1Wp4uTX.

Raphael Avalos @raphael_avalos

over 1 year ago

The X account and website for the next edition of the ALA workshop are live! Follow it to get all the updates :)

Adaptive and Learning Agents (ALA) Workshop @ALA_workshop

over 1 year ago

Excited to announce the 17th Adaptive Learning Agent workshop at @AAMASconf in May! We welcome full papers, work in progress, and 2-page abstracts of recently published journal papers. Find out more at our website: https://t.co/v672Oi2b3J. Deadline for submissions: February 4th.

Raphael Avalos @raphael_avalos

over 1 year ago

Starting my internship at @cohere today to work on LLMs! I'll be in Paris a couple of days a week, so if anyone wants to meet up, let me know!

raphael_avalos retweeted

Florent Delgrange @f_delgrange

almost 2 years ago

Two weeks ago, I publicly defended my PhD thesis, entitled « Activating Formal Verification of Deep Reinforcement Learning Policies by Model Checking Bisimilar Latent Space Models ». 📚 The full dissertation is available here: https://t.co/Yvgjzvt31t (1/n)

Raphael Avalos @raphael_avalos

almost 2 years ago

Looking forward to the next edition, and in the meantime, see you all at EWRL in Toulouse this October! 🚀 3/3

Raphael Avalos @raphael_avalos

almost 2 years ago

The 1st edition of @RL_Conference was amazing! Congrats to the organizers for making this happen and for trying a new review system. I had such a great time with @GsprdLambrechts @kohler_hector @SuauMiguel @RiccZamboni Mathieu Reymond and all the others! 1/3

Raphael Avalos @raphael_avalos

almost 2 years ago

I also had the pleasure of presenting our latest work on Online Planning for POMDPs with State Requests (with E. Bargiacchi, A. Nowé, @DiederikRo, @faoliehoek). Check the paper here: https://t.co/FIN6Gn6U9P 2/3

raphael_avalos retweeted

Hector Kohler @kohler_hector

almost 2 years ago

@RL_Conference was a blast and I caught up with some of the usual suspects from European RL @vernadec @araffin2 @raphael_avalos @GsprdLambrechts @RiccZamboni. See you all at EWRL 2024. Looking forward to next year's edition!! 🥳🧠

raphael_avalos retweeted

Willem Röpke @willem_ropke

almost 2 years ago

Okay people, I need some help. We’re working on a project and have been stuck for a while. My final guess for what the issue may be is that gradients are not flowing as we would want them. Does anyone have an intuitive visualisation/debugging tool for gradient flows in JAX?

raphael_avalos retweeted

Alizée Pace @AlizeePace

about 2 years ago

Presenting work on synthetic preference generation at two #ICLR2024 workshops today: DPFM & GenAI4DM @genai4dm. Come say hi to find out how to improve your reward model without collecting additional human feedback!

Raphael Avalos @raphael_avalos

about 2 years ago

If you are attending #ICLR2024 workshops, go check out this cool work!

Hugo Yeche (@hy9.bsky.social) @HugoYeche

about 2 years ago

In clinical early warning systems (EWS), can we go beyond the model estimate of event occurrence and leverage its belief about the event distance to improve our alarm policy? Introducing “Dynamic Survival Analysis for Early Event Prediction” with @ToManuelBurger and @gxr. 🧶

Raphael Avalos @raphael_avalos

about 2 years ago

Poster session now! We are waiting for you with @f_delgrange at poster 158! #ICLR2024

Raphael Avalos @raphael_avalos

about 2 years ago

Arrived at #ICLR2024 with @f_delgrange to present our work "The Wasserstein Believer: Learning Belief Updates for Partially Observable MDPs through Reliable Latent Space Models".
