Geoffrey Cideron

@CdrGeo

Research Engineer at Google DeepMind. Spent time at FAIR London, INRIA Lille, and Instadeep.

Joined March 2019

417 Following

228 Followers

20 Posts

Geoffrey Cideron @CdrGeo

over 1 year ago

S/O to my amazing collaborators!! @ramealexandre @OlivierBachem @sarah_perrin_ @_andrea_agos @johanferret Romuald Elie and Sertan Girgin

Geoffrey Cideron @CdrGeo

over 1 year ago

Happy to introduce our new paper "Diversity-Rewarded CFG Distillation". We combine distillation, a novel diversity reward, and model merging to improve the quality-diversity tradeoff of MusicLM. arxiv: https://t.co/jwkyquIW4z More info:

Alexandre Ramé @ramealexandre

over 1 year ago

An AI will win a Nobel Prize someday✨. Yet currently, alignment reduces creativity. Our new @GoogleDeepMind paper "diversity-rewarded CFG distillation" improves quality AND diversity for music, via distillation of test-time compute, RL with a diversity reward, and model merging. arxiv: https://t.co/7wiXHNr2uW website: https://t.co/YVUPsmVhPR

CdrGeo retweeted

Robert Dadashi @robdadashi

about 2 years ago

I am very happy to announce that Gemma 1.1 Instruct 2B and “7B” are out! Here are a few details about the new models: 1/11

CdrGeo retweeted

Robert Dadashi @robdadashi

over 2 years ago

I am so proud to see Gemma released today! I have had a fantastic time working on post-training and RLHF with an amazing team. Cannot wait to see what the community builds with these models!

CdrGeo retweeted

Johan Ferret @johanferret

over 2 years ago

Online feedback is crucial for alignment, so we propose a simple recipe to make any direct alignment method (think DPO / IPO / SLiC-HF) online using AI feedback 🧙‍♂️ In human evals, online methods yield on avg 66% wins, 28% ties and 6% losses vs offline methods (on TL;DR) 👀

CdrGeo retweeted

@_akhaliq

over 2 years ago

Google presents MusicRL: Aligning Music Generation to Human Preferences

Paper page: https://t.co/FL4jDRdXpi

We propose MusicRL, the first music generation system finetuned from human feedback. Appreciation of text-to-music models is particularly subjective, since the concept of musicality as well as the specific intention behind a caption are user-dependent (e.g. a caption such as "upbeat work-out music" can map to a retro guitar solo or a techno pop beat). Not only does this make supervised training of such models challenging, but it also calls for integrating continuous human feedback in their post-deployment finetuning. MusicRL is a pretrained autoregressive MusicLM (Agostinelli et al., 2023) model of discrete audio tokens finetuned with reinforcement learning to maximise sequence-level rewards. We design reward functions related specifically to text adherence and audio quality with the help of selected raters, and use those to finetune MusicLM into MusicRL-R. We deploy MusicLM to users and collect a substantial dataset comprising 300,000 pairwise preferences. Using Reinforcement Learning from Human Feedback (RLHF), we train MusicRL-U, the first text-to-music model that incorporates human feedback at scale. Human evaluations show that both MusicRL-R and MusicRL-U are preferred to the baseline. Ultimately, MusicRL-RU combines the two approaches and results in the best model according to human raters. Ablation studies shed light on the musical attributes influencing human preferences, indicating that text adherence and quality account for only part of them. This underscores the prevalence of subjectivity in musical appreciation and calls for further involvement of human listeners in the finetuning of music generation models.

CdrGeo retweeted

Neil Zeghidour

@neilzegh

over 2 years ago

Very proud of the work done by @CdrGeo , one of my last projects at Google. When we released MusicLM in May ’23, we incorporated a feedback system to realize the first ever large-scale, organic improvement of music generation through RLHF. 🎶🧵

Geoffrey Cideron @CdrGeo

over 2 years ago

Shoutout to my amazing collaborators: @_andrea_agos, @neilzegh, @leonardhussenot, @OlivierBachem, Sertan Girgin, Mauro Verzetti, Damien Vincent, Matej Kastelic, @zalanborsos, Brian McWilliams, Victor Ungureanu, Olivier Pietquin, Matthieu Geist.

Geoffrey Cideron @CdrGeo

over 2 years ago

Happy to introduce our paper MusicRL, the first music generation system finetuned with human preferences. Paper link: https://t.co/81wIOSDTTa

Geoffrey Cideron @CdrGeo

over 2 years ago

Samples can be found at https://t.co/42VF1LxTlv.

CdrGeo retweeted

Johan Ferret @johanferret

about 3 years ago

Our #ACL2023 paper "Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback" is now on arXiv! tl;dr - we improve the factuality of summaries via RL, without human feedback! 📜 https://t.co/KCpasvtlRM Thread (1/10) 👇

CdrGeo retweeted

ëugene kharitonov 🏴‍☠️ @n0mad_0

about 3 years ago

We* are looking for a Student Researcher** to work with us on a project at the intersection of modeling/generating speech/audio, NLP, and representation learning. *AudioLM team @ Google Research (@zalanborsos, @neilzegh, myself and many others!) **not-last-year PhD student

CdrGeo retweeted

Robert Dadashi @robdadashi

over 3 years ago

Very proud to contribute to making RL agents more accessible and reproducible!

CdrGeo retweeted

Olivier Bachem @OlivierBachem

almost 4 years ago

A common belief is that text autoencoders produce badly structured latent spaces with holes. We were surprised to find that using round-trip translations (e.g. en->de->en) one can obtain nicely structured latent spaces. Check out https://t.co/5fCahGZZ9i.

CdrGeo retweeted

Johan Ferret @johanferret

over 4 years ago

Excited to announce that our #AAMAS2022 paper "Lazy-MDPs: Towards Interpretable RL by Learning When to Act" is on arXiv! 🦥 tl;dr - we introduce lazy-MDPs, modified MDPs that allow agents to defer decision-making to a third-party policy 📜 https://t.co/Tsd2d8RPCc 🧵👇

Geoffrey Cideron @CdrGeo

over 5 years ago

It was great to work with @AmartyaSanyal, @_rockt, and @egrefen at FAIR London. This line of research is fascinating! Thank you for the opportunity! Additional gratitude to @RCalandra for the support and advice.

Edward Grefenstette @egrefen

over 5 years ago

I've been thinking a lot about this work recently, esp. the fascinating ML problems that emerge when you want to solve it without generating doc/env variants. Ongoing work on this with @AmartyaSanyal+@CdrGeo who I had the pleasure of remotely hosting as interns this year. [3/14]

CdrGeo retweeted

Xuedong F.C.J.S Shang @AbsolutSamuel

almost 7 years ago

Matteo Hessel and @OriolVinyalsML giving talks on Deep RL and games at #RLSS2019 @DeepMindAI