Happy to introduce our new paper "Diversity-Rewarded CFG Distillation".
We combine distillation, a novel diversity reward, and model merging to improve the quality-diversity tradeoff of MusicLM.
arxiv: https://t.co/jwkyquIW4z
More info:
An AI will win a Nobel Prize someday ✨. Yet currently, alignment reduces creativity. Our new @GoogleDeepMind paper "diversity-rewarded CFG distillation" improves quality AND diversity for music, via distillation of test-time compute, RL with a diversity reward, and model merging.
arxiv: https://t.co/7wiXHNr2uW
website: https://t.co/YVUPsmVhPR
I am so proud to see Gemma released today! I have had a fantastic time working on post-training and RLHF with an amazing team. Cannot wait to see what the community builds with these models!
Online feedback is crucial for alignment, so we propose a simple recipe to make any direct alignment method (think DPO / IPO / SLiC-HF) online using AI feedback 🧙‍♂️
In human evals, online methods yield on avg 66% wins, 28% ties and 6% losses vs offline methods (on TL;DR) 👀
Google presents MusicRL
Aligning Music Generation to Human Preferences
paper page: https://t.co/FL4jDRdXpi
We propose MusicRL, the first music generation system finetuned from human feedback. Appreciation of text-to-music models is particularly subjective since the concept of musicality as well as the specific intention behind a caption are user-dependent (e.g. a caption such as "upbeat work-out music" can map to a retro guitar solo or a techno pop beat). Not only does this make supervised training of such models challenging, but it also calls for integrating continuous human feedback in their post-deployment finetuning. MusicRL is a pretrained autoregressive MusicLM (Agostinelli et al., 2023) model of discrete audio tokens finetuned with reinforcement learning to maximise sequence-level rewards. We design reward functions related specifically to text-adherence and audio quality with help from selected raters, and use those to finetune MusicLM into MusicRL-R. We deploy MusicLM to users and collect a substantial dataset comprising 300,000 pairwise preferences. Using Reinforcement Learning from Human Feedback (RLHF), we train MusicRL-U, the first text-to-music model that incorporates human feedback at scale. Human evaluations show that both MusicRL-R and MusicRL-U are preferred to the baseline. Ultimately, MusicRL-RU combines the two approaches and results in the best model according to human raters. Ablation studies shed light on the musical attributes influencing human preferences, indicating that text adherence and quality only account for a part of it. This underscores the prevalence of subjectivity in musical appreciation and calls for further involvement of human listeners in the finetuning of music generation models.
Very proud of the work done by @CdrGeo, one of my last projects at Google. When we released MusicLM in May ’23, we incorporated a feedback system to realize the first ever large-scale, organic improvement of music generation through RLHF. 🎶🧵
Our #ACL2023 paper "Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback" is now on arXiv!
tl;dr - we improve the factuality of summaries via RL, without human feedback!
📜 https://t.co/KCpasvtlRM
Thread (1/10) 👇
We* are looking for a Student Researcher** to work with us on a project at the intersection of modeling/generating speech/audio, NLP, and representation learning.
*AudioLM team @ Google Research (@zalanborsos, @neilzegh, myself and many others!)
**not-last-year PhD student
A common belief is that text autoencoders produce badly structured latent spaces with holes. We were surprised to find that, using round-trip translations (e.g. en->de->en), one can obtain nicely structured latent spaces. Check out https://t.co/5fCahGZZ9i.
Excited to announce that our #AAMAS2022 paper "Lazy-MDPs: Towards Interpretable RL by Learning When to Act" is on arXiv! 🦥
tl;dr - we introduce lazy-MDPs, modified MDPs that allow agents to defer decision-making to a third-party policy
📜 https://t.co/Tsd2d8RPCc
🧵👇
It was great to work with @AmartyaSanyal, @_rockt, and @egrefen at FAIR London. This line of research is fascinating! Thank you for the opportunity! Additional gratitude to @RCalandra for the support and advice.
I've been thinking a lot about this work recently, esp. the fascinating ML problems that emerge when you want to solve it without generating doc/env variants. Ongoing work on this with @AmartyaSanyal+@CdrGeo who I had the pleasure of remotely hosting as interns this year. [3/14]