There was a time when you owned the software you bought.
I play Minecraft most evenings with my son for an hour. Today, the license server was down, and the game wouldn’t start. We just sat there, staring at the launcher, watching it try to verify our license. A game I paid for. Installed locally. All files right there on my hard drive, and yet, I couldn’t open it.
It struck me how subtly things have shifted over the years. You don’t own your tools anymore. You’re renting access—sometimes to your favorite games, sometimes to your productivity apps, even to your creative work. When a server goes down or a company changes its terms, the door quietly closes on something that used to be yours.
It’s convenient, yes. Seamless updates, cloud saves, multiplayer access. But that convenience has a hidden cost: control.
My son just asked, “Why can’t we play if it’s on our computer?” I didn’t have a good answer. Maybe that’s the most telling sign of all.
(1/4) Typical LLM post-training methods struggle to yield models that produce diverse responses. To address this, we introduce 𝐃𝐐𝐎 (𝐃𝐢𝐯𝐞𝐫𝐬𝐢𝐭𝐲 𝐐𝐮𝐚𝐥𝐢𝐭𝐲 𝐎𝐩𝐭𝐢𝐦𝐢𝐳𝐚𝐭𝐢𝐨𝐧), a method for post-training LLMs to generate diverse, high-quality responses. Here is the paper link:
https://t.co/Cj6HtWbamY
@ravian_42 So true! It's an absolute masterpiece. Even the DLCs are ridiculously good. "Hearts of Stone" and "Blood and Wine" manage to be perfect in their own different ways. Easily one of the best games I have ever played.
Excited to share that our paper “Provably Robust DPO: Aligning Language Models with Noisy Feedback” has been accepted at #ICML2024!
We introduce Robust DPO, an unbiased estimate of the DPO loss that is robust to preference noise in the data.
https://t.co/2ByitjmzIL
🧵[1/n]
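For readers who want the intuition in code: one standard way to get an unbiased loss under label noise, assuming each preference label is flipped independently with a known rate eps < 0.5, is to combine the loss on the observed pair with the loss on the swapped pair. The sketch below follows that recipe; the function names, the `margin` input, and the default `beta` are illustrative assumptions, not taken from the paper's code.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(margin: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    margin is the policy-vs-reference log-ratio of the preferred response
    minus the same log-ratio of the rejected response.
    """
    return -log_sigmoid(beta * margin)


def debiased_dpo_loss(margin: float, eps: float, beta: float = 0.1) -> float:
    """Noise-corrected loss, assuming labels flip with known probability eps.

    Weighs the loss on the observed ordering against the loss on the swapped
    ordering so that its expectation over random flips equals the clean loss.
    """
    if not 0.0 <= eps < 0.5:
        raise ValueError("eps must be in [0, 0.5)")
    observed = dpo_loss(margin, beta)
    swapped = dpo_loss(-margin, beta)
    return ((1.0 - eps) * observed - eps * swapped) / (1.0 - 2.0 * eps)


# Averaging over the flip recovers the clean loss (the unbiasedness property).
eps, margin = 0.2, 1.5
expected = (1 - eps) * debiased_dpo_loss(margin, eps) + eps * debiased_dpo_loss(-margin, eps)
print(abs(expected - dpo_loss(margin)) < 1e-12)
```

With eps = 0 this reduces to plain DPO; as eps approaches 0.5 the 1/(1 - 2·eps) factor grows, so the correction becomes higher-variance.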
We provide a general framework that can be adapted to other preference optimization methods (e.g. SLiC, IPO) and other preference models (e.g. Probit, Plackett-Luce). We provide the first theoretical guarantees for practical preference optimization algorithms.
🧵[6/n]