Why do 16k GPU jobs fail?
The Llama3 paper has many cool details, but notably it has a huge infrastructure section that covers how we parallelize, keep things reliable, etc.
We hit an overall effective training time of 90%.
https://t.co/5gngOZJHBO
It is only rarely that, after reading a research paper, I feel like giving the authors a standing ovation. But I felt that way after finishing Direct Preference Optimization (DPO) by @rm_rafailov, @archit_sharma97, @ericmitchellai, @StefanoErmon, @chrmanning, and @chelseabfinn. This beautiful paper proposes a much simpler alternative to RLHF (reinforcement learning from human feedback) for aligning language models to human preferences.
RLHF has been a key technique for training LLMs. In brief, RLHF (i) gets humans to specify their preferences by ranking LLM outputs, (ii) trains a reward model (used to score LLM outputs), typically represented using a transformer network, to be consistent with the human rankings, and (iii) uses reinforcement learning to tune an LLM, also represented as a transformer, to maximize rewards. This requires two transformer networks, and RLHF is also finicky about the choice of hyperparameters.
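For step (ii), one common way to make a reward model "consistent with the human rankings" is a pairwise Bradley-Terry loss on (preferred, rejected) output pairs, as in InstructGPT-style RLHF. The sketch below assumes that setup, which the post does not specify. It takes the scalar scores a reward model would produce for the two outputs and returns a loss that is small when the preferred output scores higher. The function names are hypothetical.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    # For negative z, rewrite to avoid overflow in exp(-z).
    return z - math.log1p(math.exp(z))


def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Hypothetical pairwise (Bradley-Terry) loss: -log sigmoid(r_chosen - r_rejected).

    r_chosen / r_rejected are the reward model's scores for the output the
    human preferred and the one they ranked lower.
    """
    return -log_sigmoid(r_chosen - r_rejected)


def mean_reward_model_loss(pairs: list[tuple[float, float]]) -> float:
    """Average the pairwise loss over a batch of (chosen, rejected) score pairs."""
    return sum(reward_model_loss(c, r) for c, r in pairs) / len(pairs)
```

Training the reward model means minimizing this average over many labeled pairs with gradient descent on the model's parameters. The scores are plain floats here only to keep the sketch dependency-free.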
DPO simplifies the whole thing. Via a clever mathematical insight, the authors show that given an LLM, there is a specific reward function for which that LLM is optimal. DPO then trains the LLM directly to make the reward function (that’s now implicitly defined by the LLM) consistent with the human rankings. So you no longer need to deal with a separately represented reward function, and you can train the LLM directly to optimize the same objective as RLHF.
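The DPO objective can be sketched in a few lines. In the paper, the implicit reward of a response y is β·log(π(y|x)/π_ref(y|x)), where π_ref is a frozen reference model (typically the starting supervised fine-tuned model). A per-prompt normalizing term cancels when two responses to the same prompt are compared, so the loss reduces to a logistic loss on the reward margin between the preferred and rejected response. This minimal sketch assumes the summed log-probability of each full response under both models is already computed. The function names and the default β = 0.1 are illustrative choices.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def implicit_reward(logp_policy: float, logp_ref: float, beta: float) -> float:
    """Reward implicitly defined by the LLM: beta * log(pi(y|x) / pi_ref(y|x)).

    The per-prompt constant is omitted because it cancels in the margin below.
    """
    return beta * (logp_policy - logp_ref)


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # illustrative value for the paper's beta hyperparameter
) -> float:
    """-log sigmoid(r_chosen - r_rejected), with rewards read off the policy itself."""
    margin = implicit_reward(logp_chosen, ref_logp_chosen, beta) - implicit_reward(
        logp_rejected, ref_logp_rejected, beta
    )
    return -log_sigmoid(margin)
```

When the policy equals the reference, every margin is zero and the loss is log 2 (about 0.693). Raising the policy's log-probability of the preferred response relative to the reference pushes the loss below that.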
Although it’s still too early to be sure, I am cautiously optimistic that DPO will have a huge impact on LLMs and beyond in the next few years.
You can read the paper here: https://t.co/m14qRYszVa
I also write more about this in The Batch (linked to below).
https://t.co/8h2ag2plIa
You love using PyTorch for Deep Learning but want it a bit more organized, so it's easier to take advantage of more advanced features?
Great news: Unit 5 is finally live! In Unit 5, I'll show you how to train PyTorch models with the Lightning Trainer!
🔗 https://t.co/MJB1pr3npi
1/3 As a researcher from a non-traditional background, I take attribution very seriously and embrace constructive scientific discussion that accelerates learning and advances progress. Flash was developed to serve the evolving needs of PyTorch Lightning users.