Fine-tuning just got a whole lot easier.
Serverless SFT is now in public preview on W&B!
Managed infrastructure (powered by @CoreWeave) that auto-scales to your training workloads. No cluster setup. No idle GPU costs.
@Yuchenj_UW The EO targets H1B recipients from outside the States, who are mostly IT consultants.
It doesn’t affect US college grads and may even boost their H1B chances.
Starting to see why top AI labs believe ASI is inevitable.
Blending imitation learning (SFT) at different checkpoints with exploration learning (RL) can uncover new solutions to existing problems and also tackle entirely new, unsolved problems.
@goldstein_aa @jxmnop @AlexIrpan @sea_snell Training RL from scratch is hard. DeepSeek's approach builds on a strong base model.
Similar to how college helps build one's knowledge base before applying it to solve real-world problems.
A naive approach to training a "reasoning" model:
1. Fine-tune an instruct model on chain-of-thought data
2. Use best-of-N sampling to find chains of thought that give the right answer
3. Re-fine-tune the model on the chains of thought that yield the right answer
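A rough sketch of steps 2 and 3 above, not a definitive recipe. Everything named here is hypothetical: `best_of_n_filter`, the `(question, rng) -> (chain_of_thought, answer)` sampler signature, and `toy_sampler`, which stands in for the fine-tuned model from step 1. It also assumes exact string match is a good enough check for "the right answer" and keeps only the first correct chain per problem.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical sampler signature: (question, rng) -> (chain_of_thought, final_answer).
Sampler = Callable[[str, random.Random], Tuple[str, str]]


def best_of_n_filter(
    sample: Sampler,
    problems: List[Tuple[str, str]],
    n: int = 8,
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Draw up to n chains of thought per problem and keep the first one
    whose final answer matches the reference answer (assumed exact match)."""
    rng = random.Random(seed)
    kept: List[Dict[str, str]] = []
    for question, gold in problems:
        for _ in range(n):
            cot, answer = sample(question, rng)
            if answer.strip() == gold.strip():
                kept.append({"prompt": question, "completion": cot})
                break  # one correct chain per problem is enough for this sketch
    return kept


def toy_sampler(question: str, rng: random.Random) -> Tuple[str, str]:
    """Stand-in for the step 1 model: adds two integers, sometimes off by one."""
    a, b = (int(t) for t in question.split("+"))
    guess = a + b + rng.choice([-1, 0, 1])
    return f"{a} plus {b} is {guess}.", str(guess)


problems = [("2+3", "5"), ("10+7", "17")]
sft_data = best_of_n_filter(toy_sampler, problems, n=16)
# sft_data holds only chains that reached the right answer;
# step 3 would fine-tune the model again on these prompt/completion pairs.
```

Problems where none of the N samples is correct simply drop out of the dataset, so the second round of fine-tuning only sees problems the model can already solve at least some of the time.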
@volokuleshov @brandondamos I wish I could attend the lecture!
I am working on a custom chat template serializer for Llama, Mistral, and Qwen models.