New paper!
Want to precisely optimize synthetic training data to do practical or even wacky things?
Dataset Policy Gradients get you there, letting you target any differentiable training or post-training metric. We embedded a QR code in GPT-2โs weights using only training data!