@karpathy imports in the function also..
a surprising fraction of people argue that exceptions in Python are hard to read and that user-facing errors should log then exit, vs. raising exceptions.
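For anyone who hasn't seen the two styles side by side, here is a minimal sketch. The `ConfigError` class and the `parse_port_*` helpers are hypothetical names made up for illustration, not from any real codebase.

```python
import logging
import sys

logger = logging.getLogger("app")


class ConfigError(Exception):
    """Hypothetical user-facing error type, used only in this sketch."""


def parse_port_raise(value: str) -> int:
    # Style A: raise an exception and let the caller decide what to do.
    try:
        port = int(value)
    except ValueError as exc:
        raise ConfigError(f"port must be an integer, got {value!r}") from exc
    if not 0 < port < 65536:
        raise ConfigError(f"port out of range: {port}")
    return port


def parse_port_exit(value: str) -> int:
    # Style B: log a readable message, then terminate the process.
    try:
        return parse_port_raise(value)
    except ConfigError as exc:
        logger.error("%s", exc)
        sys.exit(1)  # raises SystemExit with exit code 1
```

One common middle ground is to raise everywhere in library code and do the log-then-exit only once, at the program's entry point.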
⚙️ Looking closer into GRPO: there is a "clipping bias" that amplifies high-prior model behaviors.
Code reasoning could be one of the magical behaviors for Qwen-Math💻
Empirically, when we disabled clipping (fig.), the gains disappeared‼️
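For context, a rough sketch of what "clipping" means here, assuming the standard PPO-style clipped surrogate that GRPO builds on. This is a single-token toy, not the authors' code; the function name, the `clip` flag, and `eps=0.2` are assumptions for illustration.

```python
def clipped_surrogate(ratio: float, advantage: float,
                      eps: float = 0.2, clip: bool = True) -> float:
    """Per-token surrogate objective (to be maximized).

    ratio     = pi_new(token) / pi_old(token)
    advantage = advantage estimate (group-normalized in GRPO)
    """
    unclipped = ratio * advantage
    if not clip:
        # "Clipping disabled": plain importance-weighted objective.
        return unclipped
    # Clamp the ratio to [1 - eps, 1 + eps], then take the pessimistic
    # (smaller) of the clipped and unclipped terms.
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(unclipped, clipped_ratio * advantage)
```

With a positive advantage, the clipped objective stops growing once the ratio exceeds `1 + eps`, so that token no longer contributes a gradient; with clipping off it keeps growing linearly in the ratio.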
@oh_that_hat I think a leg doesn't have memory of its own internal state, & instead at best has residual internal effects of a previous internal state (which is different & I don't think I would qualify as consciousness)
@oh_that_hat and not so much a computation of the current state of its senses + its own internal previous computational state. it's sensory memory vs memory/understanding of a level of its computational state.
DeepSeek makes it quite clear how they trained R1.
None of these steps alone are super surprising, but how to sequence and blend them together definitely is.
@natolambert I also wonder if you leave some performance on the table with RLVR because of this, where some of the learning potential on the data is wasted on trying to deal with the prompting format
1/10 Today we're launching FrontierMath, a benchmark for evaluating advanced mathematical reasoning in AI. We collaborated with 60+ leading mathematicians to create hundreds of original, exceptionally challenging math problems, of which current AI systems solve less than 2%.