@vincentweisser @mikasenghaas When will people understand that the hard part is never synthesizing envs but building a reliable verifier? Bulk-synthesized RL envs without reliable verifiers only give your models misleading rewards and waste compute.
@VibeCoderOfek @PrimeIntellect Exactly this ^
When will people understand that a reliable verifier is all there is to it, not bulk-synthesized RL env slop, which only feeds your models misleading rewards that confuse them?