epson755 @epson0755 - Twitter Profile

about 1 month ago

@vincentweisser @mikasenghaas When will people understand that the hard part is never synthesizing envs, but building a reliable verifier? Bulk-synthesized RL envs without reliable verifiers only give your models misleading rewards and waste compute.

epson755 @epson0755

about 1 month ago

@VibeCoderOfek @PrimeIntellect Exactly this ^ When will people understand that a reliable verifier is all there is to it, not bulk-synthesized RL env slop, which only gives misleading rewards that confuse your models?
