@tak3sh8 eg if there are tough problems where a model almost always fails but gets it right once, then you could for example splice the transition point from the successful reasoning trace back into all the failed ones, so as to try to bias the overall failure rate downwards. etc etc.
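A rough sketch of what that splicing could look like, under loud assumptions: traces are lists of step strings, the transition point is already located as a `[start, end)` slice of the successful trace, and each failed trace is simply cut at a fixed fraction before the segment is appended. `Trace`, `splice_transition`, and `cut_frac` are hypothetical names for illustration, not anything from the tweet or a real library.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Trace:
    steps: list[str]   # reasoning steps, in order
    reward: float      # e.g. 1.0 for a correct final answer, 0.0 otherwise


def splice_transition(
    success: Trace,
    failures: list[Trace],
    start: int,
    end: int,
    cut_frac: float = 0.5,
) -> list[list[str]]:
    """Build new prefixes by grafting success.steps[start:end] onto failed traces.

    Assumption: each failed trace is truncated at cut_frac of its length;
    a real pipeline would likely choose the cut point more carefully.
    """
    segment = success.steps[start:end]
    prefixes = []
    for failed in failures:
        cut = int(len(failed.steps) * cut_frac)
        # Keep the failed trace's own opening, then append the transition steps.
        prefixes.append(failed.steps[:cut] + segment)
    return prefixes


success = Trace(["setup", "try A", "wait, use B instead", "answer"], reward=1.0)
failures = [
    Trace(["setup", "try A", "push on A", "wrong answer"], reward=0.0),
    Trace(["setup", "give up"], reward=0.0),
]
prefixes = splice_transition(success, failures, start=2, end=3)
```

The resulting prefixes would then be handed back to the model to continue from, so the rollouts stay mostly on-policy while being nudged past the point where they usually go wrong.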
@tak3sh8 grpo is surprisingly uninspired. the frontier edge probably does come from stability and data/feedback mix. eg finding hacky ways to automate/scale recycling existing on-policy outputs into better, higher reward, more tailored synthetic data. but that can mean almost anything.
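For reference, the group-relative part of GRPO as it is commonly described boils down to normalizing rewards within a group of samples for the same prompt. A minimal sketch, assuming population standard deviation and a small `eps` for stability (both are choices made here; implementations differ), with `group_relative_advantages` as a made-up name:

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its group's mean and spread."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]


mixed = group_relative_advantages([0.0, 0.0, 0.0, 1.0])
all_failed = group_relative_advantages([0.0, 0.0, 0.0, 0.0])
```

Note that a group where every sample gets the same reward yields all-zero advantages, so a prompt the model always fails contributes no gradient signal in this formulation.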
mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community
also the fact that this is on purpose not visible to the user is crazy
I GOT THE DOMAIN! I FINALLY GOT IT!!!!!!!!!!1 🥳🎉
Paint.NET is now at https://t.co/ZJTUII4bVG!
Well, it will be just as soon as I push all the buttons to migrate content and set up redirects from getpaint.net etc. For now it's just a "hey go here" redirect page.