@burkov Pretty insightful. I also think a lot of RL training is focused on straightforward tasks where there is a defined problem and solution. Meanwhile, a lot of real-world tasks involve a ton of unknowns and require trial and error, and that process goes well outside the training.
@cuemewch If you have central AC but aren't getting enough cold air in your room, I'd try out a register booster. They work pretty alright.
https://t.co/2jMlIXOHAN
Today, we release QwQ-32B, our new reasoning model with only 32 billion parameters that rivals cutting-edge reasoning models, e.g., DeepSeek-R1.
Blog: https://t.co/jpNEx0Ck8p
HF: https://t.co/h91przQmoP
ModelScope: https://t.co/p0ztmZpWIZ
Demo: https://t.co/sxVVRFwunC
Qwen Chat: https://t.co/bg4tAU1p74
This time, we investigate recipes for scaling RL and have achieved some impressive results based on our Qwen2.5-32B. We find that RL training can continuously improve performance, especially in math and coding, and we observe that the continuous scaling of RL can help a medium-size model achieve competitive performance against a gigantic MoE model. Feel free to chat with our new models and provide us feedback!