@arthurmensch All day I argue with the conspiracy theorists and the fat-shamers who deny the existence of Le Chaton Fat 🐱 (yes, denying its existence is the ultimate form of fat-shaming)
VibeThinker-3B, post-trained on the Qwen2.5-Coder-3B base model, scored 94.3 on AIME26, performing similarly to DeepSeek V3.2, GLM-5, and Gemini 3 Pro.
Small models are the future for agents because they can use tools to fetch the knowledge they lack, and they run fast and cheap.
Stellar performance from a 3B model. These results were achieved primarily through post-training refinements on Qwen2.5-Coder. The paper doesn't provide many details, but it appears they distill from RL checkpoints and then run a final instruct-RL stage.
🔗https://t.co/FmdRwGNMOg
The haters want to ban Le Chaton Fat, but they don't understand it's futile because Mistral implemented Recursive Self Fattening, and nothing can stop the exponential.
@techmeditator It's not just fat; it's getting fatter through Recursive Self Fattening. The details of the RSF implementation are described in the paper.