We were all wondering whether Categorical Flow Maps (CFMs) could scale... 🤔
I couldn't help trying it out...
So we scaled CFMs to 1.7B parameters over 2.1T tokens 🚀🔥
Short summary 🧵⬇️
Introducing Strong Stochastic Flow Maps
TLDR: Stochastic Flow Maps where we learn the stochastic solution path.
Work led by Sam McCallum, @zwblasingame, with Timothy Herschelll, @AlexanderTong7, and @JamesFosterBath
Arxiv: https://t.co/Hy8WWZOnjE
Code: https://t.co/PMe6RoqyZA
Over the weekend, I was using Codex to update my homepage and a paper I wrote a year ago on the topic of diffusion LLMs (should be updated on Monday).
https://t.co/qvqldZ9H1w
While I did not want to make it too explicit back then, I argued that discrete diffusion LLMs were not the right direction, and that if diffusion ever works for LLMs, continuous dLLMs are the way to go.
A year later, we are seeing a lot of cool papers in this space, and I hope the community can push for something practical and scalable.
Can we guide flow models in just a few steps? 🚀
Flow-based sampling is rapidly moving toward few-step generation. But reward guidance often still requires many steps and costly test-time search.
Excited to introduce Flow Map Reward Guidance (FMRG): a training-free framework for few-step guidance with flow maps.
FMRG matches or surpasses strong baselines on inverse problems and reward-guided text-to-image generation with:
⚡ as few as 3 NFEs
⚡ up to 10× fewer NFEs on inverse problems
⚡ up to 70× fewer NFEs on reward-guided generation
🧵⬇️
[📄preprint] Diffusion models 🤝 MCMC !
Diffusion model samplers are biased due to discretisation
💡The fix: Metropolis-type adjustment on corrector steps
❗️Challenge: no access to the density ratio, only the score
🔑Insight: the score (and some maths) is all you need...
[1/3]
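For context on the challenge: a standard Metropolis-adjusted Langevin (MALA) corrector step accepts a proposal y from x with probability min(1, p(y) q(x|y) / (p(x) q(y|x))), which needs the density ratio p(y)/p(x). One generic way to recover that ratio from the score alone is the line integral log p(y) - log p(x) = ∫_0^1 ∇log p(x + s(y - x)) · (y - x) ds, evaluated by quadrature. The sketch below is our own illustration under that assumption (an exact score that is a true gradient field), not necessarily the construction in the preprint. The helper names `log_ratio_from_score` and `mala_step`, the step size, and the 16-point trapezoidal rule are all hypothetical choices.

```python
import numpy as np

def log_ratio_from_score(score, x, y, n_points=16):
    """Approximate log p(y) - log p(x) using only the score (hypothetical helper).

    Integrates the directional derivative score(z) . (y - x) along the straight
    segment z = x + s (y - x), s in [0, 1], with the trapezoidal rule.
    """
    s = np.linspace(0.0, 1.0, n_points)
    d = y - x
    vals = np.array([score(x + si * d) @ d for si in s])
    h = s[1] - s[0]
    return h * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

def mala_step(score, x, eps, rng):
    """One Metropolis-adjusted Langevin corrector step that never evaluates p itself."""
    y = x + eps * score(x) + np.sqrt(2.0 * eps) * rng.standard_normal(x.shape)
    # Gaussian Langevin proposal log-densities (shared normalising constants cancel).
    log_q_fwd = -np.sum((y - x - eps * score(x)) ** 2) / (4.0 * eps)
    log_q_rev = -np.sum((x - y - eps * score(y)) ** 2) / (4.0 * eps)
    log_alpha = log_ratio_from_score(score, x, y) + log_q_rev - log_q_fwd
    if np.log(rng.uniform()) < log_alpha:
        return y, True
    return x, False

# Usage on a standard Gaussian target, whose score is -x.
rng = np.random.default_rng(0)
score = lambda v: -v
x = np.zeros(1)
samples = []
for _ in range(5_000):
    x, accepted = mala_step(score, x, eps=0.5, rng=rng)
    samples.append(x[0])
samples = np.array(samples)
print(samples.mean(), samples.var())  # should be close to 0 and 1
```

In a diffusion sampler, `score` would be the learned score at the current noise level. A learned score is not exactly a gradient field, so the line integral then becomes path-dependent, which is one reason a more careful treatment is needed.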
Guide with examples, not rewards 🐘
Controlling what a pretrained generative model produces is still mostly a choice between three slow options: fine-tune it, attach a reward network, or search at inference. We found flow matching allows a fourth, and it costs almost nothing.
In deterministic interpolants, the velocity of the flow is determined by where the trajectory is headed: the endpoint mean. Shift that mean, and the entire flow shifts with it.
This turns control into a matter of reference. Change the examples that define the endpoint, and you change the direction the model follows. The examples need not be perfect. They only need to point the flow toward the attribute you want.
Color, identity, style, and structure, all controllable through examples. 🧵👇
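To make the endpoint-mean claim concrete, take the linear interpolant x_t = (1 - t) x_0 + t x_1 with Gaussian noise x_0. This is an assumption, since the post does not fix the interpolant. Then x_1 - x_0 = (x_1 - x_t)/(1 - t), so the velocity is v(x, t) = (E[x_1 | x_t = x] - x)/(1 - t), and only the endpoint mean enters. If the endpoint distribution is a small set of reference examples, E[x_1 | x_t] is a softmax-weighted average of them, and swapping the examples redirects the whole flow. This is a toy sketch with an exact posterior over a finite example set, not the authors' method. The names `endpoint_mean`, `velocity`, and `sample` are hypothetical.

```python
import numpy as np

def endpoint_mean(x, t, examples):
    """E[x_1 | x_t = x] when x_1 is drawn uniformly from `examples` and x_0 ~ N(0, I).

    Given x_1 = e, x_t = (1 - t) x_0 + t e is N(t e, (1 - t)^2 I), so the
    posterior over examples is a softmax of negative scaled squared distances.
    """
    sq_dist = np.sum((x[None, :] - t * examples) ** 2, axis=1)
    logits = -sq_dist / (2.0 * (1.0 - t) ** 2)
    w = np.exp(logits - logits.max())  # subtract max for numerical stability
    w /= w.sum()
    return w @ examples

def velocity(x, t, examples):
    # Linear interpolant: x_1 - x_0 = (x_1 - x_t) / (1 - t), so only the endpoint mean matters.
    return (endpoint_mean(x, t, examples) - x) / (1.0 - t)

def sample(x0, examples, n_steps=50):
    """Euler integration of dx/dt = v(x, t) from t = 0 to t = 1."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity(x, k * dt, examples)
    return x

# Same starting noise, two different reference sets.
rng = np.random.default_rng(0)
x0 = rng.standard_normal(2)
right = np.array([[3.0, 0.0], [3.0, 1.0]])
left = np.array([[-3.0, 0.0], [-3.0, 1.0]])
print(sample(x0, right), sample(x0, left))
```

Nothing about the "model" changes between the two calls. Only the examples defining the endpoint mean change, and the trajectory from the same noise follows them.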
Very excited about our work on finding the right drifting direction 🐎
We tackle a core open question in drifting: when does “no drift left” mean the model really matched the data?
Kernel-gradient drifting is the answer (with natural extensions to manifolds + discrete data)!
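As a rough illustration of why a kernel gradient is a natural choice, here is one standard kernel-gradient drift: the gradient of the MMD witness function, V(x) = ∇_x [ (1/N) Σ_j k(x, y_j) - (1/M) Σ_i k(x, x_i) ], where y_j are data and x_i are generated samples. With a characteristic kernel such as the Gaussian, the witness is zero only when the two distributions coincide, which is the kind of guarantee "no drift left" needs. This is our assumption-laden sketch (essentially an MMD gradient flow on particles with a Gaussian kernel), not the construction from the paper. The names `gaussian_kernel_grad`, `drift`, and the bandwidth `sigma` are hypothetical.

```python
import numpy as np

def gaussian_kernel_grad(x, points, sigma):
    """Mean over `points` of grad_x k(x, p), with k(x, p) = exp(-|x - p|^2 / (2 sigma^2))."""
    diff = x[None, :] - points                                   # (n, d)
    k = np.exp(-np.sum(diff ** 2, axis=1) / (2.0 * sigma ** 2))  # (n,)
    return -(k[:, None] * diff).mean(axis=0) / sigma ** 2

def drift(x, data, generated, sigma=1.0):
    """Gradient of the MMD witness at x: pull toward data, push away from generated samples."""
    return gaussian_kernel_grad(x, data, sigma) - gaussian_kernel_grad(x, generated, sigma)

# Move generated particles along the drift; it vanishes once they coincide with the data.
rng = np.random.default_rng(0)
data = rng.normal(2.0, 0.5, size=(200, 1))
gen = rng.normal(-1.0, 0.5, size=(200, 1))
start_mean = gen.mean()
for _ in range(50):
    gen = gen + 0.5 * np.array([drift(g, data, gen) for g in gen])
print(start_mean, gen.mean())
```

The repulsion term between generated particles is antisymmetric, so it spreads the particles without moving their mean, while the attraction term pulls them toward the data.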