@torchcompiled Can't we always get an equivalent x0? In other words as long as there's a predicted eps, using the forward/noising process eqn, we can get an predicted x0.
Excited to launch the accompanying free RLHF Course for my book. To kick it off, I've released:
- Welcome video
- Lecture 1: Overview of RLHF & Post-training
- Lecture 2: IFT, Reward Models, Rejection Sampling
- Lecture 3: RL Math
- Lecture 4: RL Implementation
I'm going to add question & answer videos throughout the lecture to go deeper on topics that need it, and potentially cover some topics that are too recent and in flux to go in print. I expect 10-15 videos in total over the next few months.
At the same time, development around the code for the book is picking up. It's a great time to build the foundation for post-training methods.
YT playlist and course landing page below.
🤯 big update to our flow map language models paper!
we believe this is the future of non-autoregressive text generation.
read about it in the blog: https://t.co/DfBXrYmJc8
full details in the paper: https://t.co/coiNXj4ucC
we introduce a new class of continuous flow-based language models and distill them into their corresponding flow map for one-step text generation.
we beat all discrete diffusion baselines at ~8x speed!
v2 gives a complete theory of the flow map over discrete data, with three equivalent ways to learn it (semigroup, lagrangian, eulerian). it turns out you can train these with cross-entropy objectives that look very similar to standard discrete diffusion — but without the factorization error that kills discrete methods at few steps.
beyond improving results across the board, we showcase properties that are unique to continuous flows. in particular, inference-time steering and guidance become straightforward. autoguidance brings generative perplexity down to 51.6 on LM1B, while discrete baselines completely collapse at the same guidance scale.
we also show reward-guided generation for steering topic, sentiment, grammaticality, and safety at inference time — and it works even at 1-2 steps with our flow map model. simple, well-understood techniques from continuous flows just work incredibly well in practice for language.
we’re extremely excited about the future of this class of models.
stay tuned for results on scaling, reasoning, and reinforcement learning-based fine-tuning. 🚀
New tutorial paper on the “Foundations of Schrödinger Bridges for Generative Modeling” is out on arXiv! 🧩
📖 arXiv: https://t.co/ce4feGdXZT
🔮 Project Website: https://t.co/dyNr5TRijq
With 220 pages and 24 figures, this guide builds the theoretical foundations of Schrödinger bridges from the ground up, unifying the broad field of generative modeling with a single guiding principle: construct an optimal stochastic bridge between distributions while minimizing deviation from a reference process.
The rapid progress in generative modeling has made the field increasingly difficult to navigate from a foundational perspective, which motivated me to develop a resource that builds the core concepts needed to understand and contribute to new advances.
This guide contains intuitive explanations and step-by-step proofs covering:
🧩 The dynamic Schrödinger bridge formulation, lifting optimal transport to continuous-time stochastic processes between distributions, with direct connections to diffusion models, score-based methods, and flow matching.
🧩 A comprehensive toolkit for constructing Schrödinger bridges from first principles, describing stochastic optimal control, forward–backward SDEs, Doob’s h-transform, and Markov and reciprocal projections.
🧩 Extensions to complex and real-world problem settings, including the multi-marginal, unbalanced, discrete SB problems, highlighting the flexibility of the Schrödinger bridge framework in describing complex dynamical systems.
🧩 Practical, scalable algorithms for training and inference of dynamic Schrödinger bridges across modern generative modeling tasks.
More details in the thread 👇🏻
Video models != world models
"We find that across a range of current models (Sora, Runway, Pika, Lumiere, Stable Video Diffusion, and VideoPoet), physical understanding is severely limited, and unrelated to visual realism"
Fantastic post by Colin Raffel, "We Are Over-Indexing on Paper Acceptance," drafted in May 2021 (!) but only posted now. The more things change..
Last sentence: "If you want to judge a researcher’s quality, the only meaningful way is to read their papers and judge for yourself."
Swiss AI Visiting PhD Program at EPFL. Deadline Feb 28.
"The program provides a fellowship contribution of CHF 2,500 ($3200) per month, access to the Alps supercomputer, and eligibility for a post-visit continuation grant of up to 50k GPU hours. The call is open to PhD students enrolled outside Switzerland, with applications supported by EPFL PIs contributing to the Swiss AI Initiative. The application closes on February 28."
https://t.co/ZlB7Mm1SFj
@ICepfl@EPFL_AI_Center
GeMSS 2026 will come to London from Mar 23 - Mar 27, apply by Jan 23
- imho no.1 research school in Europe on generative models
- 3-day crash course on foundations
- 2-day frontier talks (diffusion, LLM for math, AI4Science, etc.)
@jesfrellsen@pamattei@jmtomczak@bguedj
@isskoro Learning the structure seems crucial but when training pixel level models wouldn't it also make sense to spend time/model capacity on lower noise levels so as to generate sharper images since we don't have an adversarially trained decoder?
🎁 Christmas gift: let’s learn deep generative models through 101 papers!
Advanced Topics of Deep Generative Models: a hybrid of lectures + student talks, spanning foundations to applications.
Today, the slides are public: https://t.co/shWPV5CHUc
Huge thanks to @xxunhuang@YizheZhangNLP@du_yilun--our guest speakers for sharing their latest research and perspectives!
This is also the first full course I’ve ever taught! Super grateful to learn alongside amazing students at Penn, and deeply thankful to our TAs for all their hard work! @TongMutianTMT@tyao923
Performance Hints
Over the years, my colleague Sanjay Ghemawat and I have done a fair bit of diving into performance tuning of various pieces of code. We wrote an internal Performance Hints document a couple of years ago as a way of identifying some general principles and we've recently published a version of it externally.
We'd love any feedback you might have!
Read the full doc at: https://t.co/jej95g236P
Heading to neurips soon!
I'm on the job market and I work on making AI reliable, with a focus on healthcare.
Would love to meet people and hear about opportunities!
@itsbautistam@StefanABaumann Sounds v reasonable. Though a counter argument I have here is that image distributions live on manifolds where all degrees of freedom can be captured using low dim latents but noise distributions occupy ambient space so compression might be lossy.
🔥 WANTED: Student Researcher to join me,@ValentinDeBort1,@thjashin,@liwenliang,@ArthurGretton in DeepMind London.
You'll be working on Multimodal Diffusions for science. Apply here https://t.co/owR6KoCQII