Why pay full compute for pixels you're not even looking at?
In our new work, Foveated Diffusion, we introduce a new concept for efficient image and video generation, motivated by how the human visual system works.
(See full thread below)
New research from @bfl_ml 🥳
Meet Self-Flow: our self-supervised framework for image, audio, video & world models
https://t.co/AshY8IkSEe
Do generative models really need DINO to learn strong representations? We propose teaching them directly via a joint framework instead 🧵
[3/3]
We systematically study the key modeling/training/sampling knobs and share practical guidance for better quality and faster generation ⚡, backed by a large-scale sweep of 56 pretrained models and 549 evaluations to map the design space.
[2/3]
In our previous paper, Transition Matching: Scalable and Flexible Generative Modeling (https://t.co/1atqaxcns8), we introduced transition matching, a new generative paradigm. This follow-up goes beyond the concept and asks: which design choices actually matter?
We introduce TMD (Transition Matching Distillation): 480p videos generated from text prompts in < 3 NFEs!
1️⃣ Main backbone for feature extraction and lightweight head for iterative refinement
2️⃣ Distilled from Wan2.1 14B T2V combining MeanFlow & DMD2
https://t.co/o4VyCBl3mJ
After multiple requests for the code of the visuals from my talk about Transition Matching, I made a notebook that reproduces the DTM vs. FM GIF!
This demo is a good way to build intuition on how TM and FM differ.
https://t.co/Qh3W9nfDrr
@urielsinger
New work: "GLASS Flows: Transition Sampling for Alignment of Flow and Diffusion Models". GLASS generates images by sampling stochastic Markov transitions with ODEs, allowing us to boost text-image alignment for large-scale models at inference time!
https://t.co/unsuG3mYer
[1/7]
Excited to share our work Set Block Decoding!
A new paradigm combining next-token-prediction and masked (or discrete diffusion) models, allowing parallel decoding without any architectural changes and with exact KV cache.
Arguably one of the simplest ways to accelerate LLMs!
DTM vs. FM
Lots of interest in how Difference Transition Matching (DTM) connects to Flow Matching (FM).
Here is a short animation that illustrates Theorem 1 in our paper:
For a very small step size (1/T), DTM converges to an Euler step of FM.
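One rough way to write the two updates side by side is sketched below. The linear interpolation $X_t = (1-t)X_0 + tX_1$, the continuous time index $t \in [0, 1]$, and the symbol $Y$ for the sampled difference are notational assumptions made here for illustration, not a quotation of the theorem.

```latex
\begin{aligned}
\text{DTM step:} \quad & X_{t+\frac{1}{T}} = X_t + \tfrac{1}{T}\, Y,
  \qquad Y \sim p\left(X_1 - X_0 \mid X_t\right) \\
\text{FM Euler step:} \quad & X_{t+\frac{1}{T}} = X_t + \tfrac{1}{T}\,
  \mathbb{E}\left[X_1 - X_0 \mid X_t\right]
\end{aligned}
```

Under these assumptions the only difference is that DTM samples a difference while the FM Euler step uses its conditional mean. Intuitively, when 1/T is very small, many tiny sampled steps average out toward that mean.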
If you're curious to dive deeper into Transition Matching (TM), a great starting point is understanding the similarities and differences between Difference Transition Matching (DTM) and Flow Matching (FM).
This paper is awesome.
Flow-matching for flow-matching!
No more coarse-to-fine generation.
Coarse and fine details emerge together during generation.
Results look super promising, especially when you see how the images evolve.
The Difference Transition Matching (DTM) process is so simple to illustrate that you can work it out on a whiteboard!
At each step:
Draw all lines connecting source and target (shaded)
⬇️
List those intersecting with the current state (yellow)
⬇️
Sample a line from the list (green)
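The whiteboard steps above can be sketched in a few lines of Python for a 1D toy problem. This is only an illustration under stated assumptions: a finite set of integer source and target points, every source paired with every target, and uniform sampling among the intersecting lines. The function names (`intersecting_lines`, `dtm_step`, `fm_euler_step`, `sample_dtm`) are hypothetical and not taken from the paper or the notebook.

```python
from fractions import Fraction
import random


def intersecting_lines(sources, targets, t, x):
    """List every (x0, x1) pair whose straight line passes through (t, x)."""
    return [(x0, x1) for x0 in sources for x1 in targets
            if (1 - t) * x0 + t * x1 == x]


def dtm_step(sources, targets, t, x, num_steps, rng):
    """Sample one intersecting line and move 1/num_steps along its difference."""
    x0, x1 = rng.choice(intersecting_lines(sources, targets, t, x))
    return x + Fraction(x1 - x0, num_steps)


def fm_euler_step(sources, targets, t, x, num_steps):
    """For comparison: move along the mean difference of the intersecting lines."""
    lines = intersecting_lines(sources, targets, t, x)
    mean_diff = Fraction(sum(x1 - x0 for x0, x1 in lines), len(lines))
    return x + mean_diff / num_steps


def sample_dtm(sources, targets, num_steps, rng):
    """Run the whiteboard procedure from t = 0 to t = 1 (assumed toy setup)."""
    x = Fraction(rng.choice(sources))
    for k in range(num_steps):
        t = Fraction(k, num_steps)  # exact arithmetic keeps the intersection test exact
        x = dtm_step(sources, targets, t, x, num_steps, rng)
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    print([int(sample_dtm([-1, 0, 1], [-4, 4], 8, rng)) for _ in range(5)])
```

Exact fractions are used so that "the line passes through the current state" can be checked with equality. In this toy setup the state always sits on the most recently sampled line, so the intersecting list is never empty and the final state lands on a target point.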
Introducing Transition Matching (TM), a new generative paradigm that unifies Flow Matching and autoregressive models into one framework, boosting both quality and speed!
Thank you for the great collaboration @shaulneta @itai_gat @lipmanya
[1/n]
New paper alert!
Excited to introduce Transition Matching (TM)! We're replacing short-timestep kernels from Flow Matching/Diffusion with... a generative model 🤯, achieving SOTA text-to-image generation!
@urielsinger @itai_gat @lipmanya
**Transition Matching** is a new iterative generative paradigm that uses Flow Matching or AR models to transition between intermediate generation states, leading to improved generation quality and speed!
Exciting news from #ICML2025 & #ICCV2025 🥳
- VideoJAM accepted as *oral* at #ICML2025 (top 1%)
- Two talks at #ICCV2025
  - interpretability in the generative era
  - video customization
- Organizing two #ICCV2025 workshops
  - structural priors for vision
  - long video gen
🧵
Excited to share our recent work on corrector sampling in language models! A new sampling method that mitigates error accumulation by iteratively revisiting tokens in a window of previously generated text.
With: @shaulneta @urielsinger @lipmanya
Link: https://t.co/54etkhxNEK