FUN to share that Sync-LoRA has been accepted to #ECCV2026! 🤗✨🎬
the field moves fast, but the core seems to remain:
1️⃣ IC-LoRA enables diverse edits
2️⃣ high-quality data is key (even if it's a small amount!)
Huge thanks to @arash_mham@OPatashnik@DanielCohenOr1
Two months ago, I vaguely posted a number: 0.9 FID, one-step, pixel space.
Now it is 0.75, and can be even lower.
Many wonder how.
I thought it might end as a small FID prank: simple and deliberate.
It started with one question: can FID be optimized directly, and what does it reveal?
Introducing FD-loss.
Excited to share our work accepted to #SIGGRAPH2026 ! Video generation models struggle with something few talk about: their transformations don't evolve smoothly. You get long boring stretches... then a sudden semantic jump where everything "catches up" at once.
1/7
When rewards conflict, what should RL post-training of diffusion models optimize?
In visual generation, objectives are often in tension:
Prompt adherence can conflict with source preservation.
Photorealism can conflict with stylization.
In our new paper, ParetoSlider, we introduce a multi-objective RL framework that trains a single diffusion model for continuous control over competing reward objectives 🧵
I'll be presenting NoisePrints today at ICLR! 🇧🇷
We exploit the remarkable correlation between a diffusion model's initial random noise and its output for robust and efficient watermarking. Come visit Pavilion 4, poster #3013 at 3:15 PM to hear more and discuss it!
#ICLR
The initial noise in diffusion models is surprisingly correlated with the final image.
Our NoisePrints paper exploits this to provide a lightweight, distortion-free, cryptographically secure watermark for proving authorship of generated images & videos, requiring no model access.
We developed a simple, sample-efficient online RL technique for post-training image generation models. We see it as a possible steerable alternative to CFG, driven by any scalar reward, including human preference.
CVPR 2026 highlight! 🔥
In this work co-led with @YehezkelShai, we show that a plain diffusion model can solve hard geometry problems by treating them as conditional image generation problems. No special architecture needed.
w/ @OmerDahary, @kusichan, @OPatashnik, @DanielCohenOr1
Visual Diffusion Models are Geometric Solvers
We cast geometry as images: a plain diffusion model denoises into valid solutions. It is simple, general and effective.
Shown on Inscribed Square, Steiner Tree, and Maximum Area Polygonization - all classic hard problems.
[1/5] Is Text Enough for Control? 🐇
Text-driven video editing lets you describe *what* to change. But what about *how much*?
We introduce Adaptive-Origin Guidance (AdaOr).
A joint work with @DecartAI and @TelAvivUni 🧪
accepted to #SIGGRAPH2026.
Video models as Physics simulators. 🌍🎥
[1/] In our latest work, WinDiNet, we finetuned a pre-trained video model into a differentiable physics engine. 1000x faster than traditional CFD solvers.
Project page: https://t.co/LAx7t00y3e
Abs: https://t.co/OdcgbKeQEG
Our previous intern released an extremely impressive re-implemented demo of our paper on multiplayer diffusion game engines.
https://t.co/UHUEVfkK8h
I think this might be the first time you can play a fully-functional multiplayer generative game online with other people. 🤯
Modern T2I DiTs are incredibly powerful, but have a serious diversity problem.
We introduce a surprisingly simple and efficient inference-time fix
(+2s for Flux-dev, +1s for SD3.5-Turbo).
Excited to share our SIGGRAPH 2026 (conditional) paper:
“On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers”
Excited to share that our work, “Image Generation from Contextually Contradictory Prompts,” was accepted to CVPR 2026 🎉
Diffusion models fail on some seemingly simple prompts.
Why do they ignore what you asked for?
We show why, and how to fix it with a simple, training-free method.
Joint work with @OPatashnik@OmerDahary@MokadyRon@DanielCohenOr1
DLSS 5 is all over the timeline, and for good reason. In my internship at @AIatMeta we had the same idea: use a video model as a learned second-stage renderer on top of game engines. In our paper RealMaster, we make synthetic video look real while preserving scene fidelity 👇
@PolaczekSagi btw, you can set up ~/.claude/settings.json for a terminal bell on completion/permission request that works over claude code in tmux (which you can turn into a system notification on iTerm2 when not in focus):
LTX-2.3 is here.
For decades, creative software has been defined by its interface. We think the next era gets defined by the engine underneath.
LTX-2.3 is a major engine upgrade:
→ Sharper detail
→ Stronger motion
→ Cleaner audio
→ Native vertical format