@urieli17 Asynchronous systems that parallelize thousands of computations will always have latency; otherwise we'd be severely limited in what we can do. It's a law of nature. The most you can do is manage the latency with overlapping techniques, instead of letting it manage you.
@tommiekerssies Maybe the benefit is more indirect: the same network has to serve as encoder and denoiser, so bad latent choices become harder to maintain. But I agree this is different from explicitly making z0 more Gaussian / semantic / easy to flow-match.
@tommiekerssies I mostly experimented with taking an existing latent space with nice properties (Flux 2) and using a Q-Former/Perceiver to map into and out of a more diffusible latent space: heavily regularized with KL divergence, without grid structure, and more compressed for cheap diffusion.
@tommiekerssies Newer VAEs (Flux 1 and 2) and RAEs push semantics + Gaussianity with scale.
REPA-E’s figures are a nice signal that semantic structure does not emerge cleanly from naive diffusion loss alone.
https://t.co/ozW3ZPqEfq
@tommiekerssies Maybe semantics need some asymmetry / joint latent inference, not just more denoising.
The AE lineage also points this way imo: SD-VAE optimizes compact+reconstructible latents, but the latents are somewhat noisy.
@tommiekerssies One thing I tried to play with: adding a small Q-Former / Perceiver-IO component, hoping to get more latent freedom than the DINOv2/RAE patch-grid bias while staying diffusible. No meaningful results so far, so maybe the constraint is harder than it looks.
@tommiekerssies Been thinking about this a lot. I like the recipes in Self-Flow / UNITE: they push toward shaping representation learning and generation together, instead of freezing a rep encoder and then fighting whatever latent geometry we get.
https://t.co/2YcPFPnDzR https://t.co/XLlNaZj18Y
@tommiekerssies A strong enough model can model many ugly latent distributions, but at finite compute the latent geometry/redundancy matters a lot. We need to make rep features more generative and to remove redundant spatial directions, and train a stronger model with the spare compute.
@tommiekerssies So the issue is not only “high dimension latents are hard”. It’s also: what part of the latent is semantic signal, what part is reconstruction/detail tax, and how much of that detail should the diffusion model model directly vs leave to the decoder.
@tommiekerssies DC-Gen says current spatial latents still have redundancy that can be compressed post-training.
PS-VAE https://t.co/g1ZSSObxEr
DC-Gen https://t.co/Tfg0H7jmc0
Scaling Expert Parallelism across nodes? Compute isn't your bottleneck anymore. The network is.
I wrote a breakdown on why wide-EP serving for MoE models is fundamentally network-bound.
https://t.co/x2qNACrz9V
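A rough back-of-envelope sketch of the network-bound claim, not taken from the linked breakdown. It assumes each token's hidden state is sent to `top_k` experts (dispatch) and sent back (combine) in every MoE layer; the function names and all numbers below are hypothetical illustration values.

```python
def moe_all_to_all_bytes_per_token(hidden_size, top_k, num_moe_layers, bytes_per_elem=2):
    """Bytes one token moves through dispatch + combine over all MoE layers.

    Assumes every routed copy carries the full hidden state (no compression).
    """
    dispatch = hidden_size * top_k * bytes_per_elem  # token -> its top_k experts
    combine = hidden_size * top_k * bytes_per_elem   # expert outputs -> back to token
    return num_moe_layers * (dispatch + combine)


def network_bound_tokens_per_sec(link_gbytes_per_sec, bytes_per_token, remote_fraction=1.0):
    """Upper bound on tokens/s per GPU if the link were the only limit.

    remote_fraction is the (assumed) share of routed traffic that leaves the node.
    """
    if not 0.0 < remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be in (0, 1]")
    return link_gbytes_per_sec * 1e9 / (bytes_per_token * remote_fraction)


if __name__ == "__main__":
    # Hypothetical model shape and link speed, for illustration only.
    per_token = moe_all_to_all_bytes_per_token(hidden_size=7168, top_k=8, num_moe_layers=58)
    print(f"{per_token / 1e6:.1f} MB of all-to-all traffic per token")
    print(f"{network_bound_tokens_per_sec(50, per_token):.0f} tokens/s ceiling at 50 GB/s")
```

The per-token traffic scales with `top_k` and the number of MoE layers, while the link speed is fixed, which is why adding compute alone does not lift this ceiling.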
@TheAhmadOsman There is no reason to change max model len, as CUDA graphs are captured batch-wise, not sequence-wise. The relevant knobs are `--max-num-batched-tokens` and/or `--max-cudagraph-capture-size`, plus the capture mode.
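A simplified mental model of the batch-wise point, not the engine's actual code: graphs are captured for a fixed set of batch sizes, and a step is padded up to the nearest captured size. `pick_cudagraph_size` and the size list are hypothetical names and values.

```python
import bisect


def pick_cudagraph_size(num_tokens, capture_sizes):
    """Return the captured size this step would be padded to, or None.

    None stands for "larger than anything captured", i.e. no graph replay.
    Sequence length is not an input: only the batch size selects a graph.
    """
    sizes = sorted(capture_sizes)
    i = bisect.bisect_left(sizes, num_tokens)
    return sizes[i] if i < len(sizes) else None


if __name__ == "__main__":
    capture_sizes = [1, 2, 4, 8, 16, 32]  # assumed capture list
    for n in (1, 3, 32, 33):
        print(n, "->", pick_cudagraph_size(n, capture_sizes))
```

In this model, raising the largest captured size changes which steps can replay a graph, whereas changing the maximum sequence length does not affect the lookup at all.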
1/7
When rewards conflict, what should RL post-training of diffusion models optimize?
In visual generation, objectives are often in tension:
Prompt adherence can conflict with source preservation.
Photorealism can conflict with stylization.
In our new paper, ParetoSlider, we introduce a multi-objective RL framework that trains a single diffusion model for continuous control over competing reward objectives 🧵
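For intuition only, here is generic preference-weighted reward scalarization, a common multi-objective baseline. It is not ParetoSlider's method, which this post does not spell out; `scalarize` and the reward values are hypothetical.

```python
def scalarize(rewards, preference):
    """Collapse several rewards into one scalar using normalized preference weights."""
    if len(rewards) != len(preference):
        raise ValueError("rewards and preference must have the same length")
    total = sum(preference)
    if total <= 0 or any(w < 0 for w in preference):
        raise ValueError("preference weights must be non-negative with a positive sum")
    return sum((w / total) * r for w, r in zip(preference, rewards))


if __name__ == "__main__":
    # Hypothetical rewards: (prompt adherence, source preservation).
    rewards = (0.9, 0.2)
    for slider in (0.0, 0.5, 1.0):
        # slider = 1.0 favors prompt adherence, 0.0 favors source preservation.
        print(slider, scalarize(rewards, (slider, 1.0 - slider)))
```

A fixed weight vector yields one trade-off per trained model; exposing the weights as an input at inference time is what a "slider" over conflicting objectives would need.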
@nir_benz That's so not true. The whole report shows an architectural design tailored for efficient large-scale serving. Assuming we soon see nvfp4 versions, running it on NVL72 will be very efficient and cost-effective. And any comparison against Opus or GPT is problematic, because it's unclear how subsidized the tokens are there (my guess is those models are 1.5-2x more expensive to run).
3D editing has long relied on workarounds: per-asset optimization, 2D view propagation, or hacking frozen priors. The bitter lesson is the same one image editing already learned. Train a native model, end-to-end.
Introducing ShapeUP, accepted to SIGGRAPH 2026 💫
CVPR 2026 highlight! 🔥
In this work co-led with @YehezkelShai, we show that a plain diffusion model can solve hard geometry problems by treating them as conditional image generation problems. No special architecture needed.
w/ @OmerDahary, @kusichan, @OPatashnik, @DanielCohenOr1