Ariel Bereslavsky @ArielBAMath - Twitter Profile

Pinned Tweet

Ariel Bereslavsky @ArielBAMath

over 2 years ago

I don't have enough personality for a pinned tweet

1

7

0

966

Ariel Bereslavsky @ArielBAMath

about 14 hours ago

@Zevlevys Go to the mikveh

0

1

0

17

Ariel Bereslavsky @ArielBAMath

13 days ago

@GallilMaimon @adiyossLC Congratulations! 🎊🎉🎊🎉🎊

0

1

0

125

Ariel Bereslavsky @ArielBAMath

13 days ago

@urieli17 Asynchronous systems that receive thousands of computations will always have latency; otherwise we are very limited in what we can do. It's a law of nature. At best, manage the latency instead of letting it manage you, using overlapping techniques

1

0

16
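
One way to picture the overlapping idea in the reply above is a prefetch pipeline: while the current item is being processed, the request for the next item is already in flight, so part of the latency is hidden rather than removed. The sketch below is a minimal illustration under assumptions that are not in the tweet: `fetch` and `process` are hypothetical stand-ins that only sleep, and `asyncio` is just one convenient way to express the overlap.

```python
import asyncio


async def fetch(i: int) -> int:
    # Hypothetical stand-in for a slow asynchronous request (network, disk, remote compute).
    await asyncio.sleep(0.01)
    return i


async def process(x: int) -> int:
    # Hypothetical stand-in for the local work done on a fetched item.
    await asyncio.sleep(0.01)
    return x * x


async def pipeline(n: int) -> list[int]:
    """Process n items, starting the fetch of item i+1 before processing item i."""
    results: list[int] = []
    if n == 0:
        return results
    next_fetch = asyncio.create_task(fetch(0))
    for i in range(n):
        item = await next_fetch
        if i + 1 < n:
            # Start the next fetch now so it runs while we process the current item.
            next_fetch = asyncio.create_task(fetch(i + 1))
        results.append(await process(item))
    return results


def run_pipeline(n: int) -> list[int]:
    # Synchronous entry point for the sketch.
    return asyncio.run(pipeline(n))
```

Calling `run_pipeline(4)` returns the squares of 0 through 3. The per-item latency of `fetch` is still there, but after the first item it is paid concurrently with `process` instead of in sequence.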

Ariel Bereslavsky @ArielBAMath

about 1 month ago

@tommiekerssies Maybe the benefit is more indirect: the same network has to serve as encoder and denoiser, so bad latent choices become harder to maintain. But I agree this is different from explicitly making z0 more Gaussian / semantic / easy to flow-match.

0

28

Ariel Bereslavsky @ArielBAMath

about 1 month ago

@tommiekerssies I mostly experimented with taking an existing latent space with nice properties (flux 2) and using a Q-Former/Perceiver to map its latents to and from a more diffusible latent space (heavily regularized with KL divergence), without grid structure and more compressed for cheap diffusion

0

45
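
For readers unfamiliar with the KL regularizer mentioned in the reply above, here is a minimal sketch of the closed-form term usually used in VAE-style training. It assumes a diagonal Gaussian posterior and a standard normal prior, which is the common choice but is not stated in the tweet; the function name and the plain-list interface are hypothetical.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions.

    Assumption: diagonal Gaussian posterior and standard normal prior.
    """
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv
        for m, lv in zip(mu, log_var)
    )
```

The term is zero exactly when the posterior already equals the prior (`mu = 0`, `log_var = 0`) and grows as the latents drift away from it, which is what "heavily regularized" would push against when the term is given a large weight in the loss.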

Ariel Bereslavsky @ArielBAMath

about 1 month ago

@tommiekerssies Newer VAEs (flux 1 and 2) and RAEs push semantics + Gaussianity with scale. REPA-E’s figures are a nice signal that semantic structure does not emerge cleanly from naive diffusion loss alone. https://t.co/ozW3ZPqEfq

0

1

0

37

Ariel Bereslavsky @ArielBAMath

about 1 month ago

@tommiekerssies Maybe semantics need some asymmetry / joint latent inference, not just more denoising. The AE lineage also points this way imo: SD-VAE optimizes compact+reconstructible latents, but the latents are somewhat noisy.

1

0

38

Ariel Bereslavsky @ArielBAMath

about 1 month ago

@tommiekerssies One thing I tried to play with: adding a small Q-Former / Perceiver-IO component, hoping to get more latent freedom than the DINOv2/RAE patch-grid bias while staying diffusible. No meaningful results so far, so maybe the constraint is harder than it looks.

1

0

53

Ariel Bereslavsky @ArielBAMath

about 1 month ago

@tommiekerssies Been thinking about this a lot. I like the recipes in Self-Flow / UNITE: they push toward shaping representation learning and generation together, instead of freezing a rep encoder and then fighting whatever latent geometry we get. https://t.co/2YcPFPnDzR https://t.co/XLlNaZj18Y

3

1

0

68

Ariel Bereslavsky @ArielBAMath

about 1 month ago

@tommiekerssies A strong enough model can model many ugly latent distributions, but at finite compute the latent geometry/redundancy matters a lot. We need to make rep features more generative and to remove redundant spatial directions, and train a stronger model with the spare compute.

1

0

53

Ariel Bereslavsky @ArielBAMath

about 1 month ago

@tommiekerssies So the issue is not only “high dimension latents are hard”. It’s also: what part of the latent is semantic signal, what part is reconstruction/detail tax, and how much of that detail should the diffusion model model directly vs leave to the decoder.

1

0

46

Ariel Bereslavsky @ArielBAMath

about 1 month ago

@tommiekerssies DC-Gen says current spatial latents still have redundancy that can be compressed post-training. PS-VAE https://t.co/g1ZSSObxEr DC-Gen https://t.co/Tfg0H7jmc0

1

0

50

Ariel Bereslavsky @ArielBAMath

about 2 months ago

Scaling Expert Parallelism across nodes? Compute isn't your bottleneck anymore. The network is. I wrote a breakdown on why wide-EP serving for MoE models is fundamentally network-bound. https://t.co/x2qNACrz9V

1

3

0

67

Ariel Bereslavsky @ArielBAMath

about 2 months ago

@TheAhmadOsman There is no reason to change the max model length, as CUDA graphs are captured batch-wise, not sequence-wise. The relevant knobs are `--max-num-batched-tokens` and/or `--max-cudagraph-capture-size`, plus the capture mode

1

0

1

104
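
The point in the reply above can be illustrated with a toy model of batch-wise graph selection. This is a sketch under stated assumptions, not the serving engine's actual code: it assumes graphs are captured for a fixed list of batch sizes, that a running batch is padded up to the nearest captured size, and that anything larger than the biggest captured size falls back to eager execution. `pick_capture_size` and the example sizes are hypothetical.

```python
import bisect


def pick_capture_size(num_tokens, capture_sizes):
    """Return the smallest captured size >= num_tokens, or None for eager fallback.

    Assumption: one graph per captured batch size; sequence length is not
    part of the lookup key in this toy model.
    """
    sizes = sorted(capture_sizes)
    idx = bisect.bisect_left(sizes, num_tokens)
    return sizes[idx] if idx < len(sizes) else None
```

With captured sizes `[1, 2, 4, 8]`, a batch of 3 is padded to the size-4 graph, and a batch of 9 gets `None`. Sequence length never enters the lookup, which is why, in this picture, raising or lowering the maximum model length would not change which graphs exist.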

ArielBAMath retweeted

Shelly Golan @Shelly_Golan1

about 2 months ago

1/7 When rewards conflict, what should RL post-training of diffusion models optimize? In visual generation, objectives are often in tension: Prompt adherence can conflict with source preservation. Photorealism can conflict with stylization. In our new paper, ParetoSlider, we introduce a multi-objective RL framework that trains a single diffusion model for continuous control over competing reward objectives 🧵

4

84

28

38

11K

Ariel Bereslavsky @ArielBAMath

about 2 months ago

@nir_benz That is so not true. The whole report shows an architectural design tailored for efficient large-scale serving. Assuming we soon see NVFP4 versions, running it on NVL72 will be very efficient and cost-effective, and any comparison against Opus or GPT is problematic because it is unclear how subsidized the tokens are there (my guess is that those models are 1.5-2x more expensive to run)

0

9

ArielBAMath retweeted

Inbar Gat @Gatinbar

2 months ago

3D editing has long relied on workarounds: per-asset optimization, 2D view propagation, or hacking frozen priors. The bitter lesson is the same one image editing already learned. Train a native model, end-to-end. Introducing ShapeUP, accepted to SIGGRAPH 2026 💫

5

245

39

246

19K

ArielBAMath retweeted

Nir Goren @nirgoren

2 months ago

CVPR 2026 highlight! 🔥 In this work co-led with @YehezkelShai, we show that a plain diffusion model can solve hard geometry problems by treating them as conditional image generation problems. No special architecture needed. w/ @OmerDahary, @kusichan, @OPatashnik, @DanielCohenOr1

4

81

16

32

10K
