bagel.com @bageldotcom - Twitter Profile

Pinned Tweet

6 days ago

Today we're releasing Paris 2.0, to our knowledge the first decentralized-trained video generation model. At Bagel Labs, we believe frontier models should not require homogeneous clusters of premium, supply constrained GPUs. Paris 1.0 proved this for image generation. Paris 2.0 extends the recipe to video generation and lays the substrate for global-scale world models. To test the approach, we trained two models head-to-head in an iso-FLOP, iso-data comparison. One was a monolithic model trained conventionally, on a single premium GPU cluster. The other was Paris 2.0, trained across an extreme mix of GPU types, generations, and vendors distributed around the globe. Against the monolithic model under matched data and compute, the results were: FVD: 561.04 → 279.01 (a ~2x improvement) CLIP text-video alignment and aesthetic score both improved. To our knowledge, this is the first distributed training architecture to surpass its monolithic counterpart under matched data and compute. Technical Report: https://t.co/tZXA9KGo7i Model Weights: https://t.co/6PzWs7TTTO

bageldotcom's tweet photo. Today we're releasing Paris 2.0, to our knowledge the first decentralized-trained video generation model.

At Bagel Labs, we believe frontier models should not require homogeneous clusters of premium, supply constrained GPUs. Paris 1.0 proved this for image generation. Paris 2.0 extends the recipe to video generation and lays the substrate for global-scale world models.

To test the approach, we trained two models head-to-head in an iso-FLOP, iso-data comparison. One was a monolithic model trained conventionally, on a single premium GPU cluster. The other was Paris 2.0, trained across an extreme mix of GPU types, generations, and vendors distributed around the globe.

Against the monolithic model under matched data and compute, the results were:
FVD: 561.04 → 279.01 (a ~2x improvement)
CLIP text-video alignment and aesthetic score both improved.

To our knowledge, this is the first distributed training architecture to surpass its monolithic counterpart under matched data and compute.

Technical Report: https://t.co/tZXA9KGo7i
Model Weights: https://t.co/6PzWs7TTTO

bidhan ✈️ CVPR

@bidhan

6 days ago

We're releasing Paris 2.0, which, to our knowledge, is the world's first decentralized trained video generation model. We benchmarked it against a monolithic model trained on the same data and compute budget, and Paris 2.0 outperformed the monolithic by ~2x on FVD benchmark.

88

657

113

489

441K

4

29

6

15

4K

bageldotcom retweeted

bidhan ✈️ CVPR

@bidhan

about 16 hours ago

heading to @CVPR with the Bagel Labs team. if you want some elite merch (we spent a month designing it), discuss world models, physical ai, distributed training -- I'm your guy 🫡

1

12

2

0

856

bagel.com

@bageldotcom

6 days ago

@bidhan 🥯🚀

0

1

0

417

bageldotcom retweeted

bidhan ✈️ CVPR

@bidhan

6 days ago

We're releasing Paris 2.0, which, to our knowledge, is the world's first decentralized trained video generation model. We benchmarked it against a monolithic model trained on the same data and compute budget, and Paris 2.0 outperformed the monolithic by ~2x on FVD benchmark.

88

657

113

489

441K

Who to follow

Mona T

@CryptoMonaT

Web3 VC @EdgeVenturesVC | Founder @OpenAlphaVC

mary lulu

@magrando_

o internacional e o phoenix suns me obrigam a beber

Wilson

@Juniorpaess

Grêmio 🇪🇪 / Boston Celtics ☘️

bagel.com

@bageldotcom

about 1 month ago

1

16

6

9

3K

bagel.com

@bageldotcom

3 months ago

@mirimayer 🙌

0

23

bagel.com

@bageldotcom

3 months ago

In town for NVIDIA GTC? If you're building generative world models or investing in the people who are - we're putting the right people in one room for you tomorrow night in Palo Alto. Co-hosted by Alumni Ventures. Signup link below.

5

16

6

4

11K

bageldotcom retweeted

Gin Jiang @CVPR

@ZhiyingJ

3 months ago

we managed to extend DDM to heterogeneous objectives! one step closer to decentralized AI XD tldr: experts trained in complete isolation, with different objectives, no communication - and mixing them beats making them all train the same way. 20-48G memory per expert

0

10

3

2K

bageldotcom retweeted

bidhan ✈️ CVPR

@bidhan

3 months ago

Excited to share that Bagel Labs' paper got accepted at CVPR 2026. A lot of the most important diffusion model research has historically stayed inside frontier labs. We're bringing more of that in the open through open science and open infrastructure. In this work we showcase the very counterintuitive advantage of mixing different training objectives (DDPM and Flow-Matching) through an ensemble of diffusion models. This is one of the first ever works to successfully combine diffusion models trained with heterogeneous objectives. See details here: https://t.co/vLyziWc8CW

bidhan's tweet photo. Excited to share that Bagel Labs' paper got accepted at CVPR 2026.

A lot of the most important diffusion model research has historically stayed inside frontier labs. We're bringing more of that in the open through open science and open infrastructure.

In this work we showcase the very counterintuitive advantage of mixing different training objectives (DDPM and Flow-Matching) through an ensemble of diffusion models. This is one of the first ever works to successfully combine diffusion models trained with heterogeneous objectives.

See details here: https://t.co/vLyziWc8CW

4

21

7

10

5K

bagel.com

@bageldotcom

3 months ago

Diffusion models are becoming the foundation for image, video, and world models. We are hosting a founders and investors gathering on that topic during NVIDIA GTC week, co-hosted by our friends at Alumni Ventures. Mar 16, Menlo Park. Sign up below. https://t.co/epmpbhp6UF

3

17

8

13

6K

bagel.com

@bageldotcom

4 months ago

@deedydas 👋

0

4

0

729

bageldotcom retweeted

bidhan ✈️ CVPR

@bidhan

4 months ago

Being at the frontier - by the definition of it - means creating the frontier. You don't get to be at the frontier by following someone else. And creating the frontier often means discoveries that go against the established knowledge. We recently made such a discovery about distributed diffusion model training. A common way to optimize diffusion model training is by ensuring the numerical stability of their generation paths. We found that that's not true for the most efficient distributed diffusion model training architecture. We shared what works instead in our blogpost below. https://t.co/aVkPne0tJd

bidhan's tweet photo. Being at the frontier - by the definition of it - means creating the frontier. You don't get to be at the frontier by following someone else.

And creating the frontier often means discoveries that go against the established knowledge.

We recently made such a discovery about distributed diffusion model training. A common way to optimize diffusion model training is by ensuring the numerical stability of their generation paths. We found that that's not true for the most efficient distributed diffusion model training architecture.

We shared what works instead in our blogpost below.

https://t.co/aVkPne0tJd

14

446

33

333

52K

bageldotcom retweeted

Deep-ML

@real_deep_ml

4 months ago

Can't wait for the live stream! in honor of it we just added a new question and interactive playground based on the paper

3

36

7

18

7K

bageldotcom retweeted

Yacine Mahdid

@yacinelearning

4 months ago

alright folks tomorrow january 22 from 10h-12h AM EST we're going to dive into decentralized diffusion models by reviewing the paris model from bagel labs I even managed to lock in @bidhan for an interview on why how what that's a good thing to even do tune in!

yacinelearning's tweet photo. alright folks tomorrow january 22 from 10h-12h AM EST we're going to dive into decentralized diffusion models by reviewing the paris model from bagel labs

I even managed to lock in @bidhan for an interview on why how what that's a good thing to even do

tune in! https://t.co/Cg41Dyqyq7

3

90

8

37

27K

bageldotcom retweeted

mirian

@mirimayer

5 months ago

no better way to celebrate bagel day 🥯

4

14

3

0

2K

bageldotcom retweeted

bidhan ✈️ CVPR

@bidhan

6 months ago

NeurIPS takeways (better late than never) 1. real AGI needs real continual learning - models that can keep learning without catastrophic forgetting. 2. model architectures need to be "stateful" for building accurate world models for games and robotics. 3. diffusion models are superior for solving both 1 & 2. 4. @bageldotcom's distributed diffusion training architecture is SOTA among both open and closed source frontier lab comparables. 5. the age of research is back, and no better place to do frontier diffusion model research than Bagel Labs. join us - https://t.co/zweESweTOF

bidhan's tweet photo. NeurIPS takeways (better late than never)

1. real AGI needs real continual learning - models that can keep learning without catastrophic forgetting.

2. model architectures need to be "stateful" for building accurate world models for games and robotics.

3. diffusion models are superior for solving both 1 & 2.

4. @bageldotcom's distributed diffusion training architecture is SOTA among both open and closed source frontier lab comparables.

5. the age of research is back, and no better place to do frontier diffusion model research than Bagel Labs. join us - https://t.co/zweESweTOF

7

54

8

28

7K

bageldotcom retweeted

nisten🇨🇦e/acc

@nisten

6 months ago

cooking some stuff in distributed diffusion, theres quite a ot of unturned stones left local and network inference when the architechture is made for it. (also we're training a video MoE)

2

19

4

2

4K

bageldotcom retweeted