Today we're releasing Paris 2.0, to our knowledge the first decentralized-trained video generation model.
At Bagel Labs, we believe frontier models should not require homogeneous clusters of premium, supply constrained GPUs. Paris 1.0 proved this for image generation. Paris 2.0 extends the recipe to video generation and lays the substrate for global-scale world models.
To test the approach, we trained two models head-to-head in an iso-FLOP, iso-data comparison. One was a monolithic model trained conventionally, on a single premium GPU cluster. The other was Paris 2.0, trained across an extreme mix of GPU types, generations, and vendors distributed around the globe.
Against the monolithic model under matched data and compute, the results were:
FVD: 561.04 → 279.01 (a ~2x improvement)
CLIP text-video alignment and aesthetic score both improved.
To our knowledge, this is the first distributed training architecture to surpass its monolithic counterpart under matched data and compute.
Technical Report: https://t.co/tZXA9KGo7i
Model Weights: https://t.co/6PzWs7TTTO
We're releasing Paris 2.0, which, to our knowledge, is the world's first decentralized trained video generation model.
We benchmarked it against a monolithic model trained on the same data and compute budget, and Paris 2.0 outperformed the monolithic by ~2x on FVD benchmark.
heading to @CVPR with the Bagel Labs team. if you want some elite merch (we spent a month designing it), discuss world models, physical ai, distributed training -- I'm your guy 🫡
We're releasing Paris 2.0, which, to our knowledge, is the world's first decentralized trained video generation model.
We benchmarked it against a monolithic model trained on the same data and compute budget, and Paris 2.0 outperformed the monolithic by ~2x on FVD benchmark.
In town for NVIDIA GTC?
If you're building generative world models or investing in the people who are - we're putting the right people in one room for you tomorrow night in Palo Alto.
Co-hosted by Alumni Ventures. Signup link below.
we managed to extend DDM to heterogeneous objectives! one step closer to decentralized AI XD
tldr: experts trained in complete isolation, with different objectives, no communication - and mixing them beats making them all train the same way. 20-48G memory per expert
Excited to share that Bagel Labs' paper got accepted at CVPR 2026.
A lot of the most important diffusion model research has historically stayed inside frontier labs. We're bringing more of that in the open through open science and open infrastructure.
In this work we showcase the very counterintuitive advantage of mixing different training objectives (DDPM and Flow-Matching) through an ensemble of diffusion models. This is one of the first ever works to successfully combine diffusion models trained with heterogeneous objectives.
See details here: https://t.co/vLyziWc8CW
Diffusion models are becoming the foundation for image, video, and world models.
We are hosting a founders and investors gathering on that topic during NVIDIA GTC week, co-hosted by our friends at Alumni Ventures.
Mar 16, Menlo Park. Sign up below.
https://t.co/epmpbhp6UF
Being at the frontier - by the definition of it - means creating the frontier. You don't get to be at the frontier by following someone else.
And creating the frontier often means discoveries that go against the established knowledge.
We recently made such a discovery about distributed diffusion model training. A common way to optimize diffusion model training is by ensuring the numerical stability of their generation paths. We found that that's not true for the most efficient distributed diffusion model training architecture.
We shared what works instead in our blogpost below.
https://t.co/aVkPne0tJd
alright folks tomorrow january 22 from 10h-12h AM EST we're going to dive into decentralized diffusion models by reviewing the paris model from bagel labs
I even managed to lock in @bidhan for an interview on why how what that's a good thing to even do
tune in!
NeurIPS takeways (better late than never)
1. real AGI needs real continual learning - models that can keep learning without catastrophic forgetting.
2. model architectures need to be "stateful" for building accurate world models for games and robotics.
3. diffusion models are superior for solving both 1 & 2.
4. @bageldotcom's distributed diffusion training architecture is SOTA among both open and closed source frontier lab comparables.
5. the age of research is back, and no better place to do frontier diffusion model research than Bagel Labs. join us - https://t.co/zweESweTOF
cooking some stuff in distributed diffusion, theres quite a ot of unturned stones left local and network inference when the architechture is made for it.
(also we're training a video MoE)