Training LLMs across multiple datacenters is hard.
Synchronization demands often cause massive slowdowns as we scale up. If you're at @NeurIPSConf, come see how we tackle this!
Our work, "Scaling Laws for DiLoCo," shows how DiLoCo relaxes synchronization without compromising model quality, allowing training to scale well as models grow.
Come chat with me and @NovaFallen8:
🗓️ Thu, Dec 4
⏰ 11 AM - 2 PM PST
📍 Exhibit Hall C,D,E, #811
#NeurIPS2025 #LLMs #DistributedTraining #ScalingLaws
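For readers wondering what "relaxing synchronization" looks like in practice, here is a minimal sketch of a DiLoCo-style training loop, assuming the structure described in the original DiLoCo paper. Each worker takes `inner_steps` local optimizer steps with no communication. Then all workers exchange a single averaged "pseudo-gradient" (global parameters minus each worker's parameters), which an outer SGD optimizer with Nesterov momentum applies. This is a toy, not the authors' code: the loss is a per-worker quadratic, the inner optimizer is plain SGD (the paper uses AdamW), the function names `local_sgd` and `diloco` are hypothetical, and the learning rates and momentum are illustrative.

```python
from typing import List, Tuple


def local_sgd(theta: List[float], target: List[float], steps: int, lr: float) -> List[float]:
    """Run `steps` plain-SGD steps on the toy loss 0.5 * ||theta - target||^2.

    Assumption for illustration: DiLoCo's inner optimizer is AdamW on a
    language-modeling loss; plain SGD on a quadratic keeps the sketch small.
    """
    theta = list(theta)
    for _ in range(steps):
        # The gradient of 0.5 * ||theta - target||^2 is (theta - target).
        theta = [t - lr * (t - c) for t, c in zip(theta, target)]
    return theta


def diloco(
    worker_targets: List[List[float]],
    outer_rounds: int,
    inner_steps: int,
    inner_lr: float = 0.1,
    outer_lr: float = 0.7,
    momentum: float = 0.9,
) -> Tuple[List[float], int]:
    """Hypothetical DiLoCo-style loop; returns (global params, number of syncs)."""
    dim = len(worker_targets[0])
    theta = [0.0] * dim      # global parameters, copied to every worker
    velocity = [0.0] * dim   # outer Nesterov momentum buffer
    syncs = 0
    for _ in range(outer_rounds):
        # Inner phase: every worker trains on its own data with no communication.
        local = [local_sgd(theta, tgt, inner_steps, inner_lr) for tgt in worker_targets]
        # Single synchronization: average the pseudo-gradients (theta - theta_i).
        syncs += 1
        pseudo_grad = [
            sum(theta[j] - params[j] for params in local) / len(local) for j in range(dim)
        ]
        # Outer phase: SGD with Nesterov momentum applied to the pseudo-gradient.
        velocity = [momentum * v + g for v, g in zip(velocity, pseudo_grad)]
        theta = [
            t - outer_lr * (g + momentum * v) for t, g, v in zip(theta, pseudo_grad, velocity)
        ]
    return theta, syncs


if __name__ == "__main__":
    # Two workers whose toy losses disagree; the combined loss is minimized at [2.0, -1.0].
    targets = [[1.0, -2.0], [3.0, 0.0]]
    params, n_syncs = diloco(targets, outer_rounds=30, inner_steps=50)
    print(params, n_syncs)
```

With 30 outer rounds of 50 inner steps, each worker takes 1,500 local steps but the workers synchronize only 30 times, a 50x reduction in sync rounds compared with exchanging gradients after every step. In this toy setting the global parameters still converge to the minimizer of the combined loss (the mean of the worker targets).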
We just released a key step toward making distributed training work for ever-larger models: Scaling Laws for DiLoCo
TL;DR: We can do LLM training across datacenters in a way that scales incredibly well to larger and larger models!
Heading to @NeurIPSConf in San Diego.
I've got some DiLoCo stickers to give away! ❤️
Come check out our poster.
🗓️ Thu, Dec 4
⏰ 11 AM - 2 PM PST
📍 Exhibit Hall C,D,E, #811
#NeurIPS2025
Attending @NeurIPSConf and interested in distributed, modular, and/or open AI?
I hadn't seen anyone put together a list of poster presentations in this area, so I took it upon myself to thread out who I'm excited to talk to next week 🧵
Want to learn how to train models across the world, with 400x fewer bits exchanged and a huge latency tolerance?
Iโll be presenting our work on how to efficiently scale distributed training at @COLM_conf.
🗓️ TODAY: Tuesday, 11:00 - 13:00
📍 Room 710
#COLM2025