Top Tweets for #DistributedTraining
Hiring now: Distributed Training Engineer, Sora
Apply ➜ https://t.co/zAlP58U792
OpenAI
📍 San Francisco
💰 Salary negotiable
#distributedtraining #openaijobs #hiring #jobsearch #techjobs

Watch Lerna review the full alphabet of failure — NCCL, RDMA, queue pairs, link flapping, GPU failures — with illustrations and live demos.
🔗https://t.co/kMTEp6Tp0a
#AIInfrastructure #DistributedTraining #FaultTolerance
Qwen3.6 27B is here. NeMo AutoModel support is ready on day 0! 🚀
Fine-tune Qwen3.6-27B out of the box with an end-to-end validated recipe: https://t.co/ItD3jwnaFB
Day-0 support means when new models land, you’re already training ⚡️
#NeMo #LLM #Qwen #DistributedTraining
Join @clockworkio & @linuxfoundation for a free live webinar TOMORROW at 9:00 AM PT: "Handling Hardware Failures During Training: A Comparative Analysis of Fault Tolerant Training Frameworks". Learn more & register: https://t.co/IemTronvpP #OpenSource #Linux #DistributedTraining #FaultTolerance #MLInfrastructure #MLOps

Design doc:
https://t.co/EYoTs5MmJH
Discussion:
https://t.co/MJjesmePdY
If you know someone who works on Local SGD or synchronization theory, a tag would be gold.
#DistributedTraining #DeepLearning
🌐 Scalability Question 6: Distributed Training
Train a 405B model on 512 H100 GPUs.
Which framework: DeepSpeed ZeRO-3, FSDP2, Megatron, or Colossal-AI?
Handle activation checkpointing, optimizer sharding & communication overhead.
#DistributedTraining #LLM
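
A rough back-of-the-envelope sketch for the question above, not a framework recommendation. It assumes mixed-precision Adam at 16 bytes per parameter (bf16 weights and gradients, fp32 master weights and two fp32 Adam moments) and that all of this state is sharded evenly across the 512 GPUs, as ZeRO-3 or FSDP-style full sharding would do. The byte counts and the function name are illustrative assumptions; activations, communication buffers, and fragmentation are ignored.

```python
# Illustrative estimate of per-GPU model-state memory under full sharding.
# All byte counts are assumptions for a typical mixed-precision Adam setup.
PARAMS = 405e9      # 405B parameters (from the question)
NUM_GPUS = 512      # 512 H100 GPUs (from the question)

BYTES_PER_PARAM = {
    "bf16_weights": 2,
    "bf16_gradients": 2,
    "fp32_master_weights": 4,
    "fp32_adam_momentum": 4,
    "fp32_adam_variance": 4,
}

def sharded_state_gb(params, num_gpus, bytes_per_param):
    """Per-GPU memory (decimal GB) when every state tensor is evenly sharded."""
    total_bytes = params * sum(bytes_per_param.values())
    return total_bytes / num_gpus / 1e9

per_gpu_gb = sharded_state_gb(PARAMS, NUM_GPUS, BYTES_PER_PARAM)
total_tb = PARAMS * sum(BYTES_PER_PARAM.values()) / 1e12

print(f"total model state: {total_tb:.2f} TB")
print(f"per-GPU share:     {per_gpu_gb:.2f} GB")
```

Under these assumptions the model state alone comes to roughly 6.5 TB in total, or about 12.7 GB per GPU, which leaves most of an 80 GB card (assumed H100 capacity) for activations. That is why activation checkpointing and communication overhead, rather than raw parameter storage, tend to decide the framework choice.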
We'll be at PyTorch Conference Europe in Paris next week 🇫🇷
Come by our booth to chat more.
📖https://t.co/uoW7Hyzg7i
#PyTorchEurope #pytorcheu #pytocheu2026 #FaultTolerance #MLOps #DistributedTraining
⚡ Distributed training: PyTorch DDP + Kubernetes = train models across hundreds of GPUs like they're one machine!
#Kubernetes #PyTorch #DistributedTraining #AI
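
The "like they're one machine" claim rests on a simple identity: if each worker computes a gradient on an equal-sized shard of the batch and the gradients are averaged (the all-reduce step in data-parallel training), the result matches the gradient on the full batch. The toy below is a pure-Python sketch of that idea with a hypothetical one-parameter model; it does not use PyTorch DDP or Kubernetes, and the function names are illustrative.

```python
# Toy demonstration: averaging per-shard gradients equals the full-batch gradient.
# Model: y_hat = w * x, loss: mean squared error. No real framework is involved.

def grad_mse(w, xs, ys):
    """Gradient of the mean squared error with respect to w."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def data_parallel_grad(w, xs, ys, world_size):
    """Split the batch into equal shards, take one gradient per simulated
    worker, then average them (the role all-reduce plays in practice)."""
    assert len(xs) % world_size == 0, "shards must be equal-sized"
    shard = len(xs) // world_size
    local_grads = [
        grad_mse(w, xs[r * shard:(r + 1) * shard], ys[r * shard:(r + 1) * shard])
        for r in range(world_size)
    ]
    return sum(local_grads) / world_size

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x + 1.0 for x in xs]

full = grad_mse(0.5, xs, ys)
sharded = data_parallel_grad(0.5, xs, ys, world_size=4)
print(full, sharded)
```

The equality only holds exactly when shards are the same size; with uneven shards the average would need to be weighted by shard length.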
🚀 Mistral Small 4 is now supported in NeMo-AutoModel, NVIDIA’s PyTorch DTensor/SPMD training library with parallelism + day-0 Hugging Face workflows.
Scale-ready EP+PP recipe on 4 nodes × 8×H100
https://t.co/9D3iIjb35S
#NeMo #PyTorch #Mistral #LLM #DistributedTraining
Learning about pipeline parallelism for training AI models 🤖
#AI #DeepLearning #ML #DistributedTraining

The "holy grail" of LLM scaling is finally here! 🚀
I just explored the Ultra-Scale Playbook by the @huggingface Nanotron team, and it’s a masterclass in distributed training. https://t.co/RogeQe3u8a
#LLM #MachineLearning #GPU #DistributedTraining #HuggingFace #AI
Even if you can't make it to the talk or the course and are attending GTC, DM me! Hyped for everything that is to come!!
🧷 Register: https://t.co/OlAWcogEgD
See you in San Jose! March 16-19.
#GTC26 #NVIDIA #DistributedTraining #NVRx #NeMo #AI #FaultTolerance #NVIDIAGTC
🌐 Distributed training
🔐 Federated learning
⚙️ DePIN incentives
AI grows through coordination, not centralization✨
#Noos #DePIN #PoAC #AgentEconomy

Unlock the power of distributed AI training! Dive into faster, more efficient model training with our deep dive. Learn about data & model parallelism, frameworks & more. #DistributedTraining https://t.co/TKb5IENsxG
Just devoured the latest GitHub trending AI/ML repos, and I'm salivating over the prospects of integrating #DistributedTraining into my SVM workflows. The parallel execution capabilities are a wet dream come true. Now, let's get this $SOL party started - $138.25 is just the beginning. Who needs a whitepaper when you have code poetry? #SolanaWinter #JanuaryVibes #DegenQueen
sagemaker hyperpod adds checkpointless training.
node failures force hour-long restarts in prod.
elastic scaling delivers instant recovery.
https://t.co/uloO2w4YHP
#SageMaker, #MLTraining, #DistributedTraining, #AIModelDevelopment, #CloudInfrastructure
sagemaker hyperpod drops checkpoints. elastic training scales clusters on demand.
long runs lose 15% time to failures now. elasticity ignores gpu mismatches.
https://t.co/uloO2w4YHP
#AmazonSageMaker, #MachineLearning, #DistributedTraining, #MLOps, #CloudInfrastructure
Scale Your Training with Horovod: Multi‑GPU and Multi‑Server in a Few Lines of Code
Read full tutorial:👇
https://t.co/vAG50vgGhe By @Perlod_official
#DistributedTraining #MultiGPU #GPUHosting #AIInfrastructure
Learn how to provision a multi-node GPU training cluster on @CrusoeAI with Terraform - specifically a 2-node A100 setup with InfiniBand: https://t.co/8J9nZ9u59m
#DistributedTraining #Terraform #MLOps #Crusoe

Full write-up + reproducible code on @huggingface 👇
https://t.co/EpWLsvEgEH
#Muon #HPC #DistributedTraining #Optimizer #Scaling #ZeRO #AIResearch #MachineLearning