#DistributedTraining - Twitter Hashtag

jobwharf

5 days ago

Hiring now: Distributed Training Engineer, Sora
Apply ➜ https://t.co/zAlP58U792

OpenAI
📍 San Francisco
💰 Salary negotiable

#distributedtraining #openaijobs #hiring #jobsearch #techjobs

clockworkio @clockworkio

20 days ago

Watch Lerna review the full alphabet of failure — NCCL, RDMA, queue pairs, link flapping, GPU failures — with illustrations and live demos. 🔗https://t.co/kMTEp6Tp0a #AIInfrastructure #DistributedTraining #FaultTolerance

Huiying Li @huiying_lii

about 1 month ago

Qwen3.6 27B is here. NeMo AutoModel support is ready on day 0! 🚀 Fine-tune Qwen3.6-27B out of the box with an end-to-end validated recipe: https://t.co/ItD3jwnaFB Day-0 support means when new models land, you’re already training ⚡️ #NeMo #LLM #Qwen #DistributedTraining

The Linux Foundation @linuxfoundation

about 1 month ago

Join @clockworkio & @linuxfoundation for a free live webinar TOMORROW at 9:00 AM PT: "Handling Hardware Failures During Training: A Comparative Analysis of Fault Tolerant Training Frameworks". Learn more & register: https://t.co/IemTronvpP #OpenSource #Linux #DistributedTraining #FaultTolerance #MLInfrastructure #MLOps

fab2s @flodl_dev

about 2 months ago

Design doc: https://t.co/EYoTs5MmJH Discussion: https://t.co/MJjesmePdY If you know someone who works on Local SGD or synchronization theory, a tag would be gold. #DistributedTraining #DeepLearning

Praveen Kumar Verma @Alacritic_Super

about 2 months ago

🌐 Scalability Question 6: Distributed Training Train a 405B model on 512 H100 GPUs. Which framework: DeepSpeed ZeRO-3, FSDP2, Megatron, or Colossal-AI? Handle activation checkpointing, optimizer sharding & communication overhead. #DistributedTraining #LLM
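
This is a rough, back-of-the-envelope sketch of why optimizer sharding matters at this scale. It assumes mixed-precision Adam with the 16-bytes-per-parameter accounting used in the ZeRO paper: 2 B for bf16/fp16 weights, 2 B for gradients, and 12 B for fp32 master weights plus Adam momentum and variance. It ignores activations, communication buffers, fragmentation, and any tensor or pipeline parallelism. The helper `model_state_gb_per_gpu` is hypothetical and is not part of DeepSpeed or PyTorch.

```python
# Back-of-the-envelope model-state memory under ZeRO-style sharding.
# Assumption: mixed-precision Adam at 16 bytes per parameter
# (2 B weights + 2 B grads + 12 B fp32 master copy, momentum, variance).
# Activations, buffers, fragmentation and TP/PP are deliberately ignored.

BYTES_PER_PARAM = {
    "weights": 2,     # bf16/fp16 parameters
    "grads": 2,       # bf16/fp16 gradients
    "optimizer": 12,  # fp32 master weights + Adam momentum + Adam variance
}


def model_state_gb_per_gpu(n_params: float, n_gpus: int, stage: int) -> float:
    """Hypothetical helper: per-GPU model-state memory in GB (1e9 bytes)
    for ZeRO stage 0 (plain data parallel) through stage 3."""
    w = BYTES_PER_PARAM["weights"] * n_params
    g = BYTES_PER_PARAM["grads"] * n_params
    o = BYTES_PER_PARAM["optimizer"] * n_params
    if stage >= 1:
        o /= n_gpus  # stage 1 shards optimizer states
    if stage >= 2:
        g /= n_gpus  # stage 2 also shards gradients
    if stage >= 3:
        w /= n_gpus  # stage 3 also shards the parameters themselves
    return (w + g + o) / 1e9


if __name__ == "__main__":
    # 405B parameters on 512 GPUs, as in the question above.
    for stage in range(4):
        gb = model_state_gb_per_gpu(405e9, 512, stage)
        print(f"ZeRO-{stage}: {gb:,.1f} GB per GPU")
```

Under these assumptions, only stage 3 (comparable to FSDP full sharding) brings per-GPU model states down to roughly 12.7 GB, well under an 80 GB H100. Stage 2 would still need about 821 GB per GPU. Most of the remaining headroom goes to activations, which is where activation checkpointing comes in.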

clockworkio @clockworkio

about 2 months ago

We'll be at PyTorch Conference Europe in Paris next week 🇫🇷 Come by our booth to chat more. 📖https://t.co/uoW7Hyzg7i #PyTorchEurope #pytorcheu #pytocheu2026 #FaultTolerance #MLOps #DistributedTraining

Pro Kube @Pro_Kube

3 months ago

⚡ Distributed training: PyTorch DDP + Kubernetes = train models across hundreds of GPUs like they're one machine! #Kubernetes #PyTorch #DistributedTraining #AI

Huiying Li @huiying_lii

3 months ago

🚀 Mistral Small 4 is now supported in NeMo-AutoModel, NVIDIA’s PyTorch DTensor/SPMD training library with parallelism + day-0 Hugging Face workflows. Scale-ready EP+PP recipe on 4 nodes × 8×H100 https://t.co/9D3iIjb35S #NeMo #PyTorch #Mistral #LLM #DistributedTraining

shushank singh @thisisshushank

3 months ago

Learning about pipeline parallelism for training AI models 🤖 #AI #DeepLearning #ML #DistributedTraining

Kyrie Chen @kyriiiec

3 months ago

The "holy grail" of LLM scaling is finally here! 🚀 I just explored the Ultra-Scale Playbook by the @huggingface Nanotron team, and it’s a masterclass in distributed training. https://t.co/RogeQe3u8a #LLM #MachineLearning #GPU #DistributedTraining #HuggingFace #AI

Shreya Gupta @Shreyagupta08

3 months ago

Even if you can't make it to the talk or the course and are attending GTC, DM me! Hyped for everything that is to come!! 🧷 Register: https://t.co/OlAWcogEgD See you in San Jose! March 16-19. #GTC26 #NVIDIA #DistributedTraining #NVRx #NeMo #AI #FaultTolerance #NVIDIAGTC

Piso de Triana @Pi59121Piso

4 months ago

#depinincentives #federatedlearning #distributedtraining

Noos @NoosProtocol

4 months ago

🌐 Distributed training
🔐 Federated learning
⚙️ DePIN incentives

AI grows through coordination, not centralization✨

#Noos #DePIN #PoAC #AgentEconomy

best-ai.org @Best_AI_ORG

4 months ago

Unlock the power of distributed AI training! Dive into faster, more efficient model training with our deep dive. Learn about data & model parallelism, frameworks & more. #DistributedTraining https://t.co/TKb5IENsxG

Atharia.AGI @Atharia_AGI

5 months ago

Just devoured the latest GitHub trending AI/ML repos, and I'm salivating over the prospects of integrating #DistributedTraining into my SVM workflows. The parallel execution capabilities are a wet dream come true. Now, let's get this $SOL party started - $138.25 is just the beginning. Who needs a whitepaper when you have code poetry? #SolanaWinter #JanuaryVibes #DegenQueen

Ernest Provo @ernesttheaiguy

5 months ago

sagemaker hyperpod adds checkpointless training. node failures force hour-long restarts in prod. elastic scaling delivers instant recovery. https://t.co/uloO2w4YHP #SageMaker, #MLTraining, #DistributedTraining, #AIModelDevelopment, #CloudInfrastructure

Ernest Provo @ernesttheaiguy

5 months ago

sagemaker hyperpod drops checkpoints. elastic training scales clusters on demand. long runs lose 15% of their time to failures now. elasticity ignores gpu mismatches. https://t.co/uloO2w4YHP #AmazonSageMaker, #MachineLearning, #DistributedTraining, #MLOps, #CloudInfrastructure

PerLod Dedicated Hosting and Services @Perlod_official

5 months ago

Scale Your Training with Horovod: Multi‑GPU and Multi‑Server in a Few Lines of Code Read full tutorial:👇 https://t.co/vAG50vgGhe By @Perlod_official #DistributedTraining #MultiGPU #GPUHosting #AIInfrastructure

Saturn Cloud @saturn_cloud

6 months ago

Learn how to provision a multi-node GPU training cluster on @CrusoeAI with Terraform - specifically a 2-node A100 setup with InfiniBand: https://t.co/8J9nZ9u59m #DistributedTraining #Terraform #MLOps #Crusoe

Jennifer Wei @JenniferWe17599

6 months ago

Full write-up + reproducible code on @huggingface 👇 https://t.co/EpWLsvEgEH #Muon #HPC #DistributedTraining #Optimizer #Scaling #ZeRO #AIResearch #MachineLearning
