DeepSpeed now supports the Muon Optimizer.
Optimized specifically for internal 2D weights within neural networks, Muon is gaining traction for its significant memory savings and strong convergence metrics during LLM training.
In our latest blog post, the DeepSpeed team shares a deep dive into their integration setup, implementation of hybrid optimizer strategies, and early benchmark results. @PKUWZP
Read the full technical breakdown here: https://t.co/t7JxOqkM6S
@DavidSHolz yeah there are at least a few teams that i'm aware of!
i use it in pretty much all of my ongoing projects, and we've been using it @argonne_lcf for some efforts around portable benchmarking and federated / resilient training
Oh, you're writing CUDA kernels? Everyone's on Triton now. Just kidding, we're all on Mojo. We're using cuTile. We're using ROCm. We have an in-house DSL compiler targeting the NVGPU MLIR dialect but wait, Tile IR just dropped so we're going to target that instead. Our PM is on TileLang. The team lead was on CuTe but now she's back to handwriting PTX. If you're not on Pallas, you're ngmi. Our intern is building on TT-Metalium for our Wormholes. Our CFO approved an order for some big chungus wafer-scale chips so now we're porting our kernels to CSL. Our CTO is working on a kernel-less graph compiler so we won't need to write kernels anymore. Our CEO thinks we're talking about the Linux kernel. We're building Claude for dogs.
Don't miss @DeepSpeedAI virtual office hours on May 26 at 12:00 PM America/New_York to ask questions of @toh_tana, a member of the DeepSpeed TSC, and get the latest key updates, including AutoSP (sequence parallel), AutoEP (expert parallel), and AutoTP (tensor parallel).
Today, NVIDIA is launching the next paradigm shift in GPU programming: cuTile BASIC
Write perf portable BASIC kernels and deploy them at any scale from edge inference devices like your calculator to entire GPU clusters
We're going back to BASIC
https://t.co/meF2T0jUSc
i call this dev quality of life improvements,
few others:
alias npm='pnpm'
alias claude='claude --dangerously-skip-permissions --teammate-mode tmux'
alias code='zed'
alias cursor='zed'
ALCF APEX (AI Program for EXploration) is open!
Are you a domain scientist or an AI researcher interested in large-scale agentic AI for science?
APEX teams get leadership compute + close collaboration with ALCF experts + fully-funded postdoc
Deadline: Feb 27, 2026
@argonne_lcf