Thanks to an amazing partnership with @inferact and @lmsysorg/@radixark, Dynamo had day-0 support for DeepSeek-V4, with features including large-scale P/D disaggregation on B/GB200 and 300 and KV-cache-aware routing.
Containers and some fun PRs linked in the next thread!
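For readers new to KV-cache-aware routing, here is a minimal sketch of the general idea: send a request to the worker whose cache already holds the longest matching token prefix. This is an illustration only, not Dynamo's API. The names `shared_prefix_len`, `pick_worker`, and `worker_caches` are hypothetical, and a real router would presumably also weigh worker load.

```python
def shared_prefix_len(a, b):
    """Count how many leading tokens two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_worker(request_tokens, worker_caches):
    """Return the worker whose cached sequences share the longest prefix
    with the request. worker_caches maps a worker name to a list of cached
    token sequences (hypothetical structure, for illustration only)."""
    def best_overlap(worker):
        return max(
            (shared_prefix_len(request_tokens, cached)
             for cached in worker_caches[worker]),
            default=0,  # a worker with an empty cache reuses nothing
        )
    return max(worker_caches, key=best_overlap)


worker_caches = {
    "w0": [[1, 2, 3]],
    "w1": [[1, 2, 9], [7]],
    "w2": [],
}
chosen = pick_worker([1, 2, 3, 4], worker_caches)
```

Here `w0` can reuse three cached tokens, `w1` two, and `w2` none, so `chosen` is `"w0"`.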
We are open sourcing bytecheckpoint and veomni!
bytecheckpoint is ByteDance's production checkpointing system for foundation model training, battle-tested on jobs with 10k+ GPUs. Blazing-fast save/load, plus load-time automatic checkpoint resharding for different parallelism across training stages (pretrain/SFT/RL).
veomni is an open-source model training framework for LLM and multimodal training. UI-TARS (the SOTA GUI agent model prior to the release of OpenAI's Operator) was trained with veomni. It is built with a modular design and integrates sequence/expert/zero-optimizer parallelism, offloading optimizations, and @liger_kernel. It is trainer-free (the user controls the training loop) and easy for researchers to hack! The go-to framework for text/multimodal LLM pre-training and post-training, from research to production.
Try them today, your feedback is welcome!
Code:
- https://t.co/IKM7ceK7GN
- https://t.co/HAzL6Omc6j
Paper:
- NSDI paper: https://t.co/HjHgbjv9r0
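To make "load-time resharding" concrete, here is a toy sketch under simplifying assumptions: parameters are one flat list, and each rank saves a contiguous slice. It is not bytecheckpoint's API or algorithm. `shard` and `reshard` are hypothetical helpers that only show how a checkpoint saved with one world size could be loaded with another.

```python
def shard(flat, world_size):
    """Split a flat parameter list into contiguous per-rank shards.
    Trailing shards may be shorter (or empty) when sizes do not divide evenly."""
    per_rank = -(-len(flat) // world_size)  # ceiling division
    return [flat[r * per_rank:(r + 1) * per_rank] for r in range(world_size)]


def reshard(shards, new_world_size):
    """Reassemble saved shards in rank order, then re-split for a new world size."""
    flat = [x for s in shards for x in s]
    return shard(flat, new_world_size)


saved = shard(list(range(10)), 4)   # checkpoint written by 4 ranks
loaded = reshard(saved, 2)          # same checkpoint read by 2 ranks
```

With 10 parameters, the 4-rank save holds shards of sizes 3, 3, 3, and 1, and the 2-rank load yields two shards of 5 each. A production system would avoid materializing the full list on one host; the sketch does so only for clarity.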
🚀 Day 3 of #OpenSourceWeek: DeepGEMM
Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference.
⚡ Up to 1350+ FP8 TFLOPS on Hopper GPUs
✅ No heavy dependency, as clean as a tutorial
✅ Fully Just-In-Time compiled
✅ Core logic at ~300 lines - yet outperforms expert-tuned kernels across most matrix sizes
✅ Supports dense layout and two MoE layouts
🔗 GitHub: https://t.co/cxJ55w61pT
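As a rough illustration of what a grouped MoE GEMM computes, the sketch below assumes tokens are already sorted so that rows routed to the same expert are contiguous, and multiplies each group by that expert's weight matrix. It uses plain Python numbers rather than FP8 and is not DeepGEMM's API. `matmul` and `grouped_gemm_contiguous` are hypothetical names.

```python
def matmul(a, b):
    """Plain (M x K) @ (K x N) product on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def grouped_gemm_contiguous(tokens, expert_ids, expert_weights):
    """Multiply each contiguous run of token rows by its expert's weights.
    expert_ids[i] is the expert for row i; rows of one expert are adjacent."""
    out = []
    start = 0
    while start < len(tokens):
        expert = expert_ids[start]
        end = start
        while end < len(tokens) and expert_ids[end] == expert:
            end += 1
        # One dense GEMM per expert group.
        out.extend(matmul(tokens[start:end], expert_weights[expert]))
        start = end
    return out


tokens = [[1, 2], [3, 4], [5, 6]]
expert_ids = [0, 0, 1]
expert_weights = {
    0: [[1, 0], [0, 1]],  # identity
    1: [[2, 0], [0, 2]],  # doubles each feature
}
result = grouped_gemm_contiguous(tokens, expert_ids, expert_weights)
```

The first two rows pass through expert 0 unchanged and the third is doubled by expert 1. A real kernel would fuse these per-expert products into one launch instead of looping.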
🚀 Day 2 of #OpenSourceWeek: DeepEP
Excited to introduce DeepEP - the first open-source EP communication library for MoE model training and inference.
✅ Efficient and optimized all-to-all communication
✅ Both intranode and internode support with NVLink and RDMA
✅ High-throughput kernels for training and inference prefilling
✅ Low-latency kernels for inference decoding
✅ Native FP8 dispatch support
✅ Flexible GPU resource control for computation-communication overlapping
🔗 GitHub: https://t.co/Q6eRxZ9kgW
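The dispatch/combine pattern behind expert-parallel all-to-all can be sketched as a single-process simulation. This is only a conceptual model, with no NVLink, RDMA, or FP8 involved, and it is not DeepEP's API. `dispatch`, `combine`, and the `(payload, expert_id)` token format are assumptions made for illustration.

```python
def dispatch(tokens_per_rank, experts_per_rank):
    """Simulated all-to-all: route every token to the rank hosting its expert.
    Each token is a (payload, expert_id) tuple; experts are assigned to
    ranks in contiguous blocks of experts_per_rank."""
    inbox = [[] for _ in tokens_per_rank]
    for src, tokens in enumerate(tokens_per_rank):
        for slot, token in enumerate(tokens):
            dst = token[1] // experts_per_rank
            # Remember the origin so the result can be sent back later.
            inbox[dst].append((src, slot, token))
    return inbox


def combine(inbox, tokens_per_rank, process):
    """Reverse all-to-all: return processed tokens to their original rank and slot."""
    out = [[None] * len(tokens) for tokens in tokens_per_rank]
    for received in inbox:
        for src, slot, token in received:
            out[src][slot] = process(token)
    return out


tokens_per_rank = [[("a", 0), ("b", 3)], [("c", 1), ("d", 2)]]
inbox = dispatch(tokens_per_rank, experts_per_rank=2)
outputs = combine(inbox, tokens_per_rank, lambda tok: tok[0].upper())
```

With two experts per rank, tokens for experts 0 and 1 land on rank 0 and tokens for experts 2 and 3 on rank 1. After `combine`, every rank gets its results back in the original token order.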
@questflow @jonfitzsimon @jappleby The bigger reason is that China has laws that foreign companies are required to follow. You mentioned both F and G, but you didn't mention Microsoft (Bing), AWS, or Apple's cloud services. Those are all available in China because they abide by Chinese laws; F and G chose not to.
Microsoft Research has released BioGPT, a large language model trained on biomedical research literature. The model achieves better-than-human performance on answering questions from the biomedical literature, as evaluated on PubMedQA. The code for the model has been publicly released, and the weights for the large 1.5B model are expected to be released soon as well.
Link to the paper: https://t.co/92MBKqWPZd
Link to the GitHub repo: https://t.co/baaFEsmuNF
#ArtificialIntelligence #GenerativeAI #LLM #LLMs #AI #NLP #DeepLearning @MicrosoftResearch
@ClearHeatVision @MattMolinario He skipped the game last night to host a party celebrating his "goat"ness. Having goats at the party doesn't make him a goat, but his ego is keeping him from seeing it lawlz.
@ClearHeatVision Nobody gives a f about that poser who calls himself the goat lawlz. His ego is like a black hole, and it's devouring everybody around him.