Added a fun lil widget to the LLM Engineer's Almanac -- a "Token Timing Simulator" so you can get a visceral feel for what a benchmark perf number means.
Here's @_dcw02's latest work with @zhijianliu_'s DFlash technique in @sgl_project -- ~1k tokens per second (TPS)!
https://t.co/iUJm984dq0
Want to train LLMs on longer contexts without re-engineering your entire systems stack?
Introducing AutoSP: the first compiler-based solution that automatically optimizes LLM training for long contexts. Under the hood, AutoSP applies a series of compiler passes that enable sequence parallelism, paired with a curated activation-checkpointing scheme tailored to long-context training. It's integrated directly into DeepSpeed, so enabling long-context training is just a config change away.
No more rewiring your stack to push context lengths. Read the blog to learn more 🖇️ https://t.co/TMjWfsO8fy
✍ @AhanGupta13, Zhihao W., Neel Dani, @toh_tana, Tunji Ruwase, @_Minjia_Zhang_
#PyTorch #DeepSpeed #AutoSP #OpenSourceAI
DFlash⚡ meets OpenClaw🦞 = FlashClaw
Same Claw. >4X faster or cheaper.
DFlash support for Qwen3.5 is live, outperforming native MTP (multi-token prediction) by up to 2.3X.
More to come! 🔥
The GLM models by @Zai_org have been a game changer for me. I was reluctant to embrace coding agents until I could run the models myself.
Now, with GLM-5, I have a top-quality self-hosted intelligence endpoint tightly integrated into my engineering work.
https://t.co/95XSG31K0P