I am a big fan of Jianlin Su's blog because it always starts from first principles in mathematics, rather than "ML tricks", when approaching a typical ML problem (e.g., training-free MoE load balancing).
Here is my attempt to "reinvent" one such post, which provides an elegant alternative way to compute Muon. I fill in the derivations the post skips, for a less math-savvy audience (the original is also entirely in Mandarin).
The goal of the blog is to find a way to compute an essential component of Muon, i.e., the left and right singular vector matrices U and V of the gradient G, **individually**. In its standard form, Muon really just needs their product UV^T, hence the standard way to compute it: applying a low-degree polynomial of G many times ("Newton-Schulz"). But having U and V individually opens up more variants of Muon that control the properties of model updates, hence the blog's proposal to revisit some fundamental linear algebra techniques for the computation.
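To make the "polynomial of G applied many times" idea concrete, here is a minimal NumPy sketch. It uses the classic cubic Newton-Schulz step, which I picked for simplicity; practical Muon implementations typically use a tuned higher-degree polynomial instead. The function name `newton_schulz_polar` and the step count are my own choices for illustration, and the sketch assumes G has full column rank.

```python
import numpy as np

def newton_schulz_polar(G, steps=50):
    """Approximate U @ V.T (the polar factor of G) without computing an SVD.

    Illustrative only: classic cubic Newton-Schulz, assuming G has full column rank.
    """
    # Frobenius normalization puts every singular value in (0, 1].
    X = G / np.linalg.norm(G)
    for _ in range(steps):
        # The odd polynomial p(X) = 1.5 X - 0.5 X X^T X keeps the singular
        # vectors fixed and pushes every singular value toward 1.
        X = 1.5 * X - 0.5 * X @ (X.T @ X)
    return X
```

Note that this only ever produces the product UV^T; U and V are never available separately, which is exactly the limitation the blog wants to get around.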
The methodological takeaway from the blog's thought process is that there are three components to breaking down an ML problem: (1) how to compute something at all (power iteration), (2) how to compute it fast (Cholesky decomposition), and (3) how to compute it accurately in finite floating-point precision (repeated orthogonalization). The goal of reading inspiring blogs like this is, in Feynman's terms, to be able to "reinvent" them at any time and so grasp the fundamental approach to doing similar work.
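Here is a rough sketch of how those three pieces might fit together. It is my own reconstruction under stated assumptions, not necessarily the blog's exact algorithm: block power iteration on G^T G to find V, a Cholesky factorization of the Gram matrix to orthonormalize cheaply, and a second orthonormalization pass to clean up rounding error. The names `cholesky_orthonormalize` and `svd_by_power_iteration` are hypothetical, and the sketch assumes G has full column rank with distinct singular values.

```python
import numpy as np

def cholesky_orthonormalize(Y, passes=2):
    """Orthonormalize the columns of Y with Cholesky QR, repeated `passes` times."""
    Q = Y
    for _ in range(passes):
        L = np.linalg.cholesky(Q.T @ Q)   # Gram matrix: Q^T Q = L L^T
        # Q <- Q L^{-T}, so the new Q^T Q is approximately the identity.
        # (A general solver is used here; a triangular solve would be cheaper.)
        Q = np.linalg.solve(L, Q.T).T
    return Q

def svd_by_power_iteration(G, iters=100, seed=0):
    """Return U, sigma, V with G ~ U diag(sigma) V^T.

    Assumes full column rank and distinct singular values (illustrative only).
    """
    n = G.shape[1]
    rng = np.random.default_rng(seed)
    V = cholesky_orthonormalize(rng.standard_normal((n, n)))
    A = G.T @ G                           # its eigenvectors are the right singular vectors
    for _ in range(iters):
        # (1) power step, then (2)+(3) Cholesky-based re-orthonormalization, twice
        V = cholesky_orthonormalize(A @ V)
    GV = G @ V
    sigma = np.linalg.norm(GV, axis=0)    # column norms of G V are the singular values
    U = GV / sigma
    return U, sigma, V
```

With U and V in hand separately, the standard Muon direction is just `U @ V.T`, and other combinations of the two become possible as well.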
Original blog: https://t.co/5ksKPICpMW
I’m excited to announce my memoir, Out of the Shadows, will be published by HarperCollins in North America on October 13, 2026. In the book, I break my silence to reveal everything I legally can about my investigations of UAP and non-human intelligent life on behalf of the U.S. Government and the profound impact my work had on me and my family. We are at a turning point in human history and I am proud to play a role in opening the public’s eyes to the truth and bringing about long overdue disclosure.
I can’t sleep at night because my mind races with all the cool shit I could be building. AI has turned my workdays into 24 hour grind sessions. I code until I literally collapse from exhaustion 7 days a week.
The next discussion of Wittgenstein's Tractatus Logico-Philosophicus will take place this evening at 6:00 PM EDT. @mishapathy and I will be discussing 4.12 to 4.53. Hope to see some of you there!
https://t.co/9zoHwB0I5s
As long as you live relatively healthy and in proximity to San Francisco, the odds are much higher than expected that you are in fact *not* born too early to explore the stars.
If we're lucky, you were actually born at the exact right moment. And I'm feeling pretty damn lucky
Anthropic has confidentially submitted a draft S-1 registration statement to the Securities and Exchange Commission.
Pending completion of SEC review, this gives us the option to pursue an initial public offering.
Read more: https://t.co/onGZAhRLvD
NVIDIA just released a quantized Qwen3.6 MoE model on Hugging Face
35B total, 3B active parameters
NVFP4 shrinks memory ~3x with near-zero accuracy loss
As promised 🙏
This is what $billions in AI infra actually looks like on the floor, not in a keynote, not in a brochure.
NVIDIA DGX B300 racks. Compute as far as the eye can see. Most people never get within 500 metres of this stuff.
This is what the future actually looks like. 🔥