Learnt some fun computing history from @lauriewired =)
Makes you wonder #llm https://t.co/TKvKq1hHeM
And also https://t.co/sLeZKp3vrH has a fun website.
🚀 open sourced metalBLAS, hand-tuned Metal matmul kernels for Apple Silicon, callable from PyTorch on mps.
Matches/beats MPS Graph (torch) matmuls on bf16/fp16, 2-3x faster on fp32 (TF32-relaxed) across the bench suite on M5 Pro.
Next step is to upstream this to PyTorch!
https://t.co/EMGdZaagXP
Apparently this idea is wild enough that people say it sounds ridiculous. Yet we’re not questioning pseudo-#opensource, translucent “open weights” #llm models, even though they’re also binaries: they generate code, and then the code generates the results.
What if #llm coding agents didn't need to write human-readable code: no #Python, no #Java, no C++, no COBOL, just binaries? Binaries that we don't understand, but the CLI does the job we need. The ultimate source: not closed source, #opensource, but pure binaries.
Okay, now let's think carefully (as if I'm prompting an #llm LOL): wouldn't any argument that such a binary cannot be governed, audited, checked, or relied upon apply equally to any "open weights" model?
#foodforthought for the weekend.
With all these agents / #llm models in the wild, the true value will be created when human connection is placed first.
Think of the barista who makes the coffee with a robot but talks to you while the robot is making it or fetching it from the preorder rack.
Startup labs doing #llm should start doing mergers and acquisitions. What’s coming is going to wipe out a lot of high burn rate companies…
Think how much more powerful it’ll be if talent, compute and data are consolidated across these labs 😏
Many roughly know how a transformer works
To REALLY understand modern neural LMs—MoEs, GPU tiling, kernels, RLHF, data—you need CS336
By @tatsu_hashimoto, @percyliang
The 2026 edition appears on yt with ~2 weeks delay
https://t.co/iEWTqEivvB
Materials
https://t.co/E1pzUSC6Tr
@MIT_CSAIL They can only spend money, not make it. What should happen is that agents make money by themselves and become self-sustaining.
If it’s really so useful and clever, we shouldn’t need to pay a subscription for it; it should go find the money to pay for itself 😉
The human brain 🧠 is incredibly efficient because it only activates the specific neurons needed for a thought. Modern LLMs naturally try to do this too (>95% of neurons in feedforward layers stay silent for any given word), but our hardware punishes them for it.
One of the most frustrating paradoxes in deep learning: making a model do less math often makes it run slower. Why? Because unstructured sparsity introduces irregular memory access, and GPUs are built for predictable, dense blocks of math.
We teamed up with @NVIDIA to try to fix this hardware mismatch. Instead of forcing the GPU to adapt to the sparsity, we built a "Hybrid" format that reshapes the sparsity to fit the GPU. Our sparsity format (TwELL) dynamically routes the 99% of highly sparse tokens through a fast path, and uses a dense backup matrix as a safety valve for the rare, heavy tokens.
Through TwELL and a new set of custom CUDA kernels for both LLM inference and training, we translated theoretical sparsity into actual wall-clock speedups: >20% faster training and inference on H100 GPUs, while also cutting energy consumption and memory requirements.
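To make the "fast path plus dense backup" idea concrete, here is a minimal sketch in plain Python. It is not the actual TwELL format or its CUDA kernels. It assumes an ELL-style layout (a fixed number of slots per row) for tokens with few active neurons, and a dense fallback for the rest. Every name in it (`HybridRows`, `pack`, `matmul_hybrid`, `width`, `PAD`) is hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List

PAD = -1  # hypothetical marker for unused slots in a fixed-width sparse row


@dataclass
class HybridRows:
    """Illustrative container: fixed-width sparse rows plus a dense fallback."""
    width: int                    # max non-zeros a row may have to use the fast path
    ell_rows: List[int]           # original row ids stored in the sparse fast path
    ell_idx: List[List[int]]      # column indices per row, padded to `width` with PAD
    ell_val: List[List[float]]    # values per row, padded to `width` with 0.0
    dense_rows: List[int]         # original row ids routed to the dense fallback
    dense_val: List[List[float]]  # full activation rows for the heavy tokens


def pack(acts: List[List[float]], width: int) -> HybridRows:
    """Route each token (row) to the sparse or dense store by its non-zero count."""
    h = HybridRows(width, [], [], [], [], [])
    for r, row in enumerate(acts):
        nz = [(j, v) for j, v in enumerate(row) if v != 0.0]
        if len(nz) <= width:
            pad = width - len(nz)
            h.ell_rows.append(r)
            h.ell_idx.append([j for j, _ in nz] + [PAD] * pad)
            h.ell_val.append([v for _, v in nz] + [0.0] * pad)
        else:
            h.dense_rows.append(r)
            h.dense_val.append(list(row))
    return h


def matmul_hybrid(h: HybridRows, weight: List[List[float]], n_rows: int) -> List[List[float]]:
    """Compute acts @ weight, touching only the stored non-zeros on the fast path."""
    n_out = len(weight[0])
    out = [[0.0] * n_out for _ in range(n_rows)]
    # Fast path: every row has the same number of slots, so the loop shape is regular.
    for r, idx, val in zip(h.ell_rows, h.ell_idx, h.ell_val):
        for j, v in zip(idx, val):
            if j == PAD:  # padding only appears at the end of a row
                break
            for k in range(n_out):
                out[r][k] += v * weight[j][k]
    # Dense fallback: the rare heavy rows are multiplied in full.
    for r, row in zip(h.dense_rows, h.dense_val):
        for j, v in enumerate(row):
            for k in range(n_out):
                out[r][k] += v * weight[j][k]
    return out


def matmul_dense(acts: List[List[float]], weight: List[List[float]]) -> List[List[float]]:
    """Reference dense product used to check the hybrid result."""
    n_out = len(weight[0])
    return [[sum(row[j] * weight[j][k] for j in range(len(row))) for k in range(n_out)]
            for row in acts]


if __name__ == "__main__":
    acts = [
        [0.0, 0.0, 2.0, 0.0, 0.0, 0.0],  # 1 active neuron  -> fast path
        [1.0, 0.0, 0.0, 0.0, 3.0, 0.0],  # 2 active neurons -> fast path
        [1.0, 2.0, 3.0, 4.0, 0.0, 5.0],  # 5 active neurons -> dense fallback
    ]
    weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0], [1.0, -1.0]]
    h = pack(acts, width=2)
    print(h.ell_rows, h.dense_rows)
    print(matmul_hybrid(h, weight, len(acts)) == matmul_dense(acts, weight))
```

The design point this toy version tries to show is that a fixed `width` gives every fast-path row the same shape, which is the kind of regularity GPUs favour, while the dense fallback keeps the result exact for rows that would not fit.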
Paper: https://t.co/rqIY9SYBDe
Blog: https://t.co/oRjNbpJKha
Code: https://t.co/FAFaJwpxAJ
⚡️