Hello everyone, I wrote a shell script to enable/disable CPU cores to save battery while running a laptop on battery power.
Repository: https://t.co/Igw48aDuOp
I tested this on Debian 10.
Hope it helps
#bash #Linux #opensource
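For readers curious about the mechanism, here is a minimal sketch in Python rather than shell. It is not the script from the repository. It assumes the standard Linux sysfs interface, where writing 0 or 1 to /sys/devices/system/cpu/cpuN/online takes a core offline or brings it back. The function names and the `root` parameter (added so the logic can be tried against a scratch directory) are hypothetical.

```python
from pathlib import Path

SYSFS_CPU_ROOT = Path("/sys/devices/system/cpu")  # standard Linux sysfs location


def online_file(core: int, root: Path = SYSFS_CPU_ROOT) -> Path:
    """Return the sysfs control file for one core, e.g. .../cpu3/online."""
    return root / f"cpu{core}" / "online"


def set_core_online(core: int, online: bool, root: Path = SYSFS_CPU_ROOT) -> bool:
    """Write 1 (enable) or 0 (disable) to the core's control file.

    Returns False when the file is missing, which is common for cpu0
    because many systems do not allow the boot core to be taken offline.
    """
    path = online_file(core, root)
    if not path.exists():
        return False
    path.write_text("1" if online else "0")
    return True


def set_cores(cores, online: bool, root: Path = SYSFS_CPU_ROOT) -> dict:
    """Apply the same state to several cores and report which writes happened."""
    return {core: set_core_online(core, online, root) for core in cores}
```

Run as root, `set_cores(range(2, 4), online=False)` would try to take cores 2 and 3 offline. Writing to the real sysfs files needs root privileges.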
"Transformers" by Daniel Jurafsky and James H. Martin is one of the clearest and most mathematically grounded introductions to the Transformer architecture I have ever read.
Chapter 8 introduces the Transformer as the standard architecture behind modern large language models. What makes this chapter particularly interesting is its step-by-step presentation of the underlying mechanisms: contextual embeddings, self-attention, query, key and value vectors, scaled dot-product attention, multi-head attention, residual streams, feedforward layers, layer normalization, masking, and the parallel matrix formulation of attention.
The treatment of attention as a weighted sum of contextual representations is especially valuable. The chapter first develops an intuitive, simplified view of attention and then gradually derives the full formulation using the Q, K, and V matrices. This makes it easier to see what is actually happening inside the architecture from an algebraic, matrix-based perspective, rather than only through the usual block diagrams.
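To make the "weighted sum" view concrete, here is a minimal sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, in plain Python. It is an illustration, not code from the chapter. The list-of-rows representation, the function names, and the `causal` flag are my own assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V, causal=False):
    """Scaled dot-product attention on lists of row vectors.

    Q, K: n rows of dimension d_k; V: n rows of dimension d_v.
    Each output row is a weighted sum of the rows of V.
    """
    d_k = len(K[0])
    outputs = []
    for i, q in enumerate(Q):
        # With causal masking, position i only sees positions 0..i.
        visible = range(i + 1) if causal else range(len(K))
        scores = [
            sum(a * b for a, b in zip(q, K[j])) / math.sqrt(d_k)
            for j in visible
        ]
        weights = softmax(scores)
        row = [
            sum(w * V[j][c] for w, j in zip(weights, visible))
            for c in range(len(V[0]))
        ]
        outputs.append(row)
    return outputs
```

With `causal=True` the first position can only attend to itself, so its output is exactly the first row of V. That is the masking idea in its simplest form.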
I think it is an excellent resource for anyone interested in understanding how Transformers work from linguistic, mathematical, and computational perspectives.
https://t.co/3fitdPy6Fv
Anthropic pays $750,000+ a year for engineers who can build LLM architectures from scratch. Stanford taught the entire thing in a 1-hour lecture and released it for free.
Bookmark and watch this today before someone takes it down, and read the article below.
Day 4 in Observability Zero to Hero
We look at how observability reduces MTTR (Mean Time to Resolution).
In this one I explain:
1. Production Incident Investigation Explained
2. Observability Maturity Models
Researchers made KMeans 200x faster.
And the new technique also beats approaches like cuML and FAISS.
Flash-KMeans is an IO-aware implementation of exact KMeans that redesigns the algorithm around modern GPU bottlenecks.
By attacking the memory bottlenecks directly, Flash-KMeans achieves:
- 33x speedup over cuML
- 200x speedup over FAISS
This speedup comes from how it moves through GPU memory.
Standard KMeans runs in two steps, and both are bottlenecked by reads and writes to GPU memory:
1) The first step matches every point to its nearest centroid.
Standard KMeans computes the full point-to-centroid distance matrix, writes it out to GPU memory, then reads it back to find each nearest centroid. That write-then-read round trip is the bottleneck.
Flash-KMeans combines the distance calculation with the nearest-centroid step, so the result is computed on-chip and the full matrix is never written out.
2) The second step recomputes each centroid by averaging the points assigned to it.
Standard KMeans has thousands of threads writing into the same centroid slots at once, so they stall waiting for their turn.
Flash-KMeans sorts points by cluster first, turning scattered writes into sequential reductions that read and write memory in one efficient pass.
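The two ideas above can be sketched on the CPU in plain Python. This is only an illustration of the access pattern, not the Flash-KMeans GPU kernels, and all function names are hypothetical. Step 1 keeps a running minimum per point instead of storing a distance matrix. Step 2 sorts point indices by cluster and reduces each contiguous run.

```python
def squared_distance(p, c):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def assign_fused(points, centroids):
    """Step 1: distance and nearest-centroid search in one pass.

    Only a running minimum is kept per point, so no
    points-by-centroids distance matrix is ever stored.
    """
    labels = []
    for p in points:
        best_k, best_d = 0, squared_distance(p, centroids[0])
        for k in range(1, len(centroids)):
            d = squared_distance(p, centroids[k])
            if d < best_d:
                best_k, best_d = k, d
        labels.append(best_k)
    return labels


def update_sorted(points, labels, centroids):
    """Step 2: sort point indices by cluster, then average each contiguous run."""
    order = sorted(range(len(points)), key=lambda i: labels[i])
    # Assumption: a cluster with no points keeps its old centroid.
    new_centroids = [list(c) for c in centroids]
    start = 0
    while start < len(order):
        k = labels[order[start]]
        end = start
        while end < len(order) and labels[order[end]] == k:
            end += 1
        run = [points[i] for i in order[start:end]]
        new_centroids[k] = [sum(col) / len(run) for col in zip(*run)]
        start = end
    return new_centroids


def kmeans_iteration(points, centroids):
    """One full iteration: assignment followed by centroid update."""
    labels = assign_fused(points, centroids)
    return labels, update_sorted(points, labels, centroids)
```

On a GPU the gain comes from keeping these intermediate values on-chip. The Python version only shows which values never need to be written out.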
Using these two optimizations at the million-scale, Flash-KMeans completes a standard KMeans iteration in a few milliseconds.
The video below depicts this in action.
Several reasons why this is important:
KMeans has always been an offline primitive. Something you run once to preprocess data and move on.
These speedups make the approach viable in several runtime-critical systems.
↳ Vector indices like FAISS use KMeans to build search indices. Faster KMeans means you can re-index dynamically as data changes.
↳ LLM quantization methods need KMeans to find optimal weight codebooks, per layer, repeatedly. What takes hours could now take minutes.
↳ MoE models need fast token routing at inference time. Flash-KMeans makes it viable to run this inside the inference loop, not just in preprocessing.
I have shared the paper in the replies.
That said, memory is the real constraint Flash-KMeans solves, and the problem is not limited to clustering. The vectors a RAG system stores after indexing create similar bottlenecks.
I wrote a detailed walkthrough recently on cutting this vector memory by 32x with binary quantization, querying 36M+ vectors in a few milliseconds.
Read it below.
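As a rough illustration of where the 32x figure comes from: a float32 vector spends 32 bits per dimension, while a binary code spends 1. The sketch below is not the implementation from the walkthrough. Thresholding at zero, the function names, and the example sizes are assumptions made for illustration.

```python
def binarize(vector):
    """Pack the sign of each dimension into one integer (1 bit per dimension)."""
    bits = 0
    for value in vector:
        bits = (bits << 1) | (1 if value > 0 else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def storage_bytes(n_vectors, dims, bits_per_dim):
    """Raw storage for n_vectors vectors at the given precision."""
    return n_vectors * dims * bits_per_dim // 8


def nearest(query, codes):
    """Index of the stored code closest to the query in Hamming distance."""
    q = binarize(query)
    return min(range(len(codes)), key=lambda i: hamming(q, codes[i]))
```

For example, `storage_bytes(1000, 64, 32)` is 32 times `storage_bytes(1000, 64, 1)`. Comparing codes then needs only an XOR and a bit count per pair, which is why search over binary codes can be fast.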
Dennis Ritchie co-built Unix in 1969 and invented C in 1972, and his code is running inside every device you are reading this on right now. Yet the colleague who announced his death had to do it through a Google+ post because no journalist thought to check.
He worked at Bell Labs in New Jersey for 44 years. He never gave a keynote. He never ran a company. He never appeared on a magazine cover. He just wrote code that became the invisible foundation everything else is built on.
Here is what he actually built, and why it matters more than almost anything that happened in tech.
In 1969, Bell Labs had just walked away from one of the most ambitious computing projects in history. The Multics project, a joint effort between MIT, Bell Labs, and General Electric, had collapsed under its own weight. Too complex. Too expensive. Too slow. Bell Labs pulled out.
Ken Thompson and Dennis Ritchie refused to let the ideas die.
Working in a small office in Murray Hill, New Jersey, Thompson wrote the first version of Unix in three weeks during the summer of 1969. One week for the file system. One week for the process management. One week for the command shell. Ritchie was working alongside him, and when the system needed a language that could express what they were building, he built one.
In 1972 he completed C.
C was not just another programming language. It was a different philosophy about what a programming language should be. Before C, most systems code was written in assembly, which meant every program was tied to the specific hardware it ran on. You could not move code between machines. You rewrote it from scratch every time.
C changed that. It sat close enough to the hardware to be fast, but abstract enough to run on anything. When Thompson rewrote the Unix kernel in C in 1973, it became one of the first operating systems that could be picked up and moved to a completely different machine without starting over. Portability was a new idea. Ritchie made it real.
The branching that followed is almost impossible to overstate.
Unix spread from Bell Labs to universities. At Berkeley, it became BSD. BSD became the foundation of macOS and iOS. Unix influenced Linus Torvalds, who built Linux in 1991. Linux now runs every Android phone, every major web server, every supercomputer on the Top500 list, and the overwhelming majority of cloud infrastructure at AWS, Google, and Microsoft.
C became the parent language of C++, Java, JavaScript, Python, and Objective-C. Rob Pike, who worked across the hall from Ritchie at Bell Labs for 20 years, said it plainly: "The browsers are written in C. The Unix kernel that the entire internet runs on is written in C. Web servers are written in C, and if they're not, they're written in Java or C++, which are C derivatives, or Python or Ruby, which are implemented in C."
Ritchie won the Turing Award in 1983. He won the National Medal of Technology in 1998, presented by President Clinton. He was head of System Software Research at Bell Labs for decades.
He answered emails from strangers with technical questions until the end of his life. His home address stayed listed in the phone book. His colleague Brian Kernighan, who co-authored the definitive C textbook with him, said Ritchie was a private person who did no self-salesmanship. That was not false modesty. It was just who he was.
He died on October 12, 2011, at his home in Berkeley Heights, New Jersey. He was 70. He had been ill for some time. The world did not notice until Rob Pike posted a quiet announcement on Google+, and the news spread through the programming community in hushed tones.
No front pages. No tributes from heads of state. No candlelight vigils outside corporate campuses.
The device you are reading this on runs code that traces directly back to what he built. So does the server that delivered it to you. So does the browser or app you opened to get here.
Most people will never know his name.
The ones who built everything you use every day do.
Dive Into Systems is a free online book on systems engineering. The chapter on code optimization covers various compiler flags and what each one does.
https://t.co/0nuYiNiC1O
A Google engineer named Lee Boonstra wrote down everything she knew about prompting in one 68-page document, and Google gave it away for free instead of selling it.
Link is in the comments. Download it.
For deeper Machine Learning Foundations study, this YouTube playlist gives you the sequence in one place.
Good save when you want the path, not a one-off video: Ep #1 - What is ML? → Machine Learning Foundations.
Foundations:
↳ Machine Learning Foundations: Ep #1 - What is ML?
↳ Computer vision by building a neural network with TensorFlow | Machine Learning Foundations
↳ Machine Learning Foundations: Ep #3 - Convolutions and pooling
↳ Machine Learning Foundations: Ep #4 - Coding with Convolutional Neural Networks
↳ Real-world image classification using convolutional neural networks | Machine Learning Foundations
Best use: treat it as a map of the field. Watch once for the arc, then revisit the parts where you need implementation depth.
Link is in the first comment 👇
♻️ Share this with your network if you found it useful or insightful.
Step-By-Step LLM Engineering Projects Roadmap
- Build a tokenizer
- Learn embeddings
- Implement RoPE / ALiBi
- Hand-wire attention
- Build MHA
- Build a Transformer block
- Train a mini-former
- Compare objectives
- Build sampling
- Speculative decoding
- KV cache
- MQA / GQA / MLA
- Long context
- FlashAttention
- Hardware budgets
- Toy MoE
- Sparse model trade-offs
- State-space / linear attention
- Diffusion language models
- Data pipelines
- Synthetic data
- Scaling laws
- SFT / DPO / RLHF / GRPO
- Quantization
- Serving stacks
- Eval harnesses
- RAG
- Tool use / agents
- Vision-language adapters
- Interpretability
- Red-team suite
- Full capstone model system
One request:
Choose an open-source AI lab when you make it.
Open source is where humanity gets to keep the tools.
DM me when you've made it ;)
I'm offering "Functional Programming with OCaml" on the NPTEL platform in the July 2026 semester. Enrollment is open now.
The first 8 modules of the interactive book should be fairly stable. The rest is still in development. Sharing early in the spirit of building in the open.