my personal ml systems notes
over the last couple of months, i've been diving deep into ml systems, on both the training and inference sides for llms. this is a personal collection of notes covering distributed computing, parallelism, quantization, and pytorch internals, mostly from my experiments.
1. distributed techniques - covers distributed training fundamentals: nccl collectives (gather, all-gather, reduce, all-reduce, scatter, reduce-scatter), mixture-of-experts, parallelism strategies (dp, ddp, zero, tensor/pipeline parallelism), and torch.distributed basics.
2. quantization - model quantization from first principles: symmetric/asymmetric quantization, llm.int8(), awq, smoothquant, gptq/obs/obq, and quip.
3. pytorch internals
4. jax scaling book - some solved exercises from the jax scaling book.
i'll keep adding notes and refining the content to make it more presentable over time.
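for the symmetric/asymmetric quantization item above, here is a minimal pure-python sketch, not code from the notes themselves. the function names and the 8-bit default are assumptions for illustration. symmetric pins the zero-point at 0 and sizes the scale from the largest magnitude; asymmetric fits the scale to the actual [min, max] range and shifts it with a zero-point.

```python
def quantize_symmetric(xs, bits=8):
    """Symmetric: zero-point is fixed at 0, scale comes from the largest magnitude."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for int8
    scale = max(abs(x) for x in xs) / qmax or 1.0   # avoid a zero scale on all-zero input
    q = [max(-qmax, min(qmax, round(x / scale))) for x in xs]
    return q, scale


def quantize_asymmetric(xs, bits=8):
    """Asymmetric: scale covers [min, max] (widened to include 0), plus a zero-point."""
    qmax = 2 ** bits - 1                            # 255 for uint8
    lo, hi = min(min(xs), 0.0), max(max(xs), 0.0)
    scale = (hi - lo) / qmax or 1.0
    zero_point = round(-lo / scale)                 # the integer that represents 0.0
    q = [max(0, min(qmax, round(x / scale) + zero_point)) for x in xs]
    return q, scale, zero_point


def dequantize(q, scale, zero_point=0):
    """Map integers back to approximate real values."""
    return [(v - zero_point) * scale for v in q]
```

on skewed data such as `[-1.0, 0.0, 0.5, 2.0]` the asymmetric scale comes out smaller (3/255 vs 2/127), which is the usual argument for it when the range is not centered on zero.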
IN 1986 MIT FILMED A LECTURE THAT OPENS BY TELLING YOU COMPUTER SCIENCE IS NOT A SCIENCE AND HAS ALMOST NOTHING TO DO WITH COMPUTERS
72 minutes from Hal Abelson and Gerald Sussman, the lecture an entire generation of engineers calls the one that rewired how they think.
-> The line that lands: computer science is about computers the way astronomy is about telescopes. The tool was never the point.
The real subject was always one thing -- controlling complexity. Everything else is detail.
Forty years later it reads like a prophecy. AI writes the syntax now. What's left is exactly what they taught: taming complexity nobody can hold in their head.
The language was never the skill -> the thinking was. This is where you learn it.
Most people chase the newest framework. The ones who watched this think on a level frameworks can't touch.
Bookmark & watch it today, this one's a legend ↓
I don't understand why so many people want US, UK, Canadian, or German citizenship.
Here are 12 websites to find remote jobs that pay in USD worldwide:
Day 3 of Observability Zero to Hero: we look at SLI, SLO & SLA
• SLI → What are we measuring?
• SLO → What are we aiming for?
• SLA → What are we promising?
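A small worked example of how the three relate, as a sketch with made-up numbers. The function names, the request counts, and the 99.9% / 99.5% targets are assumptions for illustration only.

```python
def availability_sli(good_requests, total_requests):
    """SLI: the measured fraction of requests that succeeded."""
    return good_requests / total_requests if total_requests else 1.0


def error_budget_remaining(sli, slo):
    """Fraction of the error budget still unspent; negative means the SLO is breached."""
    allowed_failure = 1.0 - slo
    actual_failure = 1.0 - sli
    return (allowed_failure - actual_failure) / allowed_failure


# Hypothetical month: 1,000,000 requests, 600 of them failed.
sli = availability_sli(999_400, 1_000_000)   # what we measured: 0.9994
slo = 0.999                                  # what we aim for internally
sla = 0.995                                  # what we promise customers

budget_left = error_budget_remaining(sli, slo)
sla_breached = sli < sla
```

Here the SLI sits above the SLO, roughly 40% of the error budget is left, and the SLA is not breached. Keeping the SLO stricter than the SLA gives the team room to react before a promise is broken.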
“don’t train your own model” is common ai advice. it's wrong. your token bill's the proof.
today, we’re excited to launch castform into open preview. castform is the easiest way for you to train your own model, on your own data.
open-weights models are performant and much cheaper. when trained on your task & proprietary data, they beat closed models. the thing standing between you and that was weeks of plumbing & years of ml expertise.
with castform, model training is as simple as prompt engineering. @castformai
bring your agent traces or raw corpora. castform turns it into training data, picks the right algorithmic recipes, manages gpus, and gives you an ide to watch and chat with your model as it learns.
see what you can build with castform👇
How memory, pointers, and data structures work under the hood.
Professor Jerry Cain, Programming Paradigms, Stanford (CS107) lecture on generic stacks in C.
An engineer at Anthropic just shared how they actually use Fable 5 internally.
How they work with it every day. 🤯
Two words: design loops.
→ Stop prompting one message at a time. Give Fable 5 a goal and let it run, self-correct, and repeat until done. You design the loop once. It handles the rest.
→ Never let the model grade its own work. Use a separate sub-agent as a verifier. Models are terrible at self-critique. A second agent in a fresh context catches what the first one misses.
→ Tested Fable 5 vs Opus 4.7 on an ML challenge. Fable improved the pipeline 6x more. Opus played safe with small tweaks. Fable bet big on structural changes and pushed through failures.
→ Memory is where Fable 5 destroys everything else. Across sessions Sonnet lists failures and moves on. Opus flags issues but rarely verifies. Fable completes the full cycle: fail → investigate → verify → distill → reuse. 73% verification vs Opus at 17%.
→ The insight from inside Anthropic: don’t steer Fable 5 by hand. Build loops. Let it self-correct and manage its own memory. That’s how you unlock it.
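A minimal sketch of the loop-plus-separate-verifier idea from the points above. Nothing here is a real model API: `worker` and `verifier` are hypothetical callables standing in for two independent agent calls, and the toy versions below exist only so the loop can run.

```python
def run_loop(goal, worker, verifier, max_iters=5):
    """Run worker until a separate verifier accepts the result or the budget runs out."""
    feedback = None
    for attempt in range(1, max_iters + 1):
        candidate = worker(goal, feedback)          # producer sees only goal + last feedback
        ok, feedback = verifier(goal, candidate)    # grader is a different callable
        if ok:
            return candidate, attempt
    raise RuntimeError(f"no accepted result after {max_iters} attempts: {feedback}")


def toy_worker(goal, feedback):
    # Stand-in for a model call: the first draft is wrong, the retry uses the feedback.
    return sorted(goal) if feedback else list(goal)


def toy_verifier(goal, candidate):
    # Stand-in for a second agent in a fresh context: checks the result independently.
    ok = candidate == sorted(goal)
    return ok, None if ok else "output is not sorted"


result, attempts = run_loop([3, 1, 2], toy_worker, toy_verifier)
# result == [1, 2, 3], attempts == 2
```

The iteration cap matters as much as the verifier: without it, a worker that never satisfies the check would run forever.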
This is how the people who built the model are using it.
Fable 5 is free on Pro and Max until June 22. Try it before the window closes.
firewalls can't stop this.
A developer just open sourced a tunnel that smuggles your entire internet through port 53, the port every router on earth is forced to leave open.
It's called MasterDnsVPN. It hides your traffic inside DNS queries, the one type of packet no network can block without breaking itself.
Every firewall on earth has to allow DNS. Schools, airports, hotel WiFi, entire countries running ISP-level censorship: all of them keep port 53 open or nothing on the network resolves. This repo turns that loophole into a full encrypted tunnel.
Here's what makes it different from every other DNS tunnel that came before:
→ Custom ARQ layer gives you TCP-level reliability over UDP DNS, so nothing drops even on garbage networks
→ Sends every packet through up to 12 different resolver paths at the same time; if 11 fail, the packet still arrives
→ Auto probes the maximum DNS payload your path can handle, then locks in the fastest MTU possible
→ AES-256-GCM, ChaCha20, AES-128, AES-192 all built in, pick your encryption
→ SOCKS5 proxy on 127.0.0.1:1080. Point any browser or app at it and you're through
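To make "hides your traffic inside DNS queries" concrete, here is a generic sketch of how any DNS tunnel can pack bytes into a query name. This is not MasterDnsVPN's actual wire format, which the post does not describe; the function names and the `t.example.com` domain are assumptions. It relies only on the standard DNS limits of 63 characters per label and 253 per full name, and uses base32 because DNS names are case-insensitive.

```python
import base64

MAX_LABEL = 63    # DNS limit per dot-separated label
MAX_NAME = 253    # DNS limit for the whole name in text form


def encode_query_name(payload: bytes, domain: str) -> str:
    """Pack payload bytes into the labels of a query name under `domain`."""
    text = base64.b32encode(payload).decode("ascii").rstrip("=").lower()
    labels = [text[i:i + MAX_LABEL] for i in range(0, len(text), MAX_LABEL)]
    name = ".".join(labels + [domain])
    if len(name) > MAX_NAME:
        raise ValueError("payload too large for a single DNS name")
    return name


def decode_query_name(name: str, domain: str) -> bytes:
    """Recover the payload on the server side from a received query name."""
    suffix = "." + domain
    if not name.endswith(suffix):
        raise ValueError("query is not for the tunnel domain")
    text = name[:-len(suffix)].replace(".", "").upper()
    text += "=" * (-len(text) % 8)      # restore the base32 padding stripped above
    return base64.b32decode(text)


query = encode_query_name(b"hello tunnel", "t.example.com")
```

The size ceiling per query is why a real tunnel needs the pieces the list mentions: payload-size probing, and a reliability layer to reassemble many small packets in order.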
Killed: $12/mo Mullvad, $10/mo NordVPN, $15/mo Astrill, every commercial DNS tunnel charging monthly fees for the exact same idea.
Pre-built binaries for Windows, Linux AMD64, Linux ARM64, macOS ARM64. No Python install needed. Configure two DNS records, drop in the encryption key, run the executable.
Works in environments where every other VPN protocol is dead on arrival.
MIT License. 100% open source.
A DEVELOPER FOUND SEVEN WAYS TO TAKE DOWN A PRODUCTION DATABASE THAT ALL LOOK EXACTLY LIKE NORMAL, INNOCENT CODE AND ALMOST EVERY TEAM IS SHIPPING AT LEAST ONE OF THEM RIGHT NOW
17 minutes from Josh Berkus, one of the people who actually maintains PostgreSQL, walking through the quiet mistakes that turn a healthy database into a 3am outage.
-> The moment it lands, you realize none of these are exotic attacks. They're ordinary-looking decisions -- a query that locks a table, a connection that never closes, a setting no one ever questioned -- that work perfectly until the day they don't, and then they take everything down with them.
The scary part isn't that the database breaks. It's how normal the code looks right up until it does. A query that runs in 5ms on your laptop and 5 minutes on prod. A migration that silently locks the whole table. A connection pool that runs dry the moment real traffic shows up. Every one of them passed review.
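One of those failure modes, the connection that never closes draining the pool, can be sketched in a few lines. This is an illustrative toy, not material from the talk: `BoundedPool` is a hypothetical class, and plain objects stand in for database connections.

```python
import queue
from contextlib import contextmanager


class BoundedPool:
    """Fixed-size pool: callers wait for a free connection, up to a timeout."""

    def __init__(self, make_conn, size):
        self._free = queue.Queue(maxsize=size)
        for _ in range(size):
            self._free.put(make_conn())

    @contextmanager
    def connection(self, timeout=1.0):
        try:
            conn = self._free.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError("pool exhausted") from None
        try:
            yield conn
        finally:
            self._free.put(conn)    # always handed back, even if the body raises


pool = BoundedPool(object, size=1)

with pool.connection() as conn:
    pass                            # normal use: the connection goes back afterwards

leaked = pool._free.get()           # simulate code that takes a connection and never returns it
try:
    with pool.connection(timeout=0.05):
        pass
    exhausted = False
except TimeoutError:
    exhausted = True                # every later caller now times out
```

The leak looks harmless in review because nothing fails at the call site; the failure only appears later, in whichever request asks for a connection next.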
Writing SQL that runs was never the hard part -> writing SQL that survives production is. And now that an AI agent is generating and firing queries at your real database faster than anyone can read them, every one of those seven landmines is one autocomplete away -- and the only person who can stop it is the one who already knows where they're buried.
Your database doesn't go down because someone attacked it. It goes down because something that looked completely normal finally caught up with it.
Save and Watch it today.
You'll see the next outage coming before it lands ↓
While everyone's arguing about which regex library is fastest, this guy built one from scratch using Brzozowski derivatives. Live, on stage.
Theory, C++ implementation, raw x86 assembly output. 36 minutes.
No frameworks. No dependencies. No hype.
Most devs copy-paste regex they don’t fully understand. This talk breaks down the math: conjunction, complement, normalization.
So you can go from matching patterns to understanding them.
The engineers who watched this stopped writing backtracking regex that explodes on edge cases.
The ones who skipped it are still debugging catastrophic backtracking in prod.
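For a feel of the technique, here is a minimal Python sketch of a Brzozowski-derivative matcher with conjunction and complement. It is not the C++ implementation from the talk; the tuple encoding and function names are assumptions, and the "normalization" is limited to a few simplifications in the constructors. The derivative of a regex with respect to a character is the regex that matches the rest of the string; matching means taking derivatives one character at a time and checking whether the result accepts the empty string.

```python
EMPTY = ("empty",)   # matches nothing
EPS = ("eps",)       # matches only the empty string


def char(c):
    return ("chr", c)


def alt(r, s):       # r | s, with light normalization
    if r == EMPTY:
        return s
    if s == EMPTY or r == s:
        return r
    return ("alt", r, s)


def cat(r, s):       # r followed by s
    if r == EMPTY or s == EMPTY:
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("cat", r, s)


def conj(r, s):      # r & s (conjunction)
    if r == EMPTY or s == EMPTY:
        return EMPTY
    return r if r == s else ("and", r, s)


def neg(r):          # complement; double negation cancels
    return r[1] if r[0] == "not" else ("not", r)


def star(r):
    if r in (EMPTY, EPS):
        return EPS
    return r if r[0] == "star" else ("star", r)


def nullable(r):
    """True if r matches the empty string."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "chr"):
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    if tag in ("cat", "and"):
        return nullable(r[1]) and nullable(r[2])
    return not nullable(r[1])                      # "not"


def deriv(r, c):
    """Regex matching what may follow c in strings matched by r."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "chr":
        return EPS if r[1] == c else EMPTY
    if tag == "alt":
        return alt(deriv(r[1], c), deriv(r[2], c))
    if tag == "and":
        return conj(deriv(r[1], c), deriv(r[2], c))
    if tag == "not":
        return neg(deriv(r[1], c))
    if tag == "star":
        return cat(deriv(r[1], c), r)
    head = cat(deriv(r[1], c), r[2])               # "cat"
    return alt(head, deriv(r[2], c)) if nullable(r[1]) else head


def matches(r, text):
    for c in text:
        r = deriv(r, c)
    return nullable(r)


# Strings over {a, b} that do NOT contain "bb": any & not(any bb any)
any_ab = star(alt(char("a"), char("b")))
has_bb = cat(any_ab, cat(char("b"), cat(char("b"), any_ab)))
no_bb = conj(any_ab, neg(has_bb))
```

There is no backtracking anywhere: each input character triggers exactly one derivative step. A production engine would normalize much more aggressively and cache derivatives as DFA states, since without that the terms can grow on long inputs.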
I just spent months handwriting a 200 page guide on the entirety of ML foundations and math from scratch.
The guide features:
- Neural Nets (Backprop, Adam, SGD, Batch Norm)
- ML Algorithms (SVM, Grad Boosting, K-means, PCA)
- Hardware (Tensor Cores, Systolic Arrays, CUDA)
- Transformers (Multi-Head Attn, KV Cache, LoRA)
- Vision (ViT, Convolutions, MAE, IoU, NMS, VLM)
- Agents (OpenClaw, ReAct, Memory, Orchestration)
Everything I wish I had years ago, for free.
a developer noticed something about DNS resolvers, the servers that turn website names into addresses
they hold onto their answers for a while before forgetting them. some for up to a week
so he wondered if he could make them remember a file instead
he found close to 3.9 million of these servers sitting open across the internet
then he chopped a file into 180-byte pieces and handed each piece to a different server to hold
to prove it worked, the file he stored was one of his own blog posts. it sat there, readable, spread across machines from China to Brazil that never agreed to hold it
the whole thing is open source
> https://t.co/8UMpecIxRE
there's no disk, no folder, no account. the file isn't saved anywhere
it just lives scattered across millions of machines that have no idea they're holding it, and forget it in a few days unless you keep reminding them
slower than a hard drive. in his words, still faster than a floppy disk
you don't store a file here. you keep it alive by asking for it
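a rough sketch of the idea, with no real dns involved. this is not the developer's code: `FakeResolver` is a hypothetical in-memory stand-in for a caching resolver, and only the 180-byte piece size and the forget-after-a-while behavior come from the thread.

```python
CHUNK_SIZE = 180   # piece size mentioned in the thread


def split_chunks(data, size=CHUNK_SIZE):
    """Chop the file into fixed-size pieces; the last one may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class FakeResolver:
    """In-memory stand-in for one caching resolver: entries expire after their ttl."""

    def __init__(self):
        self._cache = {}

    def put(self, key, value, ttl, now):
        self._cache[key] = (value, now + ttl)

    def get(self, key, now):
        value, expires = self._cache.get(key, (None, 0))
        return value if now < expires else None


def store(data, resolvers, ttl, now):
    """Hand each piece to a different resolver; return how many pieces there are."""
    chunks = split_chunks(data)
    if len(chunks) > len(resolvers):
        raise ValueError("not enough resolvers for one piece each")
    for i, chunk in enumerate(chunks):
        resolvers[i].put(f"chunk-{i}", chunk, ttl, now)
    return len(chunks)


def load(n_chunks, resolvers, now):
    """Reassemble the file, or return None if any piece has been forgotten."""
    parts = [resolvers[i].get(f"chunk-{i}", now) for i in range(n_chunks)]
    return None if any(p is None for p in parts) else b"".join(parts)
```

losing a single piece loses the whole file, which is why the real thing has to keep re-asking before the ttl runs out.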
A DEVELOPER PROVED THE REGEX YOU'VE WRITTEN A THOUSAND TIMES IS SECRETLY A COMPILER AND THAT ALMOST NO ONE WHO USES THEM HAS ANY IDEA WHAT ACTUALLY RUNS
36 minutes from Paul Wankadia, the engineer behind a regex engine that compiles your pattern straight down to raw machine code -- walking through what really happens between the slashes.
-> The moment it clicks, regex stops being magic punctuation you paste from Stack Overflow and becomes what it actually is: a tiny machine. Your pattern gets turned into a state machine, and that machine is what runs against every character of your text.
That one idea explains everything you never understood. Why one regex returns instantly and a nearly identical one hangs your whole server. Why some patterns are safe and others are a denial-of-service waiting to happen. It was never random -- it's whether the machine underneath is built well or badly.
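A tiny sketch of what "the machine underneath" can look like. This is a hand-built table for the textbook pattern `(a|b)*abb`, not output from the engine discussed in the talk; the state numbering and names are assumptions.

```python
# States track how much of the suffix "abb" has been seen so far:
# 0 = none, 1 = "a", 2 = "ab", 3 = "abb" (accepting).
DFA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 1, "b": 0},
}
ACCEPTING = {3}


def dfa_match(text):
    """Full-match text against (a|b)*abb with one table lookup per character."""
    state = 0
    for ch in text:
        state = DFA[state].get(ch)
        if state is None:          # character outside the alphabet: no match
            return False
    return state in ACCEPTING
```

Because every character costs exactly one lookup, the running time grows linearly with the input no matter what the text looks like. A backtracking engine gives no such guarantee, which is where the "hangs your whole server" cases come from.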
Writing a regex was never the skill -> reading one is. And now that an AI agent hands you dense, clever patterns you'd never write yourself, the person who can see the machine underneath is the one who catches the one that takes down production at 3am.
Everyone copies regex and prays. This is the talk that ends the praying.
Save it. The next time a pattern "just works," you'll actually know why ↓
As an AI engineer, please learn:
>Harness engineering, not just prompt engineering
>Context engineering, not just long prompts
>Prompt caching vs. semantic caching tradeoffs
>KV cache management, eviction, reuse, and memory pressure at scale
>Prefill vs. decode latency and why they optimize differently
>Continuous batching, paged attention, and throughput optimization
>Speculative decoding vs. quantization vs. distillation tradeoffs
>INT8, INT4, FP8, AWQ, GPTQ, and when quantization hurts quality
>Structured output failures, schema validation, repair loops, and fallback chains
>Function calling reliability, tool contracts, argument validation, and idempotency
>Agent guardrails, loop budgets, tool budgets, and termination conditions
>Model routing, graceful fallback logic, and degraded-mode UX
>RAG architecture: chunking, embeddings, hybrid search, reranking, and freshness
>Retrieval evals: recall, precision, grounding, attribution, and citation quality
>Evals: golden sets, regression tests, adversarial tests, LLM-as-judge, and human evals
>LLM observability as a first-class discipline: traces, spans, tokens, latency, errors, and drift
>Cost attribution per feature, workflow, tenant, and user journey, not just per model
>Safety engineering: prompt injection defense, data leakage prevention, and permission boundaries
>Multi-tenant isolation, cache safety, and cross-user context contamination prevention
>Fine-tuning vs. in-context learning vs. RAG vs. distillation and when each is the wrong tool
>Latency, quality, cost, and reliability tradeoffs across the full inference stack
>Production failure modes: hallucinated tool calls, malformed JSON, stale retrieval, runaway agents, and silent eval regressions
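As one concrete instance from the list, here is a minimal sketch of schema validation with a bounded repair loop and a fallback. It assumes a hypothetical `generate(error)` callable standing in for a model call that receives the previous validation error; the flat `{field: type}` schema is a simplification, not a real JSON Schema validator.

```python
import json


def validate(raw, schema):
    """Return (data, None) if raw is a JSON object matching schema, else (None, error)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "top-level value must be an object"
    for key, expected_type in schema.items():
        if key not in data:
            return None, f"missing field: {key}"
        if not isinstance(data[key], expected_type):
            return None, f"wrong type for field: {key}"
    return data, None


def generate_structured(generate, schema, max_repairs=2, fallback=None):
    """Call generate, feeding each validation error back, until valid or out of budget."""
    error = None
    for _ in range(max_repairs + 1):
        raw = generate(error)              # error is None on the first attempt
        data, error = validate(raw, schema)
        if data is not None:
            return data
    return fallback                        # degraded mode instead of an exception


# Toy generator: truncated JSON, then a missing field, then a valid reply.
replies = iter([
    '{"city": "Paris"',
    '{"city": "Paris"}',
    '{"city": "Paris", "temp_c": 21}',
])
result = generate_structured(lambda error: next(replies), {"city": str, "temp_c": int})
```

The repair budget doubles as a termination condition: a model that keeps returning malformed JSON costs at most `max_repairs + 1` calls before the caller gets the fallback.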