Final version of my book (with a new title)
Online Learning: A Modern Introduction Using
Convex Optimization
Especially proud of the Foreword by @NicoloCB!
It'll be printed by Cambridge University Press.
The end of 7 years of updates :)
https://t.co/NeqTSih2ra
🏆 Neo Scholar applications are open!
Are you a college student who excels at CS?
Follow in the footsteps of Neo Scholars who founded Cursor, Chai Discovery, Applied Compute, Flint, Cognition, & more.
Apply to join one of tech’s strongest communities. https://t.co/gzG3lS8Bbw
In 2014 Dutch scientists left a hamster wheel outside, to see if wild animals would use it like domesticated counterparts.
The answer: hell yes! 734 visits from wild mice, plus rats, shrews, slugs (!) & even frogs and snails.
The apparent reason: fun. Just fun.
I don't know what labs are doing to these poor LLMs during RL but they are mortally terrified of exceptions, in any infinitesimally likely case. Exceptions are a normal part of life and healthy dev process. Sign my LLM welfare petition for improved rewards in cases of exceptions.
Garry and I pushed in this video to stop AB 500 and AB 1217. As of today, both bills are now dead.
The people are awake, and the tides are turning. Our elected leaders destroying K-12 education will no longer be tolerated.
The man in the photo appears to be JD Vance. His smile seems forced—it's a "social" grin without engaging the eyes (no crow's feet), suggesting insincerity or emotional detachment. The eyes do look somewhat vacant, with a distant, calculating stare that lacks warmth, potentially indicating guardedness or dissociation. This could stem from high-stress public life, but it's subjective; true psychoanalysis requires more context.
Day in Trump's economy:
> Wake up, check the news
> BLS: "10 billion new jobs created"
> Check my phone to see what today's tariff rates are.
> Norway up 20%, Cambodia down 5%
> Go to my iphone assembly line job where I make $7.25/hour
> Spend next 4 hours putting chips inside phones
> Take 15 minute break
> Check my shitfartpisscoin holdings
> Rugged
> Watch the FOMC meeting
> FOMC is just Trump
> Trump goes on stage and announces he's raising rates from -10% to -5%
> Also announces date of Jay Powell's public execution
> AI manager scolds me for taking 16 minutes on my 15 minute break
> It's only been 12 minutes
> Call employee help line to complain
> It's also AI
> Go home frustrated
> Complain to my girlfriend about my job
> She's also AI
The way agents use the internet is broken.
They don't have access to your accounts, they're blocked on half of all websites, & they take hours to set up.
Today, we're introducing @CompositeAI - the agent that connects to your browser to automate your mundane tasks.
After 6+ months in the making and burning over a year of GPU compute time, we're super excited to finally release the "Ultra-Scale Playbook"
Check it out here: https://t.co/dekxY4BQZO
A free, open-source book to learn everything about 5D parallelism, ZeRO, fast CUDA kernels, how and why to overlap compute & communication – all scaling bottlenecks and tools introduced with motivation, theory, interactive plots from our 4000+ scaling experiments and even NotebookLM podcasters to tag along with you.
- How was DeepSeek trained for $5M only?
- Why did Mistral train an MoE?
- Why is PyTorch's native Data Parallelism implementation so complex under the hood?
- What are all the parallelism techniques and why were they invented?
- Should I use ZeRO-3 or Pipeline Parallelism when scaling and what's the story behind both techniques?
- What is this Context Parallelism that Meta used to train Llama 3? Is it different from Sequence Parallelism?
- What is FP8? How does it compare to BF16?
In this book, our goal was to gather, in a single place, a coherent, easy-to-read yet detailed story of all the techniques that make today's LLM scaling possible.
The largest factor for democratizing AI will always be teaching everyone how to build AI and in particular how to create, train and fine-tune high-performance models. In other words, making the techniques that power all recent large language models accessible to everybody, and efficient training is possibly one of the most essential of them.
What started as a simple blog post ended up becoming an interactive writing piece containing 30k+ words. So we've decided to actually print it as a real 100-page physical book as well: the physical ultrafast playbook – containing all the science of distributed and fast AI training.
We plan to send free copies as gifts to the first readers of the online version so feel free to add your email in the form linked in the blog post.