Continuing Tutorial II for Physics of Language Models.
We often trust large-scale results simply because they are large, but once noise is removed, the synthetic pretraining playground starts to push back, hard!
The second video (Part 4.1b, 90 minutes) makes this pushback concrete.
From it, I derive 20+ architectural principles, organized into 12 result blocks.
Two highlights that consistently surprise even experienced readers:
Result 2.1 (new):
"Why Canon layers actually work."
Not because of multi-token attention — that explanation only applies to the first layer.
The real mechanism is how Canon reshapes hierarchical learning across depth.
Result 11:
"Why linear models reason 4× shallower than Transformers."
This has nothing to do with memory size —
it is a structural failure shared by nearly all linear architectures.
In Result 12, I show which of these principles already emerge at academic-scale pretraining (1.3B / 100B) —
with orders-of-magnitude lower cost and far cleaner signals than many real-life large-scale runs.
The remaining principles do not disappear; they only emerge when scaling to 8B / 1T, which I will show in the third video (Part 4.2).
⏮️ Previous: Part 4.1a — methodology & playground design
▶️ This: Part 4.1b — architectural principles from the playground
🔜 Next: Part 4.2 — when the playground reshapes real-life pretraining
Cyc is spiritually similar to modern foundation models. Rather than building a separate expert system for each domain, an approach that proved brittle and lacked common sense, Cyc ambitiously strove to invest in a single knowledge base that could be adapted to a wide range of tasks.
📣 Today we launched an overhauled NLP course to 600 students in the online MS programs at UT Austin.
98 YouTube videos 🎥 + readings 📖 open to all!
https://t.co/y7sTe2Pb83
w/5 hours of new 🎥 on LLMs, RLHF, chain-of-thought, etc!
Meme trailer 🎬
https://t.co/Okv5LPQEyE
To address extreme levels of data scraping & system manipulation, we’ve applied the following temporary limits:
- Verified accounts are limited to reading 6000 posts/day
- Unverified accounts to 600 posts/day
- New unverified accounts to 300/day