Demis Hassabis’s “Einstein test” for defining AGI:
Train a model on all human knowledge but cut it off at 1911, then see if it can independently discover general relativity (as Einstein did by 1915);
if yes, it’s AGI.
If you're an "ML Engineer" and you don't understand how the Jacobian of your network evolves during training, you’re missing the mechanism that controls feature learning, signal flow, and training stability.
Concept 14: The Jacobian Flow of Deep Networks
For a model fθ, the Jacobian with respect to inputs is:
J(x) = ∂fθ(x) / ∂x
Its singular values determine how information is amplified or compressed.
Training moves this spectrum along a predictable trajectory called Jacobian Flow.
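To make the definition concrete, here is a minimal sketch that computes J(x) and its singular values for a toy network. The two-layer tanh MLP, its sizes, and the names `mlp` and `input_jacobian` are illustrative assumptions, not something from the thread.

```python
import numpy as np

def mlp(x, W1, W2):
    """Toy two-layer network f(x) = W2 @ tanh(W1 @ x) (assumed architecture)."""
    return W2 @ np.tanh(W1 @ x)

def input_jacobian(x, W1, W2):
    """Analytic input Jacobian: W2 @ diag(1 - tanh^2(W1 x)) @ W1."""
    h = np.tanh(W1 @ x)
    return W2 @ ((1.0 - h**2)[:, None] * W1)

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 4, 16, 3
W1 = rng.normal(size=(d_hidden, d_in)) / np.sqrt(d_in)
W2 = rng.normal(size=(d_out, d_hidden)) / np.sqrt(d_hidden)
x = rng.normal(size=d_in)

J = input_jacobian(x, W1, W2)                        # shape (d_out, d_in)
singular_values = np.linalg.svd(J, compute_uv=False)  # sorted, largest first
print(singular_values)
```

In a real framework you would get J from automatic differentiation rather than a hand-written formula. The analytic form is used here only to keep the sketch dependency-free.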
1. At Initialization, Spectrum Controls Trainability
Pointer, signal propagation
For a deep network to be trainable, the singular value distribution of J(x) must be balanced.
If σ_max is too large, signals explode.
If σ_min is too small, signals vanish.
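For linear networks,
J = Πₗ Wₗ,
so the layers' log singular values accumulate with depth (for square layers, the sum of log singular values, log |det J|, adds exactly), making initialization crucial.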
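A small numerical sketch of the linear case, assuming square Gaussian layers. The depth, width, and gain values are arbitrary choices for illustration. It checks that the sum of log singular values of the product equals the sum over layers, and shows how a per-layer gain g rescales the top singular value by g^depth.

```python
import numpy as np

rng = np.random.default_rng(0)
depth, width = 4, 6
# Base layers with variance 1/width (an assumed, commonly used scaling)
base = [rng.normal(size=(width, width)) / np.sqrt(width) for _ in range(depth)]

def product_jacobian(layers):
    """J = W_L ... W_1 for a deep linear network."""
    J = np.eye(layers[0].shape[1])
    for W in layers:
        J = W @ J
    return J

def sum_log_singular_values(M):
    """Equals log|det M| for a square, invertible M."""
    return float(np.sum(np.log(np.linalg.svd(M, compute_uv=False))))

total = sum_log_singular_values(product_jacobian(base))
per_layer = sum(sum_log_singular_values(W) for W in base)
print(total, per_layer)  # the two agree up to floating-point error

# Scaling every layer by a gain g multiplies J by g**depth
top_sv = {
    g: float(np.linalg.svd(product_jacobian([g * W for W in base]), compute_uv=False)[0])
    for g in (0.5, 1.0, 2.0)
}
print(top_sv)
```

Only the sum of log singular values is additive in general. Individual singular values multiply exactly only when consecutive layers share singular vectors.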
2. Early Training, Singular Value Expansion
Pointer, representation shaping
As features form, the Jacobian spectrum expands:
• large singular values grow
• small singular values shrink
• anisotropy increases
The model stretches informative directions in the data and compresses noise.
This matches Phase 1 in Concept 8.
3. Mid Training, Alignment with Data Geometry
Pointer, anisotropy matching structure
The Jacobian aligns with principal components of the data.
If Σ_x has eigenvectors uᵢ, training pushes J(x) so that:
dominant singular vectors of J(x) align with uᵢ.
This amplifies discriminative directions and contracts irrelevant ones.
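One way to measure this alignment is the absolute cosine between the top right singular vector of J and the top eigenvector of the data covariance. The sketch below is a measurement helper only. The function name `top_alignment`, the synthetic data, and the two hand-built Jacobians are assumptions for illustration, and it does not demonstrate that training produces the alignment.

```python
import numpy as np

def top_alignment(J, X):
    """|cos| between J's top right singular vector and the top eigenvector of cov(X)."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    _, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    u1 = eigvecs[:, -1]                # direction of largest variance
    _, _, Vt = np.linalg.svd(J)
    v1 = Vt[0]                         # input direction J amplifies most
    return abs(float(u1 @ v1))

rng = np.random.default_rng(0)
# Synthetic data whose variance is dominated by the first coordinate
X = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.2])

J_aligned = np.array([[3.0, 0.0, 0.0], [0.0, 0.5, 0.0]])     # stretches coordinate 0
J_misaligned = np.array([[0.5, 0.0, 0.0], [0.0, 0.0, 3.0]])  # stretches coordinate 2

print(top_alignment(J_aligned, X), top_alignment(J_misaligned, X))
```

Tracking this number over training checkpoints would be one way to test the claim on a real model.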
4. Late Training, Collapse Toward Low Rank
Pointer, converged representation
Near convergence:
• intermediate Jacobians change little
• singular values stabilize
• effective rank decreases
• the Jacobian spectrum becomes spiked
This connects to Neural Collapse, where intra-class features contract toward their class means and the class means form a simplex equiangular tight frame.
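"Effective rank" can be made measurable. The sketch below uses one common definition, the exponential of the Shannon entropy of the normalized singular values. The thread does not say which definition it has in mind, so treat that choice as an assumption.

```python
import numpy as np

def effective_rank(M):
    """exp(entropy) of the normalized singular value distribution."""
    s = np.linalg.svd(M, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]                       # treat 0 * log(0) as 0
    return float(np.exp(-np.sum(p * np.log(p))))

flat = np.eye(4)                          # all directions equally weighted
spiked = np.diag([10.0, 0.1, 0.1, 0.1])   # one dominant direction
rank_one = np.outer([1.0, 2.0, 3.0], [1.0, 0.0, -1.0])

print(effective_rank(flat), effective_rank(spiked), effective_rank(rank_one))
```

A flat spectrum gives an effective rank equal to the dimension, while a spiked spectrum gives a value close to 1 even though the matrix is technically full rank.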
5. Overparameterization Enhances Stability
Pointer, wide models and depth scaling
In wide networks:
• J(x) concentrates around its expectation
• singular value spread narrows
• gradient norms remain stable across depth
This is part of why larger models train more reliably and avoid exploding or vanishing gradients.
NTK behavior emerges when the parameter Jacobian ∂fθ/∂θ stops evolving significantly during training.
6. Jacobian Flow Predicts Generalization
Pointer, curvature and conditioning
Sharp minima correlate with Jacobians having large σ_max and high condition number.
Flat minima correspond to lower operator norms and better conditioning.
Generalization improves when:
‖J(x)‖₂ is controlled
and
cond(J) = σ_max / σ_min is low.
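Both quantities can be read directly off the singular values. This is a minimal sketch. The helper name `spectral_summary` and the two example matrices are made up for illustration.

```python
import numpy as np

def spectral_summary(J):
    """Operator norm ||J||_2 = sigma_max and cond(J) = sigma_max / sigma_min."""
    s = np.linalg.svd(J, compute_uv=False)
    # Note: cond is infinite (with a NumPy warning) if sigma_min is exactly 0
    return {"op_norm": float(s[0]), "cond": float(s[0] / s[-1])}

well_conditioned = np.diag([1.2, 1.0, 0.8])
ill_conditioned = np.diag([50.0, 1.0, 0.01])

print(spectral_summary(well_conditioned))
print(spectral_summary(ill_conditioned))
```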
7. Practical Implications for ML Engineers
Pointer, actionable insights
• Residual connections stabilize Jacobian flow
• LayerNorm constrains singular values
• Dropout reduces anisotropy
• Weight decay lowers operator norms
• Gradient clipping prevents spectrum spikes
• Warmup smooths early spectral expansion
• Large models maintain healthier Jacobians automatically
Monitoring Jacobian norms can warn of instability before divergence.
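For large models, forming J explicitly is usually too expensive, so a monitor would typically estimate σ_max from Jacobian-vector products. The sketch below uses power iteration on JᵀJ. The `jvp` and `vjp` callables are stand-ins (here plain matrix products on an explicit random matrix) for what an autodiff framework would provide, and the iteration count is an arbitrary assumption.

```python
import numpy as np

def estimate_sigma_max(jvp, vjp, dim_in, iters=500, seed=0):
    """Power iteration on J^T J using only products with J and J^T."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=dim_in)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = vjp(jvp(v))                # computes J^T (J v)
        v = w / np.linalg.norm(w)
    return float(np.linalg.norm(jvp(v)))

# Stand-in Jacobian; in practice jvp/vjp would come from autodiff
rng = np.random.default_rng(1)
J = rng.normal(size=(5, 8))
sigma_est = estimate_sigma_max(lambda v: J @ v, lambda u: J.T @ u, dim_in=8)

print(sigma_est, np.linalg.norm(J, 2))  # estimate vs. exact operator norm
```

Logging this estimate every few hundred steps on a fixed batch is one plausible way to catch a growing operator norm before the loss diverges.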
TL;DR
Deep networks train by reshaping the Jacobian spectrum.
Early training expands it to create features, mid training aligns it with data structure, and late training collapses it toward a stable low rank form.
This Jacobian flow is a key reason deep learning yields expressive, stable, and generalizable solutions.
3D Gaussian Splatting is becoming incredibly good.
Pretty soon you’ll be able to recreate your own environments in video games just by snapping a few photos.
We built a web app that lets you fly a spaceship through a 3D constellation of music - powered by our Lyria RealTime model. 🎶
Space DJ is an interactive visualization where every star represents a different music genre. As you explore, your path is translated into prompts for the API, creating a continuously evolving soundtrack. ↓
A beautiful paper from MIT+Harvard+ @GoogleDeepMind 👏
Explains why Transformers fail at multi-digit multiplication and shows a simple bias that fixes it.
The researchers trained two small Transformer models on 4-digit-by-4-digit multiplication.
One used a special training method called implicit chain-of-thought (ICoT), where the model first sees every intermediate reasoning step, and then those steps are slowly removed as training continues.
This forces the model to “think” internally rather than rely on the visible steps.
That model learned the task perfectly — it produced the right answer for every example (100% accuracy).
The other model was trained the normal way, called standard fine-tuning, where it only saw the input numbers and the final answer, not the reasoning steps.
That model almost completely failed — it only got about 1% of the answers correct.
In short: the model trained with implicit chain-of-thought (ICoT) gets 100% on 4-digit-by-4-digit multiplication, while standard fine-tuning could not learn it at all.
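To give a feel for what "intermediate reasoning steps" could look like for this task, here is a sketch that breaks a product into per-digit partial products and running sums. This step format is an assumption for illustration and is not necessarily the format used in the paper.

```python
def multiplication_steps(a, b):
    """Return (partial_products, running_sums) for long multiplication of a by b."""
    partials, running, total = [], [], 0
    # Walk the digits of b from least to most significant
    for place, digit in enumerate(reversed(str(b))):
        partial = a * int(digit) * 10**place
        total += partial
        partials.append(partial)
        running.append(total)
    return partials, running

partials, running = multiplication_steps(1234, 5678)
print(partials)  # [9872, 86380, 740400, 6170000]
print(running)   # [9872, 96252, 836652, 7006652]
```

The last running sum is the final answer. A standard fine-tuning example would show only the inputs and that last number, while a chain-of-thought example would also expose the earlier entries.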
🔍 Ever notice how attention layers only tweak the residual stream in a low-dimensional way? That low-rank writing is exactly why so many SAE features stay dead—until Active Subspace Init rescues them. 👇
#AI #ML #MechInterp
What if you could not only watch a generated video, but explore it too? 🌐
Genie 3 is our groundbreaking world model that creates interactive, playable environments from a single text prompt.
From photorealistic landscapes to fantasy realms, the possibilities are endless. 🧵
Another arXiv book 📖 😃
"The Principles of Deep Learning Theory"
A very good textbook of 471 pages, ideal for students and researchers interested in AI. Freely available on arXiv.
Link in comments 😎 👇
@TheVariational where can I buy the variational book?? I signed up for the mailing list and just got a small copy. I'm doing my master's thesis on diffusion models and it would be a great help for me
Finally! Google has just released Gemini CLI, an AI agent that brings Gemini directly into your terminal
→ 1,000 free requests PER DAY
→ Open source
You can use it as a coding agent, automate tasks, use MCPs, generate videos & images, etc.
Steps to install and use it:
6. His Personal Regret
At 77, Hinton's biggest regret isn't professional.
"I wish I'd spent more time with my wife and with my children when they were little."
Both his wives died of cancer.
He was "obsessed with work" and now warns others about time's true value.
Introducing MedGemma, our most capable open model for multimodal medical text and image comprehension. 🩻
MedGemma is available now as part of Health AI Developer Foundations → https://t.co/GfJkvBHjTF