Step #4 ✅: Reframing the history of deep learning with the 7 constraints of deep learning progress (the most interesting part)
The most interesting part of my deep dive came from noticing a clear trend across all the key advancements, which has completely reframed how I understand deep learning:
There are 7 simple constraints that limit the capacity of digital intelligence:
(1) data
(2) parameters
(3) optimization & regularization
(4) architecture
(5) compute
(6) compute efficiency
(7) energy
The entire history of deep learning can be seen as the series of advancements that have gradually raised the ceiling on these constraints, enabling the creation of increasingly intelligent systems.
By framing each of the key breakthroughs in terms of these constraints, it becomes clear exactly how the field has evolved over time, and it also becomes clearer where we're headed in the future.
Specifically, the goal of deep learning is to produce accurate models of reality by:
(1) Treating the true models that describe reality as complex probability distributions
(2) Creating neural networks capable of modeling complex probability distributions
(3) Training these networks to learn to model the probability distributions that underlie reality
In this view, the intelligence of a neural network is determined by how well it models the true distributions of reality.
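To make this view concrete, here's a minimal sketch (mine, not taken from the repo) that treats "reality" as a small categorical distribution, fits a model to samples drawn from it, and scores the model by its KL divergence from the truth. The distribution `TRUE_DIST`, the helper names, and the use of smoothed counts as a stand-in for a neural network are all assumptions made for illustration.

```python
import math
import random

# Hypothetical "true distribution of reality": a categorical over 5 outcomes.
TRUE_DIST = [0.4, 0.3, 0.15, 0.1, 0.05]


def fit_model(samples, num_outcomes):
    """Stand-in for training: estimate probabilities from smoothed counts."""
    counts = [1.0] * num_outcomes  # add-one smoothing avoids zero probabilities
    for s in samples:
        counts[s] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) in nats: how far the model q is from the true distribution p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def model_error(num_samples, seed=0):
    """Sample a dataset from TRUE_DIST, fit a model, and return its KL error."""
    rng = random.Random(seed)
    outcomes = list(range(len(TRUE_DIST)))
    samples = rng.choices(outcomes, weights=TRUE_DIST, k=num_samples)
    return kl_divergence(TRUE_DIST, fit_model(samples, len(TRUE_DIST)))


if __name__ == "__main__":
    # Larger datasets carry more information about TRUE_DIST.
    for n in (10, 1_000, 100_000):
        print(n, model_error(n))
```

In this toy setup the error shrinks as the dataset grows, and how closely the model matches the true distribution is the whole measure of its quality.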
This is fundamentally bottlenecked by each of the constraints:
(1) Data - The cap on how well a model can approximate the true distributions of reality is determined by how much information about the true distribution is contained within the dataset. This is why increases in data quality & quantity have consistently pushed deep learning forward.
(2) Parameters - A model's ability to accurately approximate the distribution of the dataset is bounded by its representational capacity, which is in turn bounded by the number of parameters it contains.
(3) Optimization & Regularization - The number of parameters (especially depth) a model can have while still effectively converging is constrained by the efficacy of optimization & regularization approaches.
(4) Architecture - The representational capacity of a model with a given number of parameters is constrained by its architecture.
(5) Compute - The total available compute constrains the maximum number of trainable parameters a model can have.
(6) Compute Efficiency - The software implementation used for training constrains how efficiently the available compute is utilized.
(7) Energy - The energy available to draw from the grid in a single location constrains the amount of compute that can be used for a training run.
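Constraints (5) through (7) can be tied together with some back-of-the-envelope arithmetic. The sketch below uses the common rule of thumb that training a dense transformer costs roughly 6 FLOPs per parameter per token. Every hardware number in it (chip count, peak throughput, utilization, power draw) is a hypothetical placeholder, not a measurement, and the function names are my own.

```python
def training_flops(num_params, num_tokens):
    """Approximate training compute via the ~6 * N * D rule of thumb."""
    return 6.0 * num_params * num_tokens


def training_seconds(flops, num_chips, peak_flops_per_chip, utilization):
    """Wall-clock time. `utilization` is the fraction of peak throughput the
    software actually achieves (constraint 6: compute efficiency)."""
    return flops / (num_chips * peak_flops_per_chip * utilization)


def training_energy_kwh(seconds, num_chips, watts_per_chip):
    """Energy drawn by the chips alone (constraint 7), ignoring cooling etc."""
    joules = seconds * num_chips * watts_per_chip
    return joules / 3.6e6  # 1 kWh = 3.6e6 J


if __name__ == "__main__":
    # All values below are hypothetical.
    flops = training_flops(num_params=1e9, num_tokens=2e10)
    secs = training_seconds(flops, num_chips=64,
                            peak_flops_per_chip=1e14, utilization=0.4)
    kwh = training_energy_kwh(secs, num_chips=64, watts_per_chip=500)
    print(f"{flops:.2e} FLOPs, {secs / 3600:.1f} hours, {kwh:.0f} kWh")
```

Under these assumptions, doubling utilization halves the training time at no extra hardware cost, which is why software efficiency shows up as its own constraint rather than being folded into compute.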
You can then reframe the entire history of deep learning in terms of these 7 constraints (which I have done in the repo linked in this thread), and it reveals many interesting trends.
I've found that thinking in terms of these 7 constraints is also particularly helpful for thinking reasonably about where AI will head in the future.
Currently, we're in the "Scaling Laws" paradigm, where the binding constraints appear to be compute & parameters.
However, this is not because scaling compute & parameters is always fundamentally the best approach to increasing intelligence.
Instead, it's a result of the fact that the data distribution we're currently trying to model (internet-scale datasets) contains far more information than current neural networks are capable of learning. In other words, neural networks still don't have enough representational capacity to store this data.
It's inevitable that at some point models will grow large enough to effectively learn the information these datasets have to offer. The question is how far off that point is, and how good models will be when we reach it.
In fact, energy & data may become the dominating constraints again at some point, shifting the focus away from scaling laws (temporarily).
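One way to picture this shift is with a loss curve that has separate parameter and data terms, in the style of the parametric fits used in the scaling-laws literature: loss = E + A / N^alpha + B / D^beta. Here's a small sketch of that idea. The constants are made-up placeholders chosen only to show the shape of the argument, not fitted values, and the function names are mine. It covers only the parameters-versus-data trade-off, not energy.

```python
# Hypothetical constants, NOT fitted to any real model family.
E, A, B = 1.5, 400.0, 400.0
ALPHA, BETA = 0.3, 0.3


def parameter_term(num_params):
    """Loss attributable to limited representational capacity."""
    return A / num_params ** ALPHA


def data_term(num_tokens):
    """Loss attributable to limited information in the dataset."""
    return B / num_tokens ** BETA


def loss(num_params, num_tokens):
    """Total loss: irreducible part plus the two reducible terms."""
    return E + parameter_term(num_params) + data_term(num_tokens)


def binding_constraint(num_params, num_tokens):
    """Name the larger of the two reducible terms."""
    if parameter_term(num_params) > data_term(num_tokens):
        return "parameters"
    return "data"


if __name__ == "__main__":
    num_tokens = 1e12  # hold the dataset fixed and grow the model
    for num_params in (1e8, 1e10, 1e12, 1e14):
        print(f"N={num_params:.0e}  loss={loss(num_params, num_tokens):.3f}  "
              f"bound by {binding_constraint(num_params, num_tokens)}")
```

With the dataset held fixed, growing the model keeps lowering the loss only until the data term becomes the floor, at which point adding parameters stops being the most useful lever.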
The fundamental law that explains all progress toward increasing intelligence is the continual raising of the ceiling on these 7 constraints.
In the repository linked in the next post, I wrote my complete reframing of the history of deep learning in terms of the 7 constraints, and what it can tell us about:
(1) How is progress made in deep learning?
(2) Where do the ideas that drive progress in deep learning come from?
(3) How have our narratives about digital intelligence changed over time?
(4) What does deep learning teach us about our own intelligence?
(5) Where is the future of deep learning headed?