The Lord of the Rings does not take place on an imaginary planet — it's Earth.
Middle-earth is our forgotten past, before recorded history, when Eden (Valinor) was a real place.
The truth of Tolkien's world will blow your mind... 🧵
New paper! “In-Context Learning of Representations”
What happens to an LLM’s internal representations in the large context limit?
We find that LLMs form “in-context representations” to match the structure of the task given in context!
1/n
The fall of Rome is widely misunderstood.
It wasn't invasion, disease or famine that truly brought it to its knees.
Rome collapsed because the birth rate did… (thread) 🧵
LOTR: THE FELLOWSHIP OF THE RING was released 23 years ago this week. An adaptation of Tolkien’s classic novel, and the first entry in Peter Jackson’s The Lord of the Rings trilogy, the story of how it was made is proof that one does not simply walk into Mordor…
1/76
Review "metastability demystified" is finally out @SpringerNature@NatRevNeurosci (https://t.co/wiBuG3ut1I), led by @FranHancock1 & @_fernando_rosas w/ contributions from me & colleagues of many distinct perspectives. We identify the converging mechanisms & common misconceptions.
The apparent "philosophical problems" of quantum mechanics are not unique to QM at all: they are in fact the same problems that arise whenever one attempts to construct an abstract model of reality. We can see these problems already in high school-level mechanics. (1/14)
Past societies produced so much beauty because they knew that math and beauty are deeply connected.
It all started when Pythagoras discovered something mind-blowing about reality:
The universe is not made of matter — but music... (thread) 🧵
(1/5) Very excited to announce the publication of Bayesian Models of Cognition: Reverse Engineering the Mind. More than a decade in the making, it's a big (600+ pages) beautiful book covering both the basics and recent work: https://t.co/5dnLpcMQzu
Doubling o1-preview performance on ARC-AGI with one simple trick 🚀
tldr: by providing human-like representations to o1, we are able to substantially increase performance on @arcprize.
📢Thrilled to introduce the #VirtualLab: a team of AI scientist agents (AI chemist, AI reviewer...). Virtual Lab is led by an AI professor w/ feedback from human scientist.
The Lab created new nanobodies that we experimentally validated to bind to recent #covid variants🚀🧵
I compiled all the emails released as part of the Musk v. Altman lawsuit in chronological order (link in reply).
IMO a really valuable read. Extremely consequential decisions made in these emails.
If you are interested in Generative #AI, or statistical physics, you will know that you can use latent diffusion models to make synthetic images (or videos), but these methods are a bit slow (I explain here how the Fokker-Planck formulation and Langevin diffusion are related):
https://t.co/x4lsJZDdhq
About an hour ago some Russian scientists from Skoltech posted a quite considerable leap forward in this field, computing these maps much faster + in one step:
https://t.co/92AluHBU03
🚨 So, why do we need weight decay in modern deep learning? 🚨
The camera-ready version of our NeurIPS 2024 paper is now on arXiv (a major update compared to the first version).
Weight decay is traditionally viewed as a regularization method, but its effect in the overtraining regime is quite subtle and its interaction with the implicit regularization effect of SGD plays a crucial role.
In the undertraining regime (e.g., in LLM pretraining), however, the effect of weight decay is totally different: it sets an implicit learning rate schedule for AdamW and enables stable training with bfloat16 precision. This explains why weight decay is still widely used for LLM training with standard optimizers, such as AdamW.
This is joint work with @dngfra, @adityavardhanv, @tml_lab.
🌟 Cai Lab @Nature paper alert! In new work led by @mysteriousjoe_, we find that rest periods after learning not only stabilize new memories BUT ALSO integrate new memories with older ones from days past! (1/9)
Read it here: https://t.co/Ur8dbGfuP3
I finally got to meet @fchollet in person recently to interview him about @arcprize, intelligence vs memorization, human cognitive development, learning abstractions, limits of pattern recognition and consciousness development. These are the best bits. Full show released tomorrow
There are many things we don’t understand about deep learning. Our new NeurIPS paper (w/ @AliciaCurth) makes the mistake of trying to tackle too many of them 😅
A simplified model of deep learning describes double descent, grokking, gradient boosting & linear mode connectivity🧵