In awe of SpaceX and its story - past, present and the future. You can think about it in 10+ different ways and continue re-blowing your mind in circles. Huge congrats to the team! 🚀
23 years old with no advanced mathematics training solves Erdős problem with ChatGPT Pro. "What’s beginning to emerge is that the problem was maybe easier than expected, and it was like there was some kind of mental block.”-Terence Tao https://t.co/Cphu6dexyb
Larry Wasserman blogged for a short period of time about statistics. The blog has the highest signal-to-noise ratio of all quantitative data science blogs (Ben Recht comes close). Inactive since 2013. Has not aged.
https://t.co/oNvUnLa3cv
Can AI "learn" economic states, addressing the Lucas Critique?
With @alexolegimas we simulated data from an NK model, fit a transformer, and tested out of sample fit
It generalizes surprisingly well. We hope this stimulates discussion and future agendas
https://t.co/lXcJh9IkE9
Bayesian probability, like frequentist probability, is a model-based activity that is mathematically anchored by physical randomization at one end and calibration to a reference set at the other
https://t.co/sjOt6A8NCr
“I wasn’t the fastest guy in the world. I wouldn’t have done well in an Olympiad or a math contest. But I like to ponder. And pondering things, just sort of thinking about it and thinking about it, turns out to be a pretty good approach.’
– Jim Simons
How this 31-year-old made $250mn in 30 months: Christopher Eppinger kept trading Russian oil when sanctions meant others stopped
To read today’s feature in print pick up at a copy of @ftweekend
https://t.co/OW4TUXUJRh
Beautiful technical debugging detective longread that starts with a suspicious loss curve and ends all the way in the Objective-C++ depths of PyTorch MPS backend of addcmul_ that silently fails on non-contiguous output tensors. I wonder how long before an LLM can do all of this.
The most interesting part for me is where @karpathy describes why LLMs aren't able to learn like humans.
As you would expect, he comes up with a wonderfully evocative phrase to describe RL: “sucking supervision bits through a straw.”
A single end reward gets broadcast across every token in a successful trajectory, upweighting even wrong or irrelevant turns that lead to the right answer.
> “Humans don't use reinforcement learning, as I've said before. I think they do something different. Reinforcement learning is a lot worse than the average person thinks. Reinforcement learning is terrible. It just so happens that everything that we had before is much worse.”
So what do humans do instead?
> “The book I’m reading is a set of prompts for me to do synthetic data generation. It's by manipulating that information that you actually gain that knowledge. We have no equivalent of that with LLMs; they don't really do that.”
> “I'd love to see during pretraining some kind of a stage where the model thinks through the material and tries to reconcile it with what it already knows. There's no equivalent of any of this. This is all research.”
Why can’t we just add this training to LLMs today?
> “There are very subtle, hard to understand reasons why it's not trivial. If I just give synthetic generation of the model thinking about a book, you look at it and you're like, 'This looks great. Why can't I train on it?' You could try, but the model will actually get much worse if you continue trying.”
> “Say we have a chapter of a book and I ask an LLM to think about it. It will give you something that looks very reasonable. But if I ask it 10 times, you'll notice that all of them are the same.”
> “You're not getting the richness and the diversity and the entropy from these models as you would get from humans. How do you get synthetic data generation to work despite the collapse and while maintaining the entropy? It is a research problem.”
How do humans get around model collapse?
> “These analogies are surprisingly good. Humans collapse during the course of their lives. Children haven't overfit yet. They will say stuff that will shock you. Because they're not yet collapsed. But we [adults] are collapsed. We end up revisiting the same thoughts, we end up saying more and more of the same stuff, the learning rates go down, the collapse continues to get worse, and then everything deteriorates.”
In fact, there’s an interesting paper arguing that dreaming evolved to assist generalization, and resist overfitting to daily learning - look up The Overfitted Brain by @erikphoel.
I asked Karpathy: Isn’t it interesting that humans learn best at a part of their lives (childhood) whose actual details they completely forget, adults still learn really well but have terrible memory about the particulars of the things they read or watch, and LLMs can memorize arbitrary details about text that no human could but are currently pretty bad at generalization?
> “[Fallible human memory] is a feature, not a bug, because it forces you to only learn the generalizable components. LLMs are distracted by all the memory that they have of the pre-trained documents. That's why when I talk about the cognitive core, I actually want to remove the memory. I'd love to have them have less memory so that they have to look things up and they only maintain the algorithms for thought, and the idea of an experiment, and all this cognitive glue for acting.”