@ASFleischman Smartest guy went straight through IMO medals, Harvard Math, MIT PhD, and then seemed to disappear off the face of the earth (or at least the internet as far as I can tell)
My most amusing interaction was where the model (I think I was given some earlier version with a stale system prompt) refused to believe me that it is 2025 and kept inventing reasons why I must be trying to trick it or playing some elaborate joke on it. I kept giving it images and articles from "the future" and it kept insisting it was all fake. It accused me of using generative AI to defeat its challenges and argued why real wikipedia entries were actually generated and what the "dead giveaways" are. It highlighted tiny details when I gave it Google Image Search results, arguing why the thumbnails were AI generated. I then realized later that I forgot to turn on the "Google Search" tool. Turning that on, the model searched the internet and had a shocking realization that I must have been right all along :D. It's in these unintended moments where you are clearly off the hiking trails and somewhere in the generalization jungle that you can best get a sense of model smell.
@nikitabier https://t.co/Jkf04AQQvW
Just over your line but it's still more walkable to Cal Ave or Town and Country than a lot of the stuff in your box.
We think of them in terms of search engines and databases.
We are making this mistake because we have a similar interface, i.e. the "ask the computer a question and get back an answer" interface, but they are machines to help us think, not machines to help us recall.
People are misled because sci-fi depictions of AI often take the form of an oracular entity, e.g. "Computer, tell us the coordinates of Planet X-Y-12 and the stellar mass of its star."
Many people are already aware of this and warn us about "hallucinations" and to check answers from AI, but there's an underlying assumption that as we make better LLMs, the problem will go away. I think it will get better, but the problem won't disappear.
AI is not better at giving you facts, it's a thinking partner. It's good at brainstorming, stirring creativity, and vetting ideas that you have.
Smart thinking humans are good at thinking, and less so at recall - someone who is good at thinking MAY have a great command of many facts, but that's not the core of their strength - it's the thinking part: being good at ideation, evaluation, creativity.
Normal humans look at smart humans and think, "Oh, he's so smart, he knows many things." No, he sounds authoritative so you assume all the things he says are true. Smart humans are just as likely to possess false information, they are just better at reasoning about things - evaluating proposed logical structures. This does mean they might be able to discard some false info if the "facts don't add up" - this is an ability that depends on cognitive strength - but they might very well still have memorized (and believe) lots of other false information if there wasn't any way to verify it.
AI contains within its training data numerous "facts," some of which may be false, and it doesn't really know if they are false unless there's some way to verify them.
AIs are not databases or search engines, into which we can insert already-verified facts and then recall them. This is what we're currently mistaking AI for, because we use a similar interface, i.e. the "ask the computer a question and get back an answer" interface.
But they are machines that aid us in cognition, not machines to help us recall.
i'm only a day in so far but @openai's deep research and o3 are exceeding the value of the $150K i am paying a private research team to research craniopharyngioma treatments for my daughter.
$200/mo is an insane ROI.
grateful to @sama and the @OpenAI team.
One lesson to take from this from a user experience standpoint: a model that is 90% as good as o1 but with no usage limits is an OOM more useful in a knowledge worker’s daily life than an o1-tier model with ~10 queries a day.
In the 3 days of testing R1, I have probably sent 10x as many queries as I have to o1 in the past three months combined — despite being a ChatGPT Plus subscriber.
A great model that I can work and iterate with — without concern of getting cut off after a handful of messages — is so much better than a slightly better model that I can chat with for 5 minutes a day.
There's a shocking fact about AI that nobody tells you: You can catch up to the public AI research frontier in just 2 weeks. Yes, really.
I've built a $150M annual revenue startup over the last 8 years and if I were to start a company today, I’d drop everything and go all-in on AI.
But like many busy software builders, I felt lost—overwhelmed by the noisy, crowded and fast-moving modern AI landscape. And I wasn’t alone.
So I spent my entire holiday diving deep into AI research—reading 30+ papers, watching hours of lectures, analyzing trends, and catching up to the research frontier.
✨ Here’s what I learned:
- You don’t need months (or years) to catch up.
- You don’t need a PhD or decades of ML experience.
- You need fewer than 20 papers and 2 weeks to understand the major breakthroughs shaping AI today.
It's because the technology is extremely nascent and most techniques that came before are no longer relevant:
- ChatGPT is barely 2 years old and Transformers are only 7 years old.
- Most game-changing discoveries happened within the last 4 years, driven by a few breakthrough ideas, scaling laws, and efficient matrix multiplication.
The biggest secret?
Many groundbreaking AI papers with thousands of citations are surprisingly simple and applied, like adding "let's think step by step" to the prompt, or simply asking the LLM over and over again to improve its answer (Self-Refine).
I realized there are tons of founders and builders in the same boat—wanting to dive deeper into AI but unsure where to start.
I've created an essential AI Guide that helped me catch up, in just 2 weeks, to the frontier of public AI research to figure out where the next opportunities and gaps were:
- Curated list of only the most important papers
- Simple explanations of key concepts
- Clear pathway to understanding the frontier of modern AI
It’s perfect for:
- Founders expanding into AI
- Builders wanting to innovate at the frontier of AI
- Investors looking to separate the signal from the noise
👇 Want the full guide?
- Like and Share this post
- Comment "AI Guide"
- I'll send you the complete guide
(ps, I’m also teaming up with @VishalVasishth, co-founder of @obviousvc with @ev (focused on large-scale societal impact companies like Twitter, Medium, Beyond Meat), to host a small meetup to discuss what's working and needs to be solved in the AI stack in SF. Message me if you're interested)
@WilliamAEden I went in to the movie thinking it was going to be the whole book and only after about 2 hours did I realize it was going to just end in the middle!
At least the success of the first movie funded him to finish it properly this year.
@point97dollars @WilliamAEden It's dark tonally but visually it's a riot of color. This trailer does a good overview although it's more action heavy than the series as a whole. It's a short watch of only about 3-4 hours in total.
https://t.co/irJvE7N2vI
@AlexCaswen @gfodor Do you know that for a fact? If true, it does seem like a more generalizable approach to hook up an LLM to a system that can properly maintain state and do logic and math vs rely on an emergent property of the LLM. Meta tried this for playing Diplomacy and it seemed to work.
@gfodor It may turn out that you actually can replicate a lot of human behavior and functional capabilities by just reverse engineering our language. That makes sense theoretically in a loose sort of way, but it doesn't seem possible until you actually see it!
@gfodor Amazing, this seems definitely outside the expected capabilities of an LLM. I guess there's enough variables in the model now that it can use language prediction to successfully model symbolic math and its application to a real world problem?
@antoniogm I hate this! I took a call a few years ago from a VC intro'd by a mutual friend. Asked me to help him "learn" about the BNPL products industry and a few weeks later he leads the funding round on a key competitor!!!
At least we crushed them anyways and blew his exit.