Karpathy threw a grenade at every senior engineer who still treats LLMs as a toy.
his actual words: the worst thing an expert can do right now is reject them.
most experts read it as a threat, but it's advice.
his framing:
> the gap between "AI tools are bad" and "AI tools are useful when used right" is professional discipline, not capability
> agents have cognitive deficits. they fail in ways nothing in the training set anticipated
> the experts who reject LLMs lose to experts who learn to wrangle them
> "models have so many cognitive deficits. but you can route around them"
routing around the deficits is what CLAUDE.md was invented for.
Karpathy himself wrote 4 rules. across 30 codebases they took my Claude error rate from 41% down to 11%. solid drop.
but his rules pre-date the slop era going public. I bolted on 8 more, tuned to the failure modes that surfaced after January. got it down to 3%.
a CLAUDE.md does not raise Claude's IQ. it lowers his slop floor. that is the entire game.
open the article underneath.
the model is not the bottleneck. your config is.
Dario is wrong.
He knows absolutely nothing about the effects of technological revolutions on the labor market.
Don't listen to him, Sam, Yoshua, Geoff, or me on this topic.
Listen to economists who have spent their careers studying this, like @Ph_Aghion, @erikbryn, @DAcemogluMIT, @amcafee, @davidautor
I was chatting with my buddy at Google, who's been a tech director there for about 20 years, about their AI adoption. Craziest convo I've had all year.
The TL;DR is that Google engineering appears to have the same AI adoption footprint as John Deere, the tractor company. Most of the industry has the same internal adoption curve: 20% agentic power users, 20% outright refusers, 60% still using Cursor or an equivalent chat tool. It turns out Google has this curve too.
But why is Google so... average? How is it that a handful of companies are taking off like a spaceship, and the rest, including Google, are mired in inaction?
My buddy's observation was key here: There has been an industry-wide hiring freeze for 18+ months, during which time nobody has been moving jobs. So there are no clued-in people coming in from the outside to tell Google how far behind they are, how utterly mediocre they have become as an eng org.
He says the problem is that they can't use Claude Code because it's the enemy, and Gemini has never been good enough to capture people's workflows like Claude has, so basically agentic coding just never really took off inside Google. They're all just plodding along, completely oblivious to what's happening out there right now.
Not only is Google not able to do anything about it, they don't seem to be aware of the problem at all. I'm having major flashbacks to fifty years ago as a kid at the La Brea Tar Pits, asking, "why can't they just climb out?"
My Google friend and I had this conversation over a month ago. I didn't share it because I wanted to look around a bit, and see if it's really as bad as all that. I've been talking to people from dozens of companies since then. And yeah. It's as bad as all that.
Google is about average. Some companies at the bottom have near-zero AI adoption and can't even get budget for AI. They may have moats and high walls, but the horde is coming for them all the same.
And then there are a few companies I've met recently who are *amazingly* leaned in to AI adoption. One category-leader company just cancelled IntelliJ for a thousand engineers. That's an incredibly bold move, one of many they're making towards agentic adoption. In my opinion, that company is setting themselves up for a _huge_ W.
As for the rest, well, it's the Great Siloing. Everyone's flying blind. With nobody moving companies, no company knows where they stand on the AI adoption curve. Nobody knows how they're doing compared to everyone else.
Half of them just check a box: "We enabled {Copilot/Cursor} for everyone!" Cue smug celebrations. They think this is like getting SOC2 compliance, just a thing they turn on and now it's "solved." And they don't realize that they've done effectively nothing at all.
All because of a hiring freeze.
Judging by my tl there is a growing gap in understanding of AI capability.
The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT sometime last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.
But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus follows.
So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions.
TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
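To make the "unit tests passed yes or no" point concrete, here is a minimal sketch of what a verifiable reward can look like. It is only an illustration: `binary_reward`, the test cases, and the two candidate functions are hypothetical names made up for this example, not any lab's actual training setup.

```python
from typing import Callable, Iterable, Tuple

# A test case is (arguments, expected result). Hypothetical format for this sketch.
TestCase = Tuple[tuple, object]

def binary_reward(candidate: Callable, tests: Iterable[TestCase]) -> int:
    """Return 1 if the candidate passes every test, else 0."""
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return 0  # wrong answer: no partial credit
        except Exception:
            return 0  # a crash counts as a failure too
    return 1

# Toy task: "write a function that adds two numbers".
tests = [((2, 3), 5), ((-1, 1), 0)]
good = lambda a, b: a + b  # a correct candidate solution
bad = lambda a, b: a * b   # an incorrect candidate solution

print(binary_reward(good, tests), binary_reward(bad, tests))
```

The reward here is a mechanical yes/no check that needs no human judgment. There is no equally cheap `binary_reward` for "is this essay good", which is the contrast with writing drawn above.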
Maybe the Delve founders are genuine psychopaths. Maybe they set out to defraud people from day one.
But they're 21. Both of them. Basically kids.
More likely: they got into YC, got surrounded by people telling them to move fast and break things, grow at all costs, fake it till you make it, and so on. They wanted to impress their batchmates, their partners, the alumni network, their parents. So they pushed too hard and broke things they shouldn't have broken.
YC selects for this. They want young naive founders who are aggressive, ambitious, a little reckless. They celebrate the ones who bend the rules and win. When a 21 year old bends the rules and loses, suddenly it's a "trust" issue and they get a cold three-sentence farewell on Bookface.
No mentorship. No "hey, you're heading in a dangerous direction." No community stepping in before it got this far. Just "we asked them to leave, we wish them well."
That's not a community. That's a machine that takes credit for your wins and disowns you when you mess up.
Expectation: the age of the IDE is over
Reality: we’re going to need a bigger IDE
(imo).
It just looks very different because humans now move upwards and program at a higher level - the basic unit of interest is not one file but one agent. It’s still programming.
We have absolutely no idea what makes good neural network optimizers, whole field is a meme:
>We throw away gradient updates randomly
>Outperforms Muon with RMSProp
Be Alex Krizhevsky.
Born in the Soviet Union.
Join Hinton’s lab.
Create AlexNet.
Train it on GPUs in your bedroom.
Breaks every record.
Spark the Deep Learning revolution.
Get 181,495 citations.
Disappear.
Guys I hate to rain on this parade but json prompting isn’t better. This post doesn’t even try to provide evidence that it’s better, it’s just hype.
It physically pains me that this is getting so much traction
- I’ve actually done experiments on this and markdown or xml is better
- “Models are trained on json” -> yes they’re also trained on a massive amount of plain text, markdown, etc
- JSON isn’t token efficient and creates tons of noise/attention load with whitespace, escaping, and keeping track of closing characters
- JSON puts the model in a “I’m reading/outputting code” part of the distribution, not always what you want
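A rough way to see the noise point for yourself: render the same prompt as JSON and as markdown and compare sizes. This is a sketch under assumptions. The prompt content is made up, `json_noise_chars` is a hypothetical helper, and character counts are only a crude proxy for tokens (a real comparison would use the target model's tokenizer). It also says nothing about which format a model follows better; that needs actual evals.

```python
import json

# Hypothetical prompt content, used only for illustration.
prompt = {
    "role": "technical editor",
    "task": 'Rewrite the section and keep the author\'s "voice".',
    "rules": ["fix typos", "keep code identifiers", "no new claims"],
}

def as_json(p: dict) -> str:
    """Render the prompt as pretty-printed JSON."""
    return json.dumps(p, indent=2)

def as_markdown(p: dict) -> str:
    """Render the same prompt as headings plus a bullet list."""
    lines = [f"# Role\n{p['role']}", f"# Task\n{p['task']}", "# Rules"]
    lines += [f"- {rule}" for rule in p["rules"]]
    return "\n".join(lines)

def json_noise_chars(text: str) -> int:
    """Count JSON-style syntax characters: braces, brackets, quotes, commas, backslashes."""
    return sum(text.count(c) for c in '{}[]",\\')

json_text = as_json(prompt)
md_text = as_markdown(prompt)

print("chars:", len(json_text), "vs", len(md_text))
print("syntax chars:", json_noise_chars(json_text), "vs", json_noise_chars(md_text))
```

Markdown has its own markers (`#`, `-`) that this helper does not count, but it needs no escaping and no closing characters, which is the overhead the bullets above are complaining about.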
There was a period of time where you could make $200-400k by just knowing how to code front-ends
That period lasted from 2015-2024 and is now over, since Claude 4 etc is good enough for any product designer who codes or back-end engineer to take over 99% of the work, leaving the front-end engineer jobless unless they adapt–by taking someone else’s job(s)
xAI launched Grok 4 without any documentation of their safety testing. This is reckless and breaks with industry best practices followed by other major AI labs.
If xAI is going to be a frontier AI developer, they should act like one. 🧵
They aren't even hiding it anymore.
NASDAQ futures start *aggressively* selling off at 7:52 PM (green circle) for no reason whatsoever.
Trump announces tariffs on Canada, Europe, and the rest of the world at 8:06 PM (blue circle).
NASDAQ bottoms *4 minutes* later at 8:10 PM (red circle).
You don't even need room temperature IQ to figure out what is happening over here.