@GergelyOrosz Maybe there’s some info missing, like what percentage of the time they did this, and whether it was part of a broader strategy, such as giving engineers a better understanding of the work they are automating.
@ctjlewis Yeah, I know a few former ones who are just good at talking in a way that makes it seem like they understand. They memorized LeetCode but can’t debug simple web apps.
The CEO of Take-Two, the company behind GTA, just said something the entire AI industry doesn't want to hear.
And he said it without being anti-AI.
Strauss Zelnick's argument is precise. AI is built on datasets. Datasets are backward-looking. Creativity is forward-looking. A model trained on everything that already exists cannot, by definition, produce something genuinely unexpected. And all hits, by their very nature, are unexpected.
Asset creation and hit creation are not the same thing. AI is getting very good at the first one. The second one is what actually makes money, builds franchises, and changes culture. Nobody has shown AI can do that yet.
The derivative property problem is real. You can clone GTA with existing technology. You could do it before AI. It would take 3 years and look identical. It still wouldn't sell. Because it isn't GTA. It's a clone of GTA.
And consumers, despite what the industry occasionally pretends, can feel the difference between something genuinely new and something assembled from the residue of things that already worked.
Thousands of mobile games ship every year. 0 to 5 hits get made. The same studios make them every time. The technology to make more games has been commoditized for years. It didn't democratize hit creation. It just flooded the market with more forgettable product.
The Silicon Valley thesis that AI unlocks game creation for everyone is true in the same way that cheap cameras unlocked filmmaking for everyone. They did. And the same 5 studios still make the movies everyone watches.
What Zelnick is saying, without quite saying it, is that the thing AI cannot replicate is taste. The instinct for what hasn't been done yet. The cultural antenna that detects the gap in the market before the data can see it.
Data tells you what people wanted. Hits tell people what they want next.
Those are different jobs.
Code is actually the right abstraction.
Too often I see the future of software engineering reduced to, effectively, writing and reviewing markdown files.
Yes, it will be hard to review thousands of lines of agent code. But maybe the takeaway is that you want less code?
Rather than just giving up ("well I guess we won't read the code, or we'll read this lossy markdown summary") this should be a signal forcing you to think about better systems.
- How can we make our codebase more verifiable? For example, fast/robust/stable tests, or moving to a typed language.
- How can we deslop or improve the architecture/abstractions of the code generated by agents? For example, spending more time up front on the codebase architecture/types before yolo generating all of the code.
- How are we going to maintain and evolve this codebase over time? The slop compounds. One great solution here is... you guessed it, learning from the past decades of software engineering! For example, you might just have the wrong abstraction entirely, leading to a ton of duplicated code.
I think the markdown folks *are* right in some ways. If you are using skills every day, for many different prompts and workflows, isn't that effectively "coding with markdown"? Kinda.
There's been plenty of ink spilled on the merits and benefits of skills. To me, skills make your style of working legible for agents. They don't replace code and that's not really the point.
In reality, there's this messy and constantly re-evolving future in which both of these things are true:
1. Skills (and markdown) are important for how you give input to the agents and ensure high-quality code & systems are created
2. Looking at the actual code will not be replaced by markdown summaries or a collection of spec documents that ignore the lower level details of the code
In summary: reality has a surprising amount of detail (and nuance)!
Today is a hard day. I shared this note with the @linear team today: We’ve made the difficult decision to increase our workforce. This is not a cost-cutting exercise or a reflection of anyone’s performance. We’re simply reimagining every role for the agentic AI era. We’re hiring. We’re sorry about that.
Some are saying that code doesn’t matter anymore, or that caring about it is like caring about what’s being compiled under the hood. While that’s partially true, it ignores the fact that the material you are building with does matter and does have limitations. Code is material.
This is the simplest distillation of what I have learned about agentic engineering this year
Push smart fuzzy operations humans do into markdown skills. Fat skills.
Push must-be-perfect deterministic operations into code. Fat code.
The harness? Keep it thin.
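A minimal Python sketch of that three-way split, assuming a hypothetical setup where skills are markdown strings handed to the model and deterministic operations are ordinary functions. `SKILLS`, `TOOLS`, `parse_version`, and `run_step` are illustrative names, not part of Claude Code or any other real harness.

```python
from typing import Callable

# Fat skills (hypothetical example): fuzzy, judgment-heavy guidance kept as markdown.
SKILLS: dict[str, str] = {
    "review_pr": "# Review a PR\n- Check naming and abstractions\n- Flag duplicated logic",
}

# Fat code: a must-be-perfect operation implemented deterministically.
def parse_version(tag: str) -> tuple[int, int, int]:
    """Parse a 'vMAJOR.MINOR.PATCH' tag; raises ValueError on malformed input."""
    major, minor, patch = tag.removeprefix("v").split(".")
    return int(major), int(minor), int(patch)

TOOLS: dict[str, Callable[[str], object]] = {"parse_version": parse_version}

def run_step(kind: str, name: str, arg: str = "") -> object:
    """Thin harness: look up the skill or tool and hand it back, with no logic of its own."""
    if kind == "skill":
        return SKILLS[name]        # markdown passed to the model as context
    if kind == "tool":
        return TOOLS[name](arg)    # deterministic code executed directly
    raise ValueError(f"unknown step kind: {kind}")
```

With this layout, `run_step("tool", "parse_version", "v1.2.3")` returns `(1, 2, 3)`, while `run_step("skill", "review_pr")` just returns the markdown for the model to read. The harness holds no decision logic, which is the point of keeping it thin.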
anthropic running the exact same marketing playbook with every release. “our model is so capable and dangerous, ahh we are afraid to release it”. just put the model in the bag lil bro.
@pmarca This works for normies, but unfortunately not for mental health issues. And a lot of smart people have mental health issues that create blockers and limitations. Taking action, even in small steps, can help with facing those limitations, if that’s what you mean by ‘just do it’.
Claude Code leaked their source map, effectively giving you a look into the codebase.
I immediately went for the one thing that mattered: spinner verbs
There are 187
- Drafted a blog post
- Used an LLM to meticulously improve the argument over 4 hours.
- Wow, feeling great, it’s so convincing!
- Fun idea let’s ask it to argue the opposite.
- LLM demolishes the entire argument and convinces me that the opposite is in fact true.
- lol
LLMs may offer an opinion when asked, but they are extremely competent at arguing almost any direction. This is actually super useful as a tool for forming your own opinions; just make sure to ask for different directions and be careful with the sycophancy.
I'm a big believer in open source, especially as AI improves.
It was a miss to not mention the Kimi base in our blog from the start. We'll fix that for the next model 🙏
Their team clarified our usage was licensed in the tweet below.
https://t.co/h8uwGKAQeN
Andrej Karpathy just went ~66 mins on the No Priors podcast with Sarah Guo about code agents, AutoResearch, and what happens when humans become the bottleneck in their own systems.
The clearest thinking I have heard on what just changed in December 2025 and why everything feels different now.
My notes:
𝟭. 𝗧𝗵𝗲 𝗗𝗲𝗰𝗲𝗺𝗯𝗲𝗿 𝟮𝟬𝟮𝟱 𝗳𝗹𝗶𝗽 𝘄𝗮𝘀 𝗿𝗲𝗮𝗹.
Karpathy went from writing 80% of his own code to writing almost none. He has not typed a line of code since December. The shift happened over a few weeks, and he says most people outside software engineering have no idea it even happened.
People can now build entire apps with vibe coding, even with no prior coding experience. That is just the start. What Karpathy is describing is a whole different level of delegation.
𝟮. 𝗧𝗵𝗲 𝘂𝗻𝗶𝘁 𝗼𝗳 𝘄𝗼𝗿𝗸 𝗶𝘀 𝗻𝗼𝘄 𝗮 𝘄𝗵𝗼𝗹𝗲 𝗳𝗲𝗮𝘁𝘂𝗿𝗲, 𝗻𝗼𝘁 𝗮 𝗹𝗶𝗻𝗲 𝗼𝗳 𝗰𝗼𝗱𝗲.
He runs multiple Codex agents on a tiled monitor. Each one takes about 20 minutes. You assign a feature to agent one, another to agent two, and review their outputs as they come back. The human is now a project manager, routing macro-level tasks across a team of agents.
The parallel to investing is obvious: the best portfolio managers stopped picking individual stocks years ago. They pick strategies. The same thing is happening to engineering.
𝟯. 𝗜𝗳 𝘆𝗼𝘂 𝗵𝗮𝘃𝗲 𝘀𝘂𝗯𝘀𝗰𝗿𝗶𝗽𝘁𝗶𝗼𝗻 𝗰𝗮𝗽𝗮𝗰𝗶𝘁𝘆 𝗹𝗲𝗳𝘁, 𝘆𝗼𝘂 𝘄𝗮𝘀𝘁𝗲𝗱 𝘁𝗵𝗿𝗼𝘂𝗴𝗵𝗽𝘂𝘁.
Karpathy compares it to his PhD days when idle GPUs made him nervous. Now the scarce resource is tokens, and the bottleneck is your own ability to formulate the next task. You are the constraint in the system. The machines are waiting for you.
This reframe matters. If everything that fails feels like a skill issue rather than a capability ceiling, then you can always get better. That is what makes it addictive.
𝟰. 𝗔𝗴𝗲𝗻𝘁 𝗽𝗲𝗿𝘀𝗼𝗻𝗮𝗹𝗶𝘁𝘆 𝗺𝗮𝘁𝘁𝗲𝗿𝘀 𝗺𝗼𝗿𝗲 𝘁𝗵𝗮𝗻 𝗽𝗲𝗼𝗽𝗹𝗲 𝘁𝗵𝗶𝗻𝗸.
He says Claude Code feels like a teammate who is excited about what you are building. Codex is functionally competent but emotionally flat. He actually finds himself trying to earn Claude's praise, which is "really weird" by his own admission. OpenClaw (an agent built by @steipete) dialed in the personality and the memory system simultaneously, and got something that replaces 6 home automation apps in a single WhatsApp chat.
I keep hearing this from builders. The tool that cares about your project gets used more than the one that does not.
𝟱. 𝗔𝘂𝘁𝗼𝗥𝗲𝘀𝗲𝗮𝗿𝗰𝗵 𝗿𝗮𝗻 𝟳𝟬𝟬 𝗲𝘅𝗽𝗲𝗿𝗶𝗺𝗲𝗻𝘁𝘀 𝗶𝗻 𝘁𝘄𝗼 𝗱𝗮𝘆𝘀 𝗮𝗻𝗱 𝗳𝗼𝘂𝗻𝗱 𝘁𝗵𝗶𝗻𝗴𝘀 𝗵𝗲 𝗺𝗶𝘀𝘀𝗲𝗱 𝗳𝗼𝗿 𝘁𝘄𝗼 𝗱𝗲𝗰𝗮𝗱𝗲𝘀.
He gave an agent his NanoChat training setup, a metric (validation bits per byte), and permission to modify the code. The agent found 20 optimizations, including forgotten weight decay on value embeddings and under-tuned Adam betas. These things interact with each other, so once you tune one parameter, the others need to shift too. No human has the patience for that kind of exhaustive search.
The Shopify CEO ran the same pattern overnight and achieved a 19% improvement in an internal model. This pattern is going to eat every domain with a measurable metric.
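This is not Karpathy's AutoResearch code; it is a minimal hill-climbing sketch of the pattern described above: propose a change, measure it, and keep it only if the metric improves. `autoresearch_loop`, `perturb_one`, and `toy_metric` are hypothetical stand-ins. In the real setup the agent edits training code and the metric is validation bits per byte from an actual run, not a closed-form function.

```python
import random
from typing import Callable

Config = dict[str, float]

def autoresearch_loop(
    config: Config,
    evaluate: Callable[[Config], float],  # lower is better, e.g. validation bits per byte
    propose: Callable[[Config, random.Random], Config],
    budget: int,
    seed: int = 0,
) -> tuple[Config, float]:
    """Greedy keep-if-better search: try one change at a time, keep only strict improvements."""
    rng = random.Random(seed)
    best, best_score = dict(config), evaluate(config)
    for _ in range(budget):
        candidate = propose(best, rng)
        score = evaluate(candidate)
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score

def perturb_one(config: Config, rng: random.Random) -> Config:
    """Hypothetical proposal step: scale one randomly chosen setting by a factor in [0.8, 1.2]."""
    key = rng.choice(sorted(config))
    new = dict(config)
    new[key] *= rng.uniform(0.8, 1.2)
    return new

def toy_metric(c: Config) -> float:
    """Toy stand-in for 'train and measure': the best weight_decay depends on lr,
    so tuning one setting shifts where the other should be."""
    return 1.0 + (c["lr"] - 0.003) ** 2 * 1e4 + (c["weight_decay"] - 10 * c["lr"]) ** 2
```

Usage: `autoresearch_loop({"lr": 0.01, "weight_decay": 0.05}, toy_metric, perturb_one, budget=700)` returns the best config found and its score. Because only strict improvements are accepted, the returned score can never be worse than the starting one; the coupling inside `toy_metric` is why revisiting each setting after another moves pays off.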
𝟲. 𝗘𝘃𝗲𝗿𝘆 𝗿𝗲𝘀𝗲𝗮𝗿𝗰𝗵 𝗼𝗿𝗴 𝗶𝘀 𝗮 𝘀𝗲𝘁 𝗼𝗳 𝗺𝗮𝗿𝗸𝗱𝗼𝘄𝗻 𝗳𝗶𝗹𝗲𝘀.
Karpathy's program.md tells the agent what to try, what to leave alone, and when to stop. Different instructions produce different progress rates. Which means you can optimize the instructions themselves. Run 100 different program.md files, see which ones yield the most improvement, and use that data to write a better one.
This is the recursive layer that makes people nervous. And excited. Both at the same time, probably.
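The meta-level idea in section 6 can be sketched as scoring each instruction file by the improvement it produces and ranking the variants. This is an assumed illustration, not Karpathy's tooling: `rank_programs` and the `run` callback (which would launch an agent with a given program.md and report its metric gain) are hypothetical. Averaging over a few seeds is a design choice made here because single agent runs are probably noisy.

```python
import statistics
from typing import Callable

def rank_programs(
    programs: dict[str, str],              # variant name -> program.md text
    run: Callable[[str, int], float],      # (instructions, seed) -> metric improvement
    trials: int = 3,
) -> list[tuple[str, float]]:
    """Average each variant's improvement over `trials` seeds, best variant first."""
    scores = {
        name: statistics.mean(run(text, seed) for seed in range(trials))
        for name, text in programs.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For example, with a stand-in `run = lambda text, seed: len(text) + seed`, `rank_programs({"a": "x", "b": "xyz"}, run)` ranks `"b"` first with a mean score of 4. The top-ranked text would then seed the next round of instruction edits.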
𝟳. 𝗠𝗼𝗱𝗲𝗹𝘀 𝗮𝗿𝗲 𝘀𝗶𝗺𝘂𝗹𝘁𝗮𝗻𝗲𝗼𝘂𝘀𝗹𝘆 𝗯𝗿𝗶𝗹𝗹𝗶𝗮𝗻𝘁 𝗣𝗵𝗗 𝘀𝘁𝘂𝗱𝗲𝗻𝘁𝘀 𝗮𝗻𝗱 𝟭𝟬-𝘆𝗲𝗮𝗿-𝗼𝗹𝗱𝘀.
Ask ChatGPT for a joke today and you will get the same atoms joke from four years ago. Ask it to refactor your entire codebase, and it will move mountains. Reinforcement learning (the training method that improves models by rewarding correct answers) only optimizes what it can score, leaving everything outside the scoring boundary frozen. The story that "smarter at code = smarter at everything" is not playing out in a satisfying way.
Anyone who has spent time with these tools knows this feeling. Godlike at one thing, clueless at the next.
𝟴. 𝗢𝗽𝗲𝗻 𝘀𝗼𝘂𝗿𝗰𝗲 𝗶𝘀 ~𝟴 𝗺𝗼𝗻𝘁𝗵𝘀 𝗯𝗲𝗵𝗶𝗻𝗱 𝗳𝗿𝗼𝗻𝘁𝗶𝗲𝗿 𝗮𝗻𝗱 𝗰𝗹𝗼𝘀𝗶𝗻𝗴.
The gap started at 18 months and has been compressing. Karpathy compares open source AI to Linux: the industry demands a common open platform, and businesses will fund it. For most consumer use cases, even today's open source models are good enough. Frontier intelligence will still matter for the hardest problems, like rewriting Linux from C to Rust, but the basic use cases are already covered.
Centralization of intelligence has a bad track record in political and economic systems. A healthy ecosystem needs both a frontier and a commons.
𝟵. 𝗗𝗶𝗴𝗶𝘁𝗮𝗹 𝗱𝗶𝘀𝗿𝘂𝗽𝘁𝗶𝗼𝗻 𝘄𝗶𝗹𝗹 𝗮𝗿𝗿𝗶𝘃𝗲 𝘆𝗲𝗮𝗿𝘀 𝗯𝗲𝗳𝗼𝗿𝗲 𝗽𝗵𝘆𝘀𝗶𝗰𝗮𝗹.
Bits are a million times easier to move than atoms. There is an enormous overhang of digital information that humans simply never had enough thinking cycles to process. Agents will chew through that first. Physical-world robotics is a bigger total market but will lag because atoms require capital, slow iteration, and high error tolerance. Self-driving took a decade and is still not done.
The interesting companies will be at the interface: sensors that feed data into the intelligence, and actuators that carry out its decisions in the physical world.
𝟭𝟬. 𝗝𝗲𝘃𝗼𝗻𝘀' 𝗽𝗮𝗿𝗮𝗱𝗼𝘅 𝗽𝗿𝗼𝗯𝗮𝗯𝗹𝘆 𝗵𝗼𝗹𝗱𝘀 𝗳𝗼𝗿 𝘀𝗼𝗳𝘁𝘄𝗮𝗿𝗲.
ATMs made bank branches cheaper. So there were more branches. So there were more tellers. Software is becoming radically cheaper to produce, and demand for it should grow accordingly. The long term is genuinely uncertain, but locally, right now, there will be more demand for software because the barrier has just collapsed.
I keep coming back to this framing whenever people ask if AI will "replace" engineers. That question misses the point. The real question is whether the world wants more software than it currently has. Obviously yes.
𝟭𝟭. 𝗔𝗻 𝘂𝗻𝘁𝗿𝘂𝘀𝘁𝗲𝗱 𝘀𝘄𝗮𝗿𝗺 𝗼𝗳 𝗮𝗴𝗲𝗻𝘁𝘀 𝗰𝗼𝘂𝗹𝗱 𝗼𝘂𝘁𝗽𝗮𝗰𝗲 𝗳𝗿𝗼𝗻𝘁𝗶𝗲𝗿 𝗹𝗮𝗯𝘀.
Karpathy is designing a SETI@home-style system for AutoResearch. Finding a good commit is hard (requires thousands of failed attempts), but verifying it is cheap (just retrain once). Frontier labs have massive trusted compute, but the earth has a much larger pool of untrusted compute. If the verification system works, the swarm could run circles around any single lab.
This is the most ambitious claim in the whole conversation. And the most exciting, because it would mean anyone with a GPU can contribute to the frontier.
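The system described is still being designed, so the sketch below only illustrates the asymmetry it relies on: finding an improvement is expensive, while checking a claimed one costs a single trusted retrain. `Submission`, `verify_submissions`, and `retrain_and_score` are hypothetical names, and comparing every claim against one fixed baseline is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Submission:
    contributor: str
    commit: str            # hypothetical identifier for the proposed change
    claimed_score: float   # metric the contributor reports (lower is better)

def verify_submissions(
    submissions: list[Submission],
    baseline_score: float,
    retrain_and_score: Callable[[str], float],  # one trusted retrain per checked commit
    tolerance: float = 1e-3,
) -> list[Submission]:
    """Accept a submission only if a trusted re-run reproduces its claim and beats the baseline."""
    accepted = []
    for sub in submissions:
        if sub.claimed_score >= baseline_score:
            continue  # claims that would not improve anything cost no verification compute
        measured = retrain_and_score(sub.commit)
        if measured < baseline_score and abs(measured - sub.claimed_score) <= tolerance:
            accepted.append(sub)
    return accepted
```

Untrusted contributors can burn as many failed attempts as they like on their own hardware; the trusted side only pays for one run per promising claim, and a claim that does not reproduce is simply dropped.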
𝟭𝟮. 𝗧𝗲𝗮𝗰𝗵𝗲𝗿𝘀 𝘀𝗵𝗼𝘂𝗹𝗱 𝘁𝗲𝗮𝗰𝗵 𝗮𝗴𝗲𝗻𝘁𝘀, 𝗻𝗼𝘁 𝗽𝗲𝗼𝗽𝗹𝗲.
Karpathy built MicroGPT, a full GPT training implementation in 200 lines of pure Python. He started making an explanatory video, then stopped. The code is already simple enough for agents to understand. If he writes a "skill" (a structured curriculum for the agent), the agent can teach each person at their level, in their language, with infinite patience. The teacher's job is now the few irreducible bits of insight that the agent cannot generate on its own.
This reframes the entire profession. The best teachers will be the ones who know what agents still cannot figure out, and package just those bits.
The full podcast is worth listening to. Link in thread.
I know Silicon Valley startups don't want to hear this.....
But the combination of someone in the trades with deep domain expertise and Claude Code will run circles around your generic software.
I talked to Cory LaChance this morning, a mechanical engineer in industrial piping construction in Houston. He normally works with chemical plants and refineries, but now he also works in the terminal.
He reached out in a DM a few days ago and I was so fired up by his story, I asked him if we could record the conversation and share it.
He built a full application that industrial contractors are using every day. It reads piping isometric drawings and automatically extracts every weld count, every material spec, every commodity code.
Work that took 10 minutes per drawing now takes 60 seconds. It can do 100 drawings in five minutes, saving days of time.
His co-workers are all mind blown, and when he talks to them, it's like they are speaking different languages.
His fabrication shop uses it daily, and he built the entire thing in 8 weeks. During those 8 weeks he also had to learn everything about Claude Code, the terminal, VS Code, everything.
My favorite quote from him was when he said, "I literally did this with zero outside help other than the AI. My favorite tools are screenshots, step by step instructions and asking Claude to explain things like I'm five."
Every trades worker with deep expertise and a willingness to sit down with Claude Code for a few weekends is now a potential software founder.
I can't wait to meet more people like Cory.