andrew blowe

@deejay_hanzel

Front end Software Engineer @Zillow. Interests in mental health, music production, programming, cars. Tweets are my own.

Joined December 2010

689 Following

90 Followers

318 Posts

andrew blowe @deejay_hanzel

9 days ago

@GergelyOrosz Maybe there’s some info missing, like what percentage of the time they did this, and whether it was part of a broader strategy like giving engineers a better understanding of the work they are automating.

510

andrew blowe @deejay_hanzel

13 days ago

@ctjlewis Yeah I know a few former ones who are just good at talking in a way that makes it seem like they understand, memorized leetcode, but can’t debug simple web apps

andrew blowe @deejay_hanzel

19 days ago

@theo Isn’t git butler trying to solve source control for the agentic world?

201

andrew blowe @deejay_hanzel

28 days ago

@lucasmeijer This is pretty easy to solve, just have to add some tracking files or state files

157

deejay_hanzel retweeted

Mario Nawfal

@MarioNawfal

29 days ago

The CEO of Take-Two, the company behind GTA, just said something the entire AI industry doesn't want to hear. And he said it without being anti-AI. Strauss Zelnick's argument is precise. AI is built on datasets. Datasets are backward-looking. Creativity is forward-looking. A model trained on everything that already exists cannot, by definition, produce something genuinely unexpected. And all hits, by their very nature, are unexpected. Asset creation and hit creation are not the same thing. AI is getting very good at the first one. The second one is what actually makes money, builds franchises, and changes culture. Nobody has shown AI can do that yet. The derivative property problem is real. You can clone GTA with existing technology. You could do it before AI. It would take 3 years and look identical. It still wouldn't sell. Because it isn't GTA. It's a clone of GTA. And consumers, despite what the industry occasionally pretends, can feel the difference between something genuinely new and something assembled from the residue of things that already worked. Thousands of mobile games ship every year. 0 to 5 hits get made. The same studios make them every time. The technology to make more games has been commoditized for years. It didn't democratize hit creation. It just flooded the market with more forgettable product. The Silicon Valley thesis that AI unlocks game creation for everyone is true in the same way that cheap cameras unlocked filmmaking for everyone. They did. And the same 5 studios still make the movies everyone watches. What Zelnick is saying, without quite saying it, is that the thing AI cannot replicate is taste. The instinct for what hasn't been done yet. The cultural antenna that detects the gap in the market before the data can see it. Data tells you what people wanted. Hits tell people what they want next. Those are different jobs.

437

10K

deejay_hanzel retweeted

Lee Robinson

@leerob

about 1 month ago

Code is actually the right abstraction.

Too often I see the future of software engineering diminished down to, effectively, writing and reviewing markdown files.

Yes, it will be hard to review thousands of lines of agent code. But maybe the takeaway is that you want less code? Rather than just giving up ("well I guess we won't read the code, or we'll read this lossy markdown summary") this should be a signal forcing you to think about better systems.

- How can we make our codebase more verifiable? For example, fast/robust/stable tests, or moving to a typed language.
- How can we deslop or improve the architecture/abstractions of the code generated by agents? For example, spending more time up front on the codebase architecture/types before yolo generating all of the code.
- How are we going to maintain and evolve this codebase over time? The slop compounds. One great solution here is... you guessed it, learning from the past decades of software engineering! For example, you might just have the wrong abstraction entirely, leading to a ton of duplicated code.

I think the markdown folks *are* right in some ways. If you are using skills every day, for many different prompts and workflows, isn't that effectively "coding with markdown"? Kinda.

There's been plenty of ink spilled on the merits and benefits of skills. To me, skills make your style of working legible for agents. They don't replace code and that's not really the point.

In reality, there's this messy and constantly re-evolving future in which both of these things are true:

1. Skills (and markdown) are important for how you give input to the agents and ensure high-quality code & systems are created
2. Looking at the actual code will not be replaced by markdown summaries or a collection of spec documents that ignore the lower level details of the code

In summary: reality has a surprising amount of detail (and nuance)!

110

579

114K

deejay_hanzel retweeted

Tuomas Artman

@artman

about 1 month ago

Today is a hard day. I shared this note with the @linear team today: We’ve made the difficult decision to increase our workforce. This is not a cost-cutting exercise or a reflection of anyone’s performance. We’re simply reimagining every role for the agentic AI era. We’re hiring. We’re sorry about that.

449

14K

638

987K

andrew blowe @deejay_hanzel

about 1 month ago

@marcus_lowe Aren’t most engineers with access to Claude now cracked full stack engineers?

173

andrew blowe @deejay_hanzel

about 1 month ago

Some are saying that code doesn’t matter anymore, or that it’s like caring about what’s being compiled under the hood. While that’s partially true, it ignores the fact that the underlying material of what you are building with does matter and has limitations. Code is material.

deejay_hanzel retweeted

Garry Tan

@garrytan

2 months ago

This is the simplest distillation of what I have learned about agentic engineering this year

Push smart fuzzy operations humans do into markdown skills. Fat skills.

Push must-be-perfect deterministic operations into code. Fat code.

The harness? Keep it thin. https://t.co/3ESkNepZrZ

104

211

220K

deejay_hanzel retweeted

AlexZ 🦀

@blackanger

2 months ago

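Anthropic published a new article on harness engineering practice. On the surface it introduces their new product, but from a broader view the article is really exploring one question: as model capabilities keep changing, what should an agent system turn into a "stable interface", and what should it leave to be rewritten again and again in the future?

After reading it, I think the article hides a deep judgment by Anthropic about future agent infrastructure: agent infrastructure will look more and more like a "micro operating system (Agent OS)". This is the part I think most deserves attention.

The worst thing an agent framework can do is elevate a "temporary model deficiency" into a "permanent system structure". A harness (scheduling loop, context curation, tool routing, and so on) essentially encodes assumptions about the boundaries of model capability, but those assumptions go stale quickly as models get stronger. My earlier book 《驾驭工程》 ("Harness Engineering", nicknamed the "horse book") has a line: the stronger the model, the simpler the harness will be. Yet many people writing agents today, myself included, easily write model deficiencies into the agent framework. For example, if the model cannot plan, they force the steps into a fixed DAG, and so on. These are all really harness patches, and it is hard for us agent developers to judge which ones are model deficiencies.

Anthropic's approach is to design a meta-harness: it does not commit to what a specific harness looks like, only to a few kinds of long-lived stable interfaces. That way we do not have to guess what the model's deficiencies are. This is much like OS thinking: an OS never cares how future programs will be written, it just provides abstract interfaces. So how does Anthropic's meta-harness abstract the agent? The most important abstractions in the article are three things:

- session: can be understood as an event log / durable state
- harness: the reasoning-scheduling loop / brainstem
- sandbox: the execution environment / hands and feet

Their separation is the core innovation of this architecture.

1. The session is not Claude’s context window

This sentence from the article shows how the meta-harness abstracts the session. It is not a simple chat transcript; it represents a "recoverable event stream". What Anthropic wants is to make the session an append-only event log. It should not be treated as a prompt fed directly to the model; it should be a real execution history that can be queried, replayed, recovered, and reassembled. If you treat the session as a mere mirror of the context window, it has lost recoverability.

2. Harness: a replaceable orchestration layer

Early on, Anthropic packed the harness, session, and sandbox into a single container. That ran into many problems: if the harness crashed, the whole session was hard to recover; if the container died, state could be lost; debugging was hard, because the container held user data and could not easily be debugged with a shell; and VPC access was difficult. So they pulled the harness out of the container and turned it into a "brainstem that calls tools". The harness now:

- no longer owns state
- no longer assumes where tools live; all you see is one interface, execute(name, input) -> string (a bit of the nix philosophy)

What this abstraction means is that the AI does not need to know which device it is on, which operating system, phone or computer, container or virtual machine, and so on. It only knows "which hands I can use". (Picture the Thousand-Armed Guanyin.)

3. Sandbox: "one particular kind of hand"

The article says to decouple the brain from the hands. That is:

- Claude + harness is the brain
- sandboxes / tools / MCP / custom infra are the hands

Once a sandbox is just a hand, hands are independent of each other, can come from different infrastructure, can be shared and handed off, and no session needs to load and boot a full sandbox. This leads directly to the later "many brains, many hands".

Anthropic's practice here draws on one of the core lessons of modern distributed systems: do not try to keep a particular process alive; keep a recoverable record of facts and a restart protocol.

This design also strengthens security at the architecture level. The article has a line: "narrow scoping is an obvious mitigation, but this encodes an assumption about what Claude can't do with a limited token—and Claude is getting increasingly smart." Meaning:

- of course you can give the model a "narrowly scoped token"
- but that is still betting that the model cannot pull off certain attack paths
- and as models keep getting stronger, the "it probably won't think of that" security assumption grows more and more fragile

So they do not put credentials in the sandbox. For example, a Git token is wired in, in a controlled way, only while clone/push/pull is being initialized, and the model never reads the token directly. The model calls indirectly through an MCP proxy; the proxy uses the session token to fetch the real credentials from a vault and then executes. Security is not built on the model being incapable.

This design also frees "recoverable history" from the "context window". The idea is: do not keep the full history in the model context; put it in the session, a queryable object. From a systems perspective:

- Claude's context window is the execution scene
- the session log is the evidence store
- the harness is the retriever and reassembler

As a result:

- the prompt no longer carries the duty of "permanent memory"
- trimming does not mean history disappears
- compaction does not mean facts are unrecoverable

On recovery, the original events can be fetched again. This is a level above a "pure summary memory" design.

This design also improves performance. It lowers TTFT (time to first token), and sessions no longer prepay the full container cost. Now the brain starts running first, and a hand is provisioned only when needed. This is classic lazy materialization. The numbers in the article are also impressive:

- p50 TTFT down about 60%
- p95 down more than 90%

These numbers show that the bottleneck was never model inference itself, but pre-coupling the entire execution environment to the request entry point.

Finally, do not forget the original purpose of this design: to make future agents more extensible. The article closes with "Many brains, many hands". This means Anthropic intends to build an agent runtime substrate: multi-brain collaboration / multi-hand orchestration / cross-environment execution. An agent's identity should not be bound to a particular execution shell, but to a set of recoverable state and callable capabilities. The article also mentions some drawbacks of this design, for example that having a brain manage many hands is itself a harder cognitive task. So the premise of this architecture is that model intelligence is already high enough to take on more abstract tool-routing responsibility. This is really a bet that model capability will keep improving: designing for the future.

Finally, the article gave me three takeaways:

- Takeaway 1: a session is not a message list but a "stream of execution facts".
- Takeaway 2: do not internalize the tool environment into the agent itself.
- Takeaway 3: context engineering should be a replaceable harness strategy, so that no technical debt arises when models get stronger or the retrieval strategy changes.

The article does not say so outright, but I think what it really represents is an Anthropic product philosophy: we do not believe today's agent harness is the final form, so we invest first in stable interfaces rather than in a one-off optimal implementation. The future of agent systems rests on a set of long-lived, stable system abstractions.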
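A minimal sketch of the session / harness / sandbox split described above, in Python. Only the `execute(name, input) -> string` shape and the append-only log idea come from the post; the class names `Event`, `Session`, `Harness`, the helper `run_tool`, and the `shout` tool are hypothetical and are not Anthropic's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Hand = Callable[[str], str]  # a tool: text in, text out


@dataclass(frozen=True)
class Event:
    """One immutable record of something that happened in the session."""
    kind: str      # e.g. "tool_call" or "tool_result" (hypothetical labels)
    payload: str


@dataclass
class Session:
    """Append-only event log: the durable record, separate from the prompt."""
    _events: List[Event] = field(default_factory=list)

    def append(self, kind: str, payload: str) -> None:
        self._events.append(Event(kind, payload))

    def replay(self) -> List[Event]:
        # Return a copy so callers cannot rewrite history.
        return list(self._events)

    def context_window(self, last_n: int) -> List[Event]:
        # A trimmed view for the model; the full log stays intact.
        if last_n <= 0:
            return []
        return self._events[-last_n:]


class Harness:
    """Holds no session state; exposes only execute(name, input) -> str."""

    def __init__(self, hand_factories: Dict[str, Callable[[], Hand]]):
        self._factories = hand_factories
        self._hands: Dict[str, Hand] = {}

    def execute(self, name: str, tool_input: str) -> str:
        # Lazy provisioning: a hand is created only on first use.
        if name not in self._hands:
            self._hands[name] = self._factories[name]()
        return self._hands[name](tool_input)


def run_tool(session: Session, harness: Harness, name: str, tool_input: str) -> str:
    """Record the call and its result as facts in the session log."""
    session.append("tool_call", f"{name}:{tool_input}")
    result = harness.execute(name, tool_input)
    session.append("tool_result", result)
    return result


session = Session()
harness = Harness({"shout": lambda: (lambda text: text.upper())})
print(run_tool(session, harness, "shout", "hello"))  # HELLO
print(len(session.replay()))                         # 2
```

Trimming the view returned by `context_window` never touches the log, which is the sense in which "trimming does not mean history disappears".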

869

181

159K

deejay_hanzel retweeted

banteg

@banteg

2 months ago

anthropic running the exact same marketing playbook with every release. “our model is so capable and dangerous, ahh we are afraid to release it”. just put the model in the bag lil bro.

165

13K

727

473

304K

andrew blowe @deejay_hanzel

2 months ago

@pmarca This works for normies, not with mental health unfortunately. And a lot of smart people have mental health issues that create blockers and limitations. Taking action, even if it’s small steps can help with facing those limitations, if that’s what you mean by ‘just do it’

deejay_hanzel retweeted

Wes Bos

@wesbos

3 months ago

Claude Code leaked their source map, effectively giving you a look into the codebase.

I immediately went for the one thing that mattered: spinner verbs

There are 187 https://t.co/zFW3ZrVz8G

721

26K

deejay_hanzel retweeted

Andrej Karpathy

@karpathy

3 months ago

- Drafted a blog post
- Used an LLM to meticulously improve the argument over 4 hours.
- Wow, feeling great, it’s so convincing!
- Fun idea let’s ask it to argue the opposite.
- LLM demolishes the entire argument and convinces me that the opposite is in fact true.
- lol

The LLMs may elicit an opinion when asked but are extremely competent in arguing almost any direction. This is actually super useful as a tool for forming your own opinions, just make sure to ask different directions and be careful with the sycophancy.

31K

andrew blowe @deejay_hanzel

3 months ago

Omg

Boris Cherny

@bcherny

3 months ago

no 👏 more 👏 permission prompts 👏

334

250

470K

andrew blowe @deejay_hanzel

3 months ago

Stick to tech, man 🤫

Marc Andreessen 🇺🇸

@pmarca

3 months ago

My big conclusion from this week: Introspection causes emotional disorders.

11K

634

52M

andrew blowe @deejay_hanzel

3 months ago

I think people are overreacting to cursor using the Kimi model for composer, but appreciate the transparency

Lee Robinson

@leerob

3 months ago

I'm a big believer in open source, especially as AI improves. It was a miss to not mention the Kimi base in our blog from the start. We'll fix that for the next model 🙏 Their team clarified our usage was licensed in the tweet below. https://t.co/h8uwGKAQeN

199

107

155

398K

deejay_hanzel retweeted

Anish Moonka

@anishmoonka

3 months ago

Andrej Karpathy just went ~66 mins on No Priors Podcast with Sarah Guo about code agents, AutoResearch, and what happens when humans become the bottleneck in their own systems.

The clearest thinking I have heard on what just changed in December 2025 and why everything feels different now.

My notes:

𝟭. 𝗧𝗵𝗲 𝗗𝗲𝗰𝗲𝗺𝗯𝗲𝗿 𝟮𝟬𝟮𝟱 𝗳𝗹𝗶𝗽 𝘄𝗮𝘀 𝗿𝗲𝗮𝗹.

Karpathy went from writing 80% of his own code to writing almost none. He has not typed a line of code since December. The shift happened over a few weeks, and he says most people outside software engineering have no idea it even happened.

People can now build entire apps with Vibe coding, even with no prior coding experience. That is just the start. What Karpathy is describing is a whole different level of delegation.

𝟮. 𝗧𝗵𝗲 𝘂𝗻𝗶𝘁 𝗼𝗳 𝘄𝗼𝗿𝗸 𝗶𝘀 𝗻𝗼𝘄 𝗮 𝘄𝗵𝗼𝗹𝗲 𝗳𝗲𝗮𝘁𝘂𝗿𝗲, 𝗻𝗼𝘁 𝗮 𝗹𝗶𝗻𝗲 𝗼𝗳 𝗰𝗼𝗱𝗲.

He runs multiple Codex agents on a tiled monitor. Each one takes about 20 minutes. You assign a feature to agent one, another to agent two, and review their outputs as they come back. The human is now a project manager, routing macro-level tasks across a team of agents.

The parallel to investing is obvious: the best portfolio managers stopped picking individual stocks years ago. They pick strategies. The same thing is happening to engineering.

𝟯. 𝗜𝗳 𝘆𝗼𝘂 𝗵𝗮𝘃𝗲 𝘀𝘂𝗯𝘀𝗰𝗿𝗶𝗽𝘁𝗶𝗼𝗻 𝗰𝗮𝗽𝗮𝗰𝗶𝘁𝘆 𝗹𝗲𝗳𝘁, 𝘆𝗼𝘂 𝘄𝗮𝘀𝘁𝗲𝗱 𝘁𝗵𝗿𝗼𝘂𝗴𝗵𝗽𝘂𝘁.

Karpathy compares it to his PhD days when idle GPUs made him nervous. Now the scarce resource is tokens, and the bottleneck is your own ability to formulate the next task. You are the constraint in the system. The machines are waiting for you.

This reframe matters. If everything that fails feels like a skill issue rather than a capability ceiling, then you can always get better. That is what makes it addictive.

𝟰. 𝗔𝗴𝗲𝗻𝘁 𝗽𝗲𝗿𝘀𝗼𝗻𝗮𝗹𝗶𝘁𝘆 𝗺𝗮𝘁𝘁𝗲𝗿𝘀 𝗺𝗼𝗿𝗲 𝘁𝗵𝗮𝗻 𝗽𝗲𝗼𝗽𝗹𝗲 𝘁𝗵𝗶𝗻𝗸.

He says Claude Code feels like a teammate who is excited about what you are building. Codex is functionally competent but emotionally flat. He actually finds himself trying to earn Claude's praise, which is "really weird" by his own admission. OpenClaw (an agent built by @steipete) dialed the personality and the memory system simultaneously, and got something that replaces 6 home automation apps in a single WhatsApp chat.

I keep hearing this from builders. The tool that cares about your project gets used more than the one that does not.

𝟱. 𝗔𝘂𝘁𝗼𝗥𝗲𝘀𝗲𝗮𝗿𝗰𝗵 𝗿𝗮𝗻 𝟳𝟬𝟬 𝗲𝘅𝗽𝗲𝗿𝗶𝗺𝗲𝗻𝘁𝘀 𝗶𝗻 𝘁𝘄𝗼 𝗱𝗮𝘆𝘀 𝗮𝗻𝗱 𝗳𝗼𝘂𝗻𝗱 𝘁𝗵𝗶𝗻𝗴𝘀 𝗵𝗲 𝗺𝗶𝘀𝘀𝗲𝗱 𝗳𝗼𝗿 𝘁𝘄𝗼 𝗱𝗲𝗰𝗮𝗱𝗲𝘀.

He gave an agent his NanoChat training setup, a metric (validation bits per byte), and permission to modify the code. The agent found 20 optimizations, including forgotten weight decay on value embeddings and under-tuned Adam betas. These things interact with each other, so once you tune one parameter, the others need to shift too. No human has the patience for that kind of exhaustive search.

The Shopify CEO ran the same pattern overnight and achieved a 19% improvement in an internal model. This pattern is going to eat every domain with a measurable metric.
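The pattern in this note (propose a change, score it against one metric, keep it only if the score improves) can be sketched as a simple loop. This is a toy illustration, not Karpathy's AutoResearch code: `toy_metric`, `propose`, `run_experiments`, and the target values inside the metric are all hypothetical, and a random nudge stands in for the agent's code edits. The toy metric is also separable, so it does not capture the interaction between settings that the note mentions.

```python
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


def toy_metric(config: Config) -> float:
    """Stand-in for a validation metric; lower is better. Targets are made up."""
    return (config["weight_decay"] - 0.1) ** 2 + (config["beta2"] - 0.95) ** 2


def propose(config: Config, rng: random.Random) -> Config:
    """Stand-in for the agent's edit: nudge one setting at random."""
    candidate = dict(config)
    key = rng.choice(sorted(candidate))
    candidate[key] += rng.uniform(-0.05, 0.05)
    return candidate


def run_experiments(
    start: Config,
    metric: Callable[[Config], float],
    budget: int,
    seed: int = 0,
) -> Tuple[Config, List[float]]:
    """Greedy search: keep a candidate only when the metric improves."""
    rng = random.Random(seed)
    best, best_score = dict(start), metric(start)
    kept: List[float] = []  # score after each accepted change
    for _ in range(budget):
        candidate = propose(best, rng)
        score = metric(candidate)
        if score < best_score:
            best, best_score = candidate, score
            kept.append(score)
    return best, kept


start = {"weight_decay": 0.0, "beta2": 0.999}
best, kept = run_experiments(start, toy_metric, budget=700)
print(toy_metric(start), toy_metric(best), len(kept))
```

The loop only needs a metric it can compute, which is why the note expects the pattern to spread to any domain with a measurable score.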

𝟲. 𝗘𝘃𝗲𝗿𝘆 𝗿𝗲𝘀𝗲𝗮𝗿𝗰𝗵 𝗼𝗿𝗴 𝗶𝘀 𝗮 𝘀𝗲𝘁 𝗼𝗳 𝗺𝗮𝗿𝗸𝗱𝗼𝘄𝗻 𝗳𝗶𝗹𝗲𝘀.

Karpathy's program.md tells the agent what to try, what to leave alone, and when to stop. Different instructions produce different progress rates. Which means you can optimize the instructions themselves. Run 100 different program.md files, see which ones yield the most improvement, and use that data to write a better one.

This is the recursive layer that makes people nervous. And excited. Both at the same time, probably.

𝟳. 𝗠𝗼𝗱𝗲𝗹𝘀 𝗮𝗿𝗲 𝘀𝗶𝗺𝘂𝗹𝘁𝗮𝗻𝗲𝗼𝘂𝘀𝗹𝘆 𝗯𝗿𝗶𝗹𝗹𝗶𝗮𝗻𝘁 𝗣𝗵𝗗 𝘀𝘁𝘂𝗱𝗲𝗻𝘁𝘀 𝗮𝗻𝗱 𝟭𝟬-𝘆𝗲𝗮𝗿-𝗼𝗹𝗱𝘀.

Ask ChatGPT for a joke today and you will get the same atoms joke from four years ago. Ask it to refactor your entire codebase, and it will move mountains. Reinforcement learning (the training method that improves models by rewarding correct answers) only optimizes what it can score, leaving everything outside the scoring boundary frozen. The story that "smarter at code = smarter at everything" is not playing out in a satisfying way.

Anyone who has spent time with these tools knows this feeling. Godlike at one thing, clueless at the next.

𝟴. 𝗢𝗽𝗲𝗻 𝘀𝗼𝘂𝗿𝗰𝗲 𝗶𝘀 ~𝟴 𝗺𝗼𝗻𝘁𝗵𝘀 𝗯𝗲𝗵𝗶𝗻𝗱 𝗳𝗿𝗼𝗻𝘁𝗶𝗲𝗿 𝗮𝗻𝗱 𝗰𝗹𝗼𝘀𝗶𝗻𝗴.

The gap started at 18 months and has been compressing. Karpathy compares open source AI to Linux: the industry demands a common open platform, and businesses will fund it. For most consumer use cases, even today's open source models are good enough. Frontier intelligence will still matter for the hardest problems, like rewriting Linux from C to Rust, but the basic use cases are already covered.

Centralization of intelligence has a bad track record in political and economic systems. A healthy ecosystem needs both a frontier and a commons.

𝟵. 𝗗𝗶𝗴𝗶𝘁𝗮𝗹 𝗱𝗶𝘀𝗿𝘂𝗽𝘁𝗶𝗼𝗻 𝘄𝗶𝗹𝗹 𝗮𝗿𝗿𝗶𝘃𝗲 𝘆𝗲𝗮𝗿𝘀 𝗯𝗲𝗳𝗼𝗿𝗲 𝗽𝗵𝘆𝘀𝗶𝗰𝗮𝗹.

Bits are a million times easier to move than atoms. There is an enormous overhang of digital information that humans simply never had enough thinking cycles to process. Agents will chew through that first. Physical-world robotics is a bigger total market but will lag because atoms require capital, slow iteration, and high error tolerance. Self-driving took a decade and is still not done.

The interesting companies will be at the interface: sensors that feed data into the intelligence, and actuators that carry out its decisions in the physical world.

𝟭𝟬. 𝗝𝗲𝘃𝗼𝗻𝘀' 𝗽𝗮𝗿𝗮𝗱𝗼𝘅 𝗽𝗿𝗼𝗯𝗮𝗯𝗹𝘆 𝗵𝗼𝗹𝗱𝘀 𝗳𝗼𝗿 𝘀𝗼𝗳𝘁𝘄𝗮𝗿𝗲.

ATMs made bank branches cheaper. So there were more branches. So there were more tellers. Software is becoming radically cheaper to produce, and demand for it should grow accordingly. The long-term is genuinely uncertain, but locally, right now, there will be more demand for software because the barrier has just collapsed.

I keep coming back to this framing whenever people ask if AI will "replace" engineers. The question misses the point. The question is whether the world wants more software than it currently has. Obviously yes.

𝟭𝟭. 𝗔𝗻 𝘂𝗻𝘁𝗿𝘂𝘀𝘁𝗲𝗱 𝘀𝘄𝗮𝗿𝗺 𝗼𝗳 𝗮𝗴𝗲𝗻𝘁𝘀 𝗰𝗼𝘂𝗹𝗱 𝗼𝘂𝘁𝗽𝗮𝗰𝗲 𝗳𝗿𝗼𝗻𝘁𝗶𝗲𝗿 𝗹𝗮𝗯𝘀.

Karpathy is designing a SETI@home-style system for AutoResearch. Finding a good commit is hard (requires thousands of failed attempts), but verifying it is cheap (just retrain once). Frontier labs have massive trusted compute, but the earth has a much larger pool of untrusted compute. If the verification system works, the swarm could run circles around any single lab.

This is the most ambitious claim in the whole conversation. And the most exciting, because it would mean anyone with a GPU can contribute to the frontier.
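The asymmetry described here (finding a good change is expensive, checking one is cheap) can be sketched as a coordinator that ignores self-reported scores and re-runs each submission once on trusted compute. Everything in this sketch is hypothetical: `Submission`, `evaluate`, `accept_verified`, the worker names, and the use of a single number to stand in for a commit.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Submission:
    worker: str
    candidate: float      # stand-in for a commit found by an untrusted worker
    claimed_score: float  # self-reported, never trusted


def evaluate(candidate: float) -> float:
    """Hypothetical trusted re-run; lower is better."""
    return (candidate - 3.0) ** 2


def accept_verified(
    submissions: Iterable[Submission],
    baseline_score: float,
    trusted_eval: Callable[[float], float],
) -> Tuple[Optional[Submission], float]:
    """Accept a submission only if its re-verified score beats the best so far."""
    best: Optional[Submission] = None
    best_score = baseline_score
    for sub in submissions:
        verified = trusted_eval(sub.candidate)  # one cheap check per claim
        if verified < best_score:
            best, best_score = sub, verified
    return best, best_score


submissions = [
    Submission("worker-a", candidate=10.0, claimed_score=0.0),  # false claim
    Submission("worker-b", candidate=2.5, claimed_score=0.25),  # honest claim
]
winner, score = accept_verified(submissions, baseline_score=1.0, trusted_eval=evaluate)
print(winner.worker, score)  # worker-b 0.25
```

Because acceptance depends only on the trusted re-run, a worker that inflates its claim gains nothing; the open question in the note is whether that verification step holds up at scale.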

𝟭𝟮. 𝗧𝗲𝗮𝗰𝗵𝗲𝗿𝘀 𝘀𝗵𝗼𝘂𝗹𝗱 𝘁𝗲𝗮𝗰𝗵 𝗮𝗴𝗲𝗻𝘁𝘀, 𝗻𝗼𝘁 𝗽𝗲𝗼𝗽𝗹𝗲.

Karpathy built MicroGPT, a full GPT training implementation in 200 lines of pure Python. He started making an explanatory video, then stopped. The code is already simple enough for agents to understand. If he writes a "skill" (a structured curriculum for the agent), the agent can teach each person at their level, in their language, with infinite patience. The teacher's job is now the few irreducible bits of insight that the agent cannot generate on its own.

This reframes the entire profession. The best teachers will be the ones who know what agents still cannot figure out, and package just those bits.

The full podcast is worth listening to. Link in Thread.

874

96K

deejay_hanzel retweeted

Todd Saunders

@toddsaunders

3 months ago

I know Silicon Valley startups don't want to hear this.....

But the combination of someone in the trades with deep domain expertise and Claude Code will run circles around your generic software.

I talked to Cory LaChance this morning, a mechanical engineer in industrial piping construction in Houston. He normally works with chemical plants and refineries, but now he also works with the terminal.

He reached out in a DM a few days ago and I was so fired up by his story, I asked him if we could record the conversation and share it.

He built a full application that industrial contractors are using every day. It reads piping isometric drawings and automatically extracts every weld count, every material spec, every commodity code. Work that took 10 minutes per drawing now takes 60 seconds. It can do 100 drawings in five minutes, saving days of time.

His co-workers are all mind blown, and when he talks to them, it's like they are speaking different languages. His fabrication shop uses it daily, and he built the entire thing in 8 weeks. During those 8 weeks he also had to learn everything about Claude Code, the terminal, VS Code, everything.

My favorite quote from him was when he said, "I literally did this with zero outside help other than the AI. My favorite tools are screenshots, step by step instructions and asking Claude to explain things like I'm five."

Every trades worker with deep expertise and a willingness to sit down with Claude Code for a few weekends is now a potential software founder.

I can't wait to meet more people like Cory.

357

706
