上个月 Claude Code 源代码不是泄漏了吗?很多人根据源代码让 AI 提取了很多相关内容学习,其中最全面的可能是Zhang Alex 同学这本开源书(我做了份 pdf):
https://t.co/fzoWUDchgO
作者认为,Claude Code 源码最佳的“食用”方式应该是转化为一本书,供自己系统学习。显然,看书学习比直接看源码更舒服,也更容易形成完整的认知框架。
这本开源书就是作者从 Claude Code 泄漏出来的 TypeScript 源码里提取出来的内容,内容涉及 Claude Code 的每一个子系统——工具注册、Agent Loop、系统提示词、上下文压缩、提示词缓存、权限安全、技能系统。
我最近在看这本书,不过这个项目里没有提供 pdf,我做了一个,带目录和索引的,大家可以自取阅读。
今天刷到这篇文章几次,说点不一样的。与其说 AI First,不如说软件工程 First。
这篇文章看着在讲 AI,底下全是软件工程。
抛开后面讲组织和人的部分,原文前半段的重点简单总结一下:
AI 时代,人成了瓶颈。PM 花几周做需求,AI 两小时就能实现,PM 成了瓶颈。QA 测三天,AI 写代码只要两小时,QA 成了瓶颈。团队 25 个人,对手几百人,人力也是瓶颈。
怎么办?把人从链条里拿掉。AI 写代码、AI 审查代码、AI 跑测试、AI 部署上线、AI 监控线上状态,出了问题自动回滚。每天定时扫描日志,自动发现问题、分配任务、跟踪修复。整条流水线跑起来,人只需要在关键节点做判断。
至于文中提到的统一代码库,锦上添花,和 AI First 关系不大。有当然更好,没有也有很多替代方案。
整套方案听下来,逻辑自洽,效果也漂亮:一天部署好几次,功能当天上当天撤,数据说了算。
但先别急着照搬,先对照自己的情况想几件事:
第一,自动化测试。AI 改完代码,你得有办法确认它没搞崩别的功能。测试覆盖不够的话,每次 AI 提交代码你都得人工回归一遍,那速度根本快不起来。
第二,CI/CD 流程。从提交代码到部署上线,中间的测试、审查、发布、回滚,是不是全自动跑通了?这条流水线不通,AI 写得再快,代码也堆在那儿等人手动处理。
第三,A/B 测试和线上监控。新功能上线之后效果好不好,得有数据说话,效果不好得能随时关掉。没有这套机制,AI 一天产出五个功能,你都不知道哪个该留哪个该砍。
第四,任务管理。任务得拆到合适的粒度,生命周期得跟踪得住。一个大而模糊的任务丢给 AI,现在的能力还啃不动。多个 Agent 同时干活的时候,谁做哪个、哪个优先、做到什么程度,这些都得有地方管。
第五,系统架构。架构太乱或者压根没有架构的代码,AI 维护起来跟人一样头疼。上下文塞满了还是搞不清边界在哪,改一处崩三处。
这几条里如果有做不到的,就得靠人去补。补不上,AI First 就只是一句口号。
但假设你全做到了,就能 AI First 了?
还是不行。这套玩法只适合一部分场景。
什么场景适合?后端逻辑为主、界面不复杂的产品,比如 API 服务、数据处理平台、内部工具。功能好不好,跑一下数据就知道,不需要人去盯着每个像素。原文里的就是个 Agent 平台,本质上是后端驱动的产品,可以用这套打法。
再比如早期产品快速试错,功能上了不行就撤,用户预期本来就没那么高,AI 的速度优势能充分发挥。
但很多场景玩不转。
比如 UI 密集的产品。自媒体天天喊前端已死,但你让 AI 做个复杂界面试试,各种易用性问题、交互细节、视觉还原,它搞不定的。否则马斯克靠 AI 早就改了不知道改版 X 多少次了。
比如对功能质量敏感的产品。Anthropic 和 OpenAI 不知道 AI First 吗?他们敢在 Claude Code 和 Codex 上这么搞吗?让 AI 全自动迭代自家的核心产品,用户不骂死才怪。
再比如安全性要求高的场景,银行系统、在线交易平台,AI 代码出个差错,那可不是回滚能解决的。
AI First 的方向没有错,它代表的是一种意识的转变:每做一个决策的时候,想一想这件事能不能让 AI 来做,如果不能,缺什么条件,怎么把条件补上。
但这种意识要落地,靠的不仅是买几个 AI 工具的订阅,还需要把基础搭好。测试、CI/CD、监控、架构、任务管理,这些做扎实了,AI 的能力自然能释放出来。做不好,加再多 AI 也是在沙子上盖楼。
从这个角度看,AI First 的终点未必是让 AI 干所有的活,而是借着这股力量,把你一直想做但没动力做的工程改进,真正推动起来。
仰望星空是好的,但也还要脚踏实地。
A pattern I've noticed in stuck people:
They're always busy. They never stop moving. They have 47 tabs open and a notebook-sized to-do list. But if you ask them what they accomplished this week that actually matters, their mind goes blank.
Busyness isn't a badge of honor.
Judging by my tl there is a growing gap in understanding of AI capability.
The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.
But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.
So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions.
TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because 2 properties: 1) these domains offer explicit reward functions that are verifiable meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
You need to write more.
Without AI. Without templates. Without knowing what you're writing about. Just you, an idea, and enough time to do the difficult cognitive work necessary to reach true understanding. If you don't, your ability to think will drastically decline.
Wow, this tweet went very viral!
I wanted share a possibly slightly improved version of the tweet in an "idea file". The idea of the idea file is that in this era of LLM agents, there is less of a point/need of sharing the specific code/app, you just share the idea, then the other person's agent customizes & builds it for your specific needs.
So here's the idea in a gist format: https://t.co/NlAfEJjtJV
You can give this to your agent and it can build you your own LLM wiki and guide you on how to use it etc. It's intentionally kept a little bit abstract/vague because there are so many directions to take this in. And ofc, people can adjust the idea or contribute their own in the Discussion which is cool.
I like @karpathy's Obsidian setup as a way to mitigate contamination risks. Keep your personal vault clean and create a messy vault for your agents.
I prefer my personal Obsidian vault to be high signal:noise, and for all the content to have known origins.
Keeping a separation between your personally-created artifacts and agent-created artifacts prevents contaminating your primary vault with ideas you can't source.
If you let the two mix too much it will likely make Obsidian harder to use as a representation of *your* thoughts. Search, bases, quick switcher, backlinks, graph, etc, will no longer be scoped to your knowledge.
Only once your agent-facing workflow produces useful artifacts would I bring those into the primary vault.
More and more people are using Obsidian as a local wiki to read things your agents are researching and writing.
It works best with a separate Obsidian vault that you can fill it with content, e.g. via Obsidian Web Clipper.