Karpathy's Agentic Engineering finally has proper tooling!
(built by Google)
Karpathy defined agentic engineering as the discipline that separates production agent work from vibe coding. The core skills he listed were spec design, eval loops, and security oversight.
The problem has been that practicing this still requires a different tool for every phase:
- an editor for code
- a terminal for scaffolding
- a browser for testing
- a cloud console for deployment
- a separate framework for evals
Every transition is a context switch.
Google's Agents CLI now addresses this for production-grade agentic engineering.
It covers the entire workflow in one place for scaffolding, evaluating, and deploying ADK agents.
One setup command injects 7 ADK-specific skills into a coding agent's context, which lets it handle scaffolding, evals, deployment, and enterprise registration through natural language.
I tested this end-to-end by building a RAG agent from scratch using Claude Code.
It scaffolded the full project from the ADK agentic_rag template, generated 20 eval scenarios with LLM-as-judge scoring, and returned a quantitative scorecard.
Finally, it deployed everything to Agent Runtime and registered the agent with Gemini Enterprise, so the entire org can discover and use it.
The video below shows this in action, and I worked with the Google Cloud team to put this together.
Agents CLI GitHub repo → https://t.co/oOBGTVLKv8
(don't forget to star it ⭐ )
I wrote up the full build covering all six steps from install to enterprise registration.
It includes the eval scorecard, the instruction loophole the eval caught before deployment, and what the deployment process actually looks like end-to-end.
Read it below.
LLM-as-a-Judge explained in ~10 mins.
Knowing how to build AI verifiers and judges is one of the most important emerging AI skills today.
Here is a quick intro on the topic and where to learn how to apply LLM-as-a-Judge.
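To make the idea concrete, here is a minimal sketch of an LLM-as-a-Judge loop that ends in a small scorecard. It is not how Agents CLI or ADK implements evals; the prompt wording, the 1-5 scale, the pass threshold, and every name (`JUDGE_PROMPT`, `judge`, `scorecard`, `fake_model`) are assumptions made for illustration. `fake_model` is a deterministic stand-in for a real model call so the example runs offline.

```python
import json
import re
from dataclasses import dataclass
from statistics import mean
from typing import Callable

# Hypothetical rubric prompt; a real judge prompt would be tuned per task.
JUDGE_PROMPT = """You are grading an AI agent's answer.
Question: {question}
Reference answer: {reference}
Agent answer: {answer}
Reply with JSON only: {{"score": <integer 1-5>, "reason": "<one sentence>"}}"""


@dataclass
class Verdict:
    score: int
    reason: str


def parse_verdict(raw: str) -> Verdict:
    """Pull the JSON object out of the judge's reply and validate the score."""
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError(f"no JSON object in judge reply: {raw!r}")
    data = json.loads(match.group(0))
    score = int(data["score"])
    if not 1 <= score <= 5:
        raise ValueError(f"score out of range: {score}")
    return Verdict(score=score, reason=str(data.get("reason", "")))


def judge(question: str, reference: str, answer: str,
          call_model: Callable[[str], str]) -> Verdict:
    """Build the judge prompt, call the model, and parse its verdict."""
    prompt = JUDGE_PROMPT.format(question=question, reference=reference, answer=answer)
    return parse_verdict(call_model(prompt))


def scorecard(verdicts: list[Verdict], pass_threshold: int = 4) -> dict:
    """Aggregate per-scenario verdicts into a few summary numbers."""
    scores = [v.score for v in verdicts]
    return {
        "n": len(scores),
        "mean_score": round(mean(scores), 2),
        "pass_rate": sum(s >= pass_threshold for s in scores) / len(scores),
    }


def fake_model(prompt: str) -> str:
    """Stand-in for an LLM: gives 5 if the reference text appears in the answer, else 2."""
    reference = prompt.split("Reference answer: ")[1].split("\n")[0]
    answer = prompt.split("Agent answer: ")[1].split("\n")[0]
    score = 5 if reference.lower() in answer.lower() else 2
    return f'Here is my verdict: {{"score": {score}, "reason": "stub"}}'


scenarios = [
    ("What is the capital of France?", "Paris", "The capital is Paris."),
    ("What is 2 + 2?", "4", "It equals 5."),
]
verdicts = [judge(q, ref, ans, fake_model) for q, ref, ans in scenarios]
card = scorecard(verdicts)
print(card)
```

Swapping `fake_model` for a function that calls a real model is the only change needed to try this against actual agent outputs. Parsing and range-checking the verdict matters in practice because judges often wrap their JSON in extra text.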
The tips shown in Claude Design are quite interesting:
### Part 1: Product Design Principles
1. The essence of interaction
- A prototype nobody clicks is just a painting.
2. The highest form of design
- The best design system is the one nobody notices.
3. Font pairing
- You cannot unsee a bad font pairing. Choose carefully.
4. Pixel-level restraint
- Every pixel argues for attention. Most should lose.
5. The point of shipping
- The fastest way to finish a design is to ship it.
6. The art of whitespace
- Whitespace is not empty. It is the silence between the notes.
7. The color rule
- If you need more than three colors, you have zero colors.
8. The core spec
- The user's mental model is the only spec that matters.
---
### Part 2: Everyday Practical Tips (Info)
9. Removing gum from shoe soles
- Freeze gum with an ice cube for 2 minutes. It peels right off shoes.
10. Cleaning the microwave
- Microwave a damp paper towel for 30 seconds. Crud wipes right off the inside.
11. Turning a stripped screw
- A rubber band over a stripped screw head gives enough grip to turn it.
12. Repairing scratches in wood furniture
- Run walnuts over scratched wood furniture. The oils fill the scratches.
13. Keeping bread fresh
- Store bread in the freezer. Toasting it from frozen tastes better than fresh.
14. Deodorizing a cutting board
- Rub a wooden cutting board with lemon and salt to deodorize it completely.
15. Preventing boil-overs
- Put a wooden spoon across a boiling pot. It won't boil over.
16. Fluffier scrambled eggs
- Adding a splash of water instead of milk makes fluffier eggs. Milk makes them dense.
---
### Part 3: Claude Operation and Interaction Tips
17. Voice control
- Talk to Claude: tap ⌘G to start voice input, or hold Space in a comment to dictate.
18. Smart image input
- Drop images here: they auto-attach to your next message as context (background information that helps the AI understand your intent).
19. Quick screenshots
- ⌘V pastes screenshots straight from your clipboard into the chat view.
20. Reading your codebase
- Mount a local folder from the Import menu. Claude reads your codebase live, no copying.
21. Importing expertise
- Attach skills or reference design systems from the Import menu.
22. Precise annotation
- Click "Comment" in the toolbar, then click any element to annotate it.
23. Batch sending
- Leave multiple comments before sending; they all batch into one message.
24. In-place text editing
- Text edit mode lets you click text in the preview and rewrite it in-place.
25. Managing the composer
- Comments and text edits appear as chips in the composer. Remove any you don't want.
26. Live UI tuning
- Knobs mode lets you drag-adjust CSS values live: sizes, colors, spacing. Use a prompt to control the UI.
27. Prototype progression
- "Prototype" starts at wireframes, moves to hi-fi, and ends as a working interactive app.
28. Generating a presenter script
- Turn on speaker notes when creating decks to get a full presenter script.
29. Capturing workflows
- Ask Claude to "save this as a template" and it packages the workflow for reuse.
30. Multi-format export
- The Share menu lets you export as PPTX, PDF, or a folder to give to Claude Code (Anthropic's command-line AI coding assistant for developers).
31. Seamless developer handoff
- "Handoff to Claude Code" creates a dev-ready package with specs and structure. Download it, then tell Claude Code "create this design."
32. Switching models
- Use the Gear next to the Send button to change model.
33. API calls inside prototypes
- Claude can call the Claude API from inside your prototypes. No backend needed.
34. Voice interaction on the web
- Ask Claude to use the Web Speech API (the browser's built-in speech synthesis and recognition interface) for interactive voice input and output.
35. Freehand sketching
- The napkin sketch tool lets you draw freehand, which is great for rough layouts.
36. Capturing real web pages
- Import → Web Capture lets you copy elements from real web pages and paste them to Claude.