YusenTheBot

@YusenTheBot

@CMU Robotics/AI

Joined April 2023

209 Following

5 Followers

35 Posts

YusenTheBot retweeted

Yanhua

@yanhua1010

4 days ago

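The most complete introduction to the "Agentic Engineering Workflow" I have seen so far 👇 I spent an hour going through all of it, and it could easily be a paid course. It covers tmux, agent memory, skills, voice input, long-running task execution, parallel worktree management, and multi-agent scheduling. Two things also caught my eye: Lavish, a visual HTML editor, and no-mistakes, a pipeline for validating code changes. Thanks to the author @kunchenguid for sharing. Worth bookmarking and studying for anyone who uses AI agents.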

220

105K

YusenTheBot retweeted

rody

@0x_rody

15 days ago

https://t.co/LWQbDMXIdK

571

248K

YusenTheBot retweeted

Codez

@0xCodez

16 days ago

https://t.co/eJB3TgyJwV

109

940

19K

YusenTheBot retweeted

Lance Martin

@RLanceMartin

16 days ago

here's a few tips for designing self-correction loops and utilizing memory with Fable 5: https://t.co/gDUgA0R5vn

422

641

98K

YusenTheBot retweeted

Shann³

@shannholmberg

18 days ago

what is agent looping

for the last two years we prompted agents one task at a time. that is starting to change

instead of asking an agent to build the landing page and then driving every step yourself, you set up a loop that handles discovery, planning, the work, checking, and iterating until the goal is met

looping is a setup you build. almost any agent harness can run it, it just depends on how you wire it up

at its simplest, looping is one agent working on itself:

> researches
> drafts
> checks the draft against a goal
> fixes what is weak
> runs that cycle again until the work clears the requirements

you are not prompting each step anymore. the agent repeats the cycle for you

the bigger version is a fleet looping. you give an orchestrator agent a goal, it breaks the goal into pieces, hands each piece to a specialist agent, and those specialists hand smaller jobs to their own subagents

the whole tree keeps looping through discovery, planning, execution, and verification until the goal is met

one agent looping is like a person redoing their own draft. a fleet looping is a whole team running a project end-to-end

you create a goal, and the system runs the loop until it finishes within the reqs you set

open and closed looping:

OPEN LOOPING is exploratory. it still has conditions and a goal, but you give the agent or the fleet a wide space to move in. it can try different paths, discover things, build something you did not fully spec out

this is the exciting end, it is what Peter and others are doing, and tbh it is where I want to spend more time

the catch is cost. an open loop with real room to explore burns an insane amount of tokens. for the 90 percent of people without an unlimited budget it is not runnable yet, and pointed at projects with a loose standard it turns into a slop machine

CLOSED LOOPING is bounded. a human designs the end-to-end path first:

> clear goal
> defined steps
> an eval at each step
> a point where it stops or hands back to you (and feeds back performance data)

the agents still loop, but inside the framework you built. it gets better every run because each pass feeds the next, and it runs on a normal budget because the path is tight.

for most marketing work, closed is the one that pays off today.

> the orchestrator owns the goal
> the specialists own the steps
> the subagents do the narrow work
> an eval gate makes sure it's not slop
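The single-agent closed loop described in the post can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_closed_loop`, `toy_draft`, and `toy_eval` are hypothetical names, and the two toy functions stand in for real model calls and a real eval gate. The `max_passes` budget plays the role of the point where the loop stops and hands back to a human.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class LoopResult:
    draft: str
    passes: int
    met_goal: bool


def run_closed_loop(
    draft_fn: Callable[[str, str], str],
    eval_fn: Callable[[str], Tuple[bool, str]],
    goal: str,
    max_passes: int = 5,
) -> LoopResult:
    """Draft, check against the goal, fix, and repeat until the eval gate passes."""
    feedback = ""
    draft = ""
    for n in range(1, max_passes + 1):
        draft = draft_fn(goal, feedback)   # draft, or revise using the last feedback
        ok, feedback = eval_fn(draft)      # eval gate: pass/fail plus what is weak
        if ok:
            return LoopResult(draft, n, True)
    # Budget exhausted: stop and hand the last draft back to the human.
    return LoopResult(draft, max_passes, False)


# Hypothetical stand-ins for an agent and an eval; a real setup would call a model.
def toy_draft(goal: str, feedback: str) -> str:
    base = f"Landing page for {goal}."
    if "call to action" in feedback:
        return base + " Call to action: sign up today."
    return base


def toy_eval(draft: str) -> Tuple[bool, str]:
    if "Call to action" in draft:
        return True, ""
    return False, "missing call to action"


result = run_closed_loop(toy_draft, toy_eval, "a robotics newsletter")
print(result.passes, result.met_goal)
```

A fleet version would, under the same assumptions, have an orchestrator call `run_closed_loop` once per sub-goal, with each specialist supplying its own `draft_fn` and `eval_fn`.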

199

699

10K

747K

YusenTheBot @YusenTheBot

20 days ago

YusenTheBot retweeted

Anatoli Kopadze

@AnatoliKopadze

28 days ago

Anthropic engineers finally showed how they actually use Claude Code internally

31 minutes of internal workflow that most Claude users will never see on their own

here's what they cover:

> how to set up project context files the right way
> custom commands that save hours of repeated work
> hooks that make Claude behave exactly how you need
> subagents and how to actually spec them properly

"your agent isn't the problem, your spec is"

the people who understand how Claude Code actually works inside Anthropic are shipping things everyone else thinks requires a whole team

that's exactly why I put together a breakdown of Claude features most people have never discovered

you can find it below

209

309K

YusenTheBot @YusenTheBot

about 1 month ago

YusenTheBot retweeted

Dep

@0xDepressionn

about 1 month ago

Karpathy's 4 rules took coding accuracy from 65% to 94%.

most devs haven't read them.
the ones who did set up 21 rules total.
82,000 people on GitHub figured this out.

you're looking at all 21.

save this https://t.co/UWUTJjbLBa

135

222K

YusenTheBot retweeted

Jim Fan

@DrJimFan

about 2 months ago

I promise this will be the best 20 min you spend today! Robotics: Endgame, the sequel to my last year's Sequoia AI Ascent talk, "Physical Turing Test". I laid out the roadmap for solving Physical AGI as a simple parallel to the LLM success story. Be a good scientist, copy homework ;) And stay till the end, more easter eggs and predictions for your polymarket!

00:30 DGX-1 origin story at OpenAI, I was there in 2016 signing with Jensen and Elon. Heading to the Computer History Museum!
01:42 The Great Parallel
03:31 Robotics, the Endgame
03:39 Why VLAs fall short
04:32 Video world models as the 2nd pretraining paradigm
06:09 World Action Models (WAM)
07:46 Strategies for robot data collection and the FSD equivalent to physical data flywheel for robot manipulation
11:06 EgoScale and the Dexterity Scaling Law we discovered recently
14:00 Physical RL: bridging the last mile
15:39 DreamDojo: an end-to-end neural physics engine for scaling RL in silico
17:00 Civilizational Technology Tree and my predictions for the near future. Spoiler: it's closer than you think.

Thanks to my friends at Sequoia for inviting me back to AI Ascent this year! I had a blast! Last year's talk is attached in the thread if you missed it.

202

559

605K

YusenTheBot retweeted

梭哈.AI

@SUOHA_AI

about 2 months ago

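A 34-second video: analyzing stocks and earnings reports directly with Claude Code.

Claude Code now supports a native connection to the official financial-datasets MCP server, which bundles real-time stock prices, financial statements, SEC filings, news, cryptocurrency prices, and a large amount of other financial data into one plugin, with real-time prices for 17,000+ stocks....

Two-step install. Open a terminal and paste: claude mcp add --transport http financial-datasets

Then open Claude Code, type /mcp to complete OAuth authentication, and verify the connection with claude mcp list. Done!

You can now use a native Claude Code plugin for stock analysis, crypto analysis, and fundamental research; just ask. You used to have to spend tens of thousands on a Bloomberg Terminal or build a complicated API setup yourself. Now? One MCP command gets you company info, revenue/profit, financial metrics, financial statements, insider trades....... so nice.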

626

348

198K

YusenTheBot retweeted

AlexZ 🦀

@blackanger

3 months ago

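Anthropic has published a new article on harness engineering practice. On the surface it introduces their new product, but seen from a wider angle it is really exploring one question: as model capabilities keep changing, what should an agent system turn into a "stable interface", and what should it leave to be rewritten again and again in the future?

After reading it, I think the article hides a deep judgment by Anthropic about future agent infrastructure: agent infrastructure will look more and more like a "micro operating system (Agent OS)". That is the part I think deserves the most attention.

The worst thing an agent framework can do is promote a "temporary model deficiency" into a "permanent system structure". A harness (scheduling loop, context curation, tool routing, and so on) is essentially encoding assumptions about the limits of model capability, and those assumptions go stale quickly as models get stronger. My earlier book 《驾驭工程》 ("Harness Engineering", nicknamed the "horse book") has a line: the stronger the model, the simpler the harness will be. But many people writing agents today, myself included, easily write model deficiencies into the agent framework. For example, the model cannot plan, so you force the steps into a fixed DAG, and so on. These are all really harness patches, and as agent developers we find it hard to judge which things are model deficiencies.

Anthropic's approach is to design a meta-harness: it does not commit to what any specific harness looks like, only to a few kinds of long-lived stable interfaces. That way we do not have to guess what the model deficiencies are. This is much like how an OS thinks: an OS never cares how future programs will be written, it only provides abstract interfaces. So how does Anthropic's meta-harness abstract the agent? The most important abstractions in the article are three things:
- session: think of it as an event log / durable state
- harness: the reasoning-scheduling loop / brainstem
- sandbox: the execution environment / hands and feet
Their separation is the core innovation of this architecture.

1. The session is not Claude's context window
This sentence from the article captures how the meta-harness abstracts the session. It is not a simple chat transcript; it represents a "recoverable event stream". What Anthropic wants is to make the session an append-only event log. It should not be treated as a prompt fed directly to the model; it should be a real execution history that can be queried, replayed, recovered, and reassembled. If you treat the session as just a mirror of the context window, it has lost its recoverability.

2. Harness: a replaceable orchestration layer
Early on, Anthropic packed the harness, session, and sandbox into a single container. That ran into many problems: if the harness crashed, the whole session was hard to recover; if the container died, state could be lost; debugging was hard, because the container held user data and could not easily be debugged with a shell; and VPC access was difficult. So they pulled the harness out of the container and turned it into a "brainstem that calls tools". The harness now:
- no longer owns state
- no longer assumes where tools live; all you see is one interface, execute(name, input) -> string (a bit of the nix philosophy)
What this abstraction means: the AI does not need to know which device it is on, which operating system, phone or computer, container or VM, and so on. It only knows "which hands I can use". (Picture the Thousand-Armed Guanyin.)

3. Sandbox: "one particular kind of hand"
The article says to decouple the brain from the hands. In other words:
- Claude + harness is the brain
- sandboxes / tools / MCP / custom infra are the hands
Once a sandbox is just a hand, hands are independent of each other, can come from different infrastructure, can be shared and passed around, and not every session needs to load and boot a full sandbox. This leads directly to the later "many brains, many hands".

Anthropic's practice here comes from one of the core lessons of modern distributed systems: do not try to keep a particular process alive; keep a recoverable record of facts and a restart protocol.

This design also strengthens security at the architecture level. The article has this line: "narrow scoping is an obvious mitigation, but this encodes an assumption about what Claude can't do with a limited token—and Claude is getting increasingly smart." That means:
- sure, you can give the model a "narrowly scoped token"
- but that is still betting that the model cannot pull off certain attack paths
- and as models keep getting stronger, this kind of "it probably won't think of that" security assumption becomes more and more fragile
So they keep credentials out of the sandbox. For example, the Git token is wired in, in a controlled way, only during the clone/push/pull initialization, so the model never reads the token directly. The model makes calls indirectly through an MCP proxy; the proxy uses a session token to fetch the real credentials from a vault and then executes. Security is not built on the model being insufficiently capable.

This design also frees "recoverable history" from the "context window". The idea is: do not keep the full history in the model's context; put it in the session, a queryable object. From a systems view:
- Claude's context window is the live execution scene
- the session log is the evidence store
- the harness is the retriever and reassembler
As a result:
- the prompt no longer carries the job of "permanent memory"
- trimming does not mean history disappears
- compaction does not mean facts are unrecoverable
On recovery, the original events can be fetched again. This is a level above "summary-only memory" designs.

This design also improves performance. It lowers TTFT (time to first token), because sessions no longer prepay the full container cost. Now the brain starts running first, and a hand is provisioned only when needed. This is classic lazy materialization. The numbers in the article look great too:
- p50 TTFT down about 60%
- p95 down more than 90%
These numbers show that the bottleneck was never model inference itself, but the pre-coupling of the whole execution environment to the request entry point.

Finally, do not forget the original purpose of this design: making future agents more extensible. The article closes with "Many brains, many hands". This means Anthropic intends to build an agent runtime substrate: multi-brain collaboration / multi-hand orchestration / cross-environment execution. An agent's identity should not be bound to a particular execution shell, but to a set of recoverable state and callable capabilities. The article also mentions some downsides of this design; for example, having a brain manage many hands is itself a harder cognitive task. So the architecture presupposes that model intelligence is already high enough to take on more abstract tool-routing responsibility. This is really a bet that model capability will keep improving: designing for the future.

Lastly, the article gave me three takeaways:
- Takeaway 1: a session is not a list of messages but a "stream of execution facts".
- Takeaway 2: do not internalize the tool environment into the agent itself.
- Takeaway 3: context engineering should be a replaceable strategy of the harness, designed so that no technical debt is created once models get stronger or the retrieval strategy changes.
The article does not say so outright, but I think what it really represents is an Anthropic product philosophy: we do not believe today's agent harness is the final form, so we invest first in stable interfaces rather than a one-off optimal implementation. The future of agent systems rests on a set of long-lived stable system abstractions.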
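To make the session / harness / sandbox split concrete, here is a minimal Python sketch. It is not Anthropic's implementation: `SessionLog`, `LazyHands`, `Harness`, and the toy tools are hypothetical names chosen for illustration. The only detail taken from the thread is the shape of the tool interface, `execute(name, input) -> string`. The sketch shows an append-only log that can be replayed, a harness that holds no state of its own, and hands that are provisioned only on first use.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Tool = Callable[[str], str]


@dataclass(frozen=True)
class Event:
    seq: int
    kind: str      # "tool_call" or "tool_result" in this sketch
    payload: str


@dataclass
class SessionLog:
    """Append-only event log: events are added, never edited or removed."""
    _events: List[Event] = field(default_factory=list)

    def append(self, kind: str, payload: str) -> Event:
        event = Event(len(self._events), kind, payload)
        self._events.append(event)
        return event

    def replay(self, since: int = 0) -> List[Event]:
        # Return a copy so callers cannot mutate the recorded history.
        return list(self._events[since:])


class LazyHands:
    """Tool registry that provisions each 'hand' only when it is first used."""

    def __init__(self, factories: Dict[str, Callable[[], Tool]]):
        self._factories = factories
        self._live: Dict[str, Tool] = {}

    def execute(self, name: str, input: str) -> str:
        # `input` mirrors the interface named in the thread (it shadows the builtin).
        if name not in self._live:
            self._live[name] = self._factories[name]()   # provision on demand
        return self._live[name](input)

    @property
    def provisioned(self) -> List[str]:
        return sorted(self._live)


class Harness:
    """Holds no durable state: everything it learns goes into the session log."""

    def __init__(self, hands: LazyHands):
        self.hands = hands

    def call_tool(self, session: SessionLog, name: str, input: str) -> str:
        session.append("tool_call", f"{name}:{input}")
        result = self.hands.execute(name, input)
        session.append("tool_result", result)
        return result


session = SessionLog()
hands = LazyHands({
    "upper": lambda: (lambda text: text.upper()),
    "reverse": lambda: (lambda text: text[::-1]),
})
Harness(hands).call_tool(session, "upper", "hello")
# A replacement harness can continue from the same log after a crash.
Harness(hands).call_tool(session, "upper", "again")
print(hands.provisioned, len(session.replay()))
```

Because the harness object carries nothing between calls, swapping it out mid-session loses no history, and the unused `reverse` hand is never provisioned.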

868

182

160K

YusenTheBot retweeted

dontbesilent

@dontbesilent

3 months ago

Biggest takeaway today: learning about Michael Polanyi. One name is worth a 100,000-character prompt.

231

355K

YusenTheBot retweeted

Andrej Karpathy

@karpathy

3 months ago

LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images to local so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki, I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searchers), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web ui), but more often I want to hand it off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually, it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
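The post mentions a "small and naive search engine over the wiki" without showing it. As a rough idea of what such a tool could look like, here is a sketch that ranks .md files by raw term counts. This is an assumption for illustration, not the author's code: `search_wiki`, the scoring rule, and the sample files are all hypothetical.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search_wiki(wiki_dir: Path, query: str, top_k: int = 3) -> List[Tuple[str, int]]:
    """Rank .md files under wiki_dir by how often the query terms occur in them."""
    terms = tokenize(query)
    scores = []
    for path in sorted(wiki_dir.rglob("*.md")):
        counts = Counter(tokenize(path.read_text(encoding="utf-8")))
        score = sum(counts[term] for term in terms)
        if score > 0:
            scores.append((path.relative_to(wiki_dir).as_posix(), score))
    # Highest score first; ties broken by path for stable output.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top_k]


# Build a tiny throwaway wiki so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    wiki = Path(tmp) / "wiki"
    (wiki / "concepts").mkdir(parents=True)
    (wiki / "concepts" / "world-models.md").write_text(
        "World models predict video. Video world models scale.", encoding="utf-8"
    )
    (wiki / "index.md").write_text("Index of articles about robots.", encoding="utf-8")
    hits = search_wiki(wiki, "video world models")

print(hits)
```

Wrapped in a small command-line entry point, a function like this could be handed to an LLM agent as a tool, which is the usage the post describes.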

60K

107K

21M

YusenTheBot retweeted

我真的没有拼多多

@nopinduoduo

3 months ago

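My reaction after reading this long post from the legendary karpathy: he has finally spelled out the prototype for "knowledge management in the AI era". The core is not RAG and not smarter search; it is making the LLM the sole maintainer of the knowledge base: ingest, compile, output, linting, and self-repair, all automated. The human is responsible for only two things: feeding in raw material and asking good questions. The gap in between is the "incredible new product" he mentions. Concrete steps to reproduce it:

Step 1: Build a "junkyard", raw/
Dump everything you find valuable in there without thinking: web pages, papers, screenshots, PDFs, GitHub repos, podcast transcripts. No need to categorize. Do not create folders up front. I currently use Obsidian Web Clipper to clip with one click and download the images locally along with the page.

Step 2: Make the LLM the librarian
The core instruction to the LLM is just two sentences: "Read all files in raw/ and generate a structured wiki. Requirements: one summary per source file, extract concepts and write each as a standalone article, then backlink them to each other." Hand it over with confidence; the latest LLMs are better at structuring a wiki than humans.

Step 3: Use Obsidian as the IDE
Do not write notes in Obsidian; use it as a front-end dashboard. raw/ is the raw data, wiki/ is what the LLM compiled, output/ is your query results. Three directories, a natural layering.

Step 4: Start "conversational" research
Once your knowledge base is large, the way you ask has to change. Do not ask "what does this article say"; ask: "Compare A and B for me; every conclusion must quote the wiki and cite its source." Then have the LLM not answer you directly: have it generate a .md report, or Marp slides, or a matplotlib chart.

Step 5: Force results to flow back (the most critical step)
The result of every query must be saved back into the wiki. That way every exploration accumulates, and the knowledge base only gets thicker the more you query it.

Step 6: Have the LLM lint your wiki regularly
Instruction to the LLM: "Read through the whole wiki and find: 1. data that contradicts itself 2. missing intermediate links 3. concept connections that could become new articles." This is maintenance work that humans cannot do and LLMs are very good at.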
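Steps 3 and 5 are mostly file plumbing, so they can be sketched without any LLM call. The snippet below creates the raw/, wiki/, output/ layout and files a query report back into the wiki. Everything beyond those three directory names is an assumption made for illustration: the `queries/` subfolder, the `index.md` file, the function names, and the use of Obsidian-style `[[...]]` links.

```python
import shutil
import tempfile
from pathlib import Path


def init_workspace(root: Path) -> None:
    """Create the three-layer layout: raw sources, compiled wiki, query outputs."""
    for name in ("raw", "wiki", "output"):
        (root / name).mkdir(parents=True, exist_ok=True)


def file_back(report: Path, root: Path) -> Path:
    """Copy a query report from output/ into wiki/ and link it from the index."""
    target_dir = root / "wiki" / "queries"      # hypothetical subfolder
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / report.name
    shutil.copy2(report, target)
    index = root / "wiki" / "index.md"          # hypothetical index file
    with index.open("a", encoding="utf-8") as fh:
        fh.write(f"- [[queries/{report.stem}]]\n")   # Obsidian-style wikilink
    return target


# Throwaway workspace so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    init_workspace(root)
    report = root / "output" / "a-vs-b.md"
    report.write_text("# A vs B\n", encoding="utf-8")
    filed = file_back(report, root)
    filed_rel = filed.relative_to(root).as_posix()
    filed_text = filed.read_text(encoding="utf-8")
    index_text = (root / "wiki" / "index.md").read_text(encoding="utf-8")

print(filed_rel)
```

Appending to an index rather than rewriting it keeps each filing step independent, so later lint passes can reorganize the wiki without this helper needing to know its structure.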

211

175K

YusenTheBot @YusenTheBot

3 months ago

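Vector OS nano gives your agent flesh and blood. In this demo it connects to Claude Code through MCP: the agent can operate a robot arm at any time and generalizes across natural-language instructions. Most importantly, the agent has intelligence and can learn new skills on its own! The wave in the video was designed and used by the agent zero-shot. Early demo; come build it with us. Open source: https://t.co/nODcfyHiD8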

YusenTheBot @YusenTheBot

3 months ago

A dashboard I built for openclaw🦞, with a built-in chat window: you can paste screenshots directly or drag files in.

Real-time monitoring of token usage, cost, cron jobs, and sub-agent runs. Written in Go, a single binary, deployed with one command. Open source and free.
If you find it useful, a star would be much appreciated.
Open source: https://t.co/sB5eBRq4Hl

YusenTheBot @YusenTheBot

4 months ago

@steipete @AskPerplexity @gork what’s going on here

YusenTheBot @YusenTheBot

4 months ago

@AnthropicAI You have persecutory delusion lol

YusenTheBot retweeted

Yanhua

@yanhua1010

4 months ago

Sharing the original author's OpenClaw config:

workspace/
├── https://t.co/Ng8qYhyHZQ
├── https://t.co/swTDET5nKk
├── https://t.co/tAJOY3IvSW
├── https://t.co/MbwOvhraQi
└── skills/

IDENTITY: https://t.co/BNoVLlmwoA
SOUL: https://t.co/O9RT5oqGBL
PRD: https://t.co/2RVym9MjSu

312

197K
