steve

@374602362

everyoneeverydayeverythingeverywhereisok

China

Joined August 2014

989 Following

56 Followers

207 Posts

374602362 retweeted

Smartpig

@Smartpigai

4 days ago

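Someone has finally compiled Codex's path "from installation to doing real work" into a free, open-source Chinese-language "orange book". It does not just list the available commands; it tries to explain the full chain: how to get started, how to collaborate with the Agent, and how to fit Codex into a real project workflow. The official docs tell you "what it can do"; the community guide reads more like an answer to "how should I actually use it?" Worth bookmarking and consulting as you go 👇 https://t.co/gV73HFJlEs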

319

115K

374602362 retweeted

yibie

@yibie

3 days ago

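Recommending this one: teams from Shanghai Jiao Tong University and Tsinghua systematically evaluated 12 Agent memory systems. It is not an "our model is better" paper; it breaks down how to choose a memory system from a data-management perspective: when to use RAG, when to use a vector database, and when to use a knowledge graph.

How should an Agent's long-term memory be built? A systematic comparison of 12 memory systems.

Paper: Are We Ready For An Agent-Native Memory System? Shanghai Jiao Tong × Tsinghua × MemTensor, submitted June 23.

Core framework: Agent memory is split into 4 modules: representation and storage → extraction → retrieval and routing → maintenance. Each module is evaluated independently.

Key findings

• No single architecture is best in every scenario. Effectiveness depends on how well the memory structure matches the workload's bottleneck.

• Local maintenance (updating only the affected parts) costs far less than a global rebuild, and the results are not bad either.

• In the end-to-end evaluation of 12 systems on 5 benchmarks (11 datasets), the differences show up mainly in retrieval precision and long-horizon stability.

Practical takeaways, if you are building an Agent that needs long-term memory:

• Simple Q&A → vector retrieval is enough

• Multi-hop reasoning → needs a knowledge-graph layer

• Frequently updated knowledge → local maintenance beats a global rebuild

Paper: https://t.co/WxYZ2kOBAp

Code: https://t.co/SM4z8w0nKW

#Agent #长期记忆 #系统评测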

207

241

14K
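The practical takeaways in the post above amount to a small decision rule. A minimal sketch of that rule follows; the names `Workload` and `suggest_memory_design` are hypothetical and do not come from the paper or its code, and the post says nothing about maintenance for rarely updated knowledge, so that case is returned as "unspecified".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of what the agent's memory has to handle."""
    multi_hop: bool          # questions require chaining several facts
    frequent_updates: bool   # stored knowledge changes often


def suggest_memory_design(workload):
    """Map a workload to the rules of thumb listed in the post."""
    return {
        # Simple Q&A -> vector retrieval; multi-hop reasoning -> knowledge-graph layer
        "retrieval": "knowledge_graph" if workload.multi_hop else "vector_search",
        # Frequent updates -> local maintenance; the post gives no rule otherwise
        "maintenance": "local" if workload.frequent_updates else "unspecified",
    }
```

For example, `suggest_memory_design(Workload(multi_hop=True, frequent_updates=True))` suggests a knowledge-graph layer with local maintenance.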

374602362 retweeted

Magiccat（💜,💛）

@MagiccatMila

4 days ago

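I finally figured out where the repost bloggers on X/Twitter get their content! It's this AgentKey: a single tool that covers Xiaohongshu, Douyin, Kuaishou, Bilibili, Weibo, Zhihu, YouTube, and all sorts of platforms you wouldn't expect. Public posts, comments, likes, and reposts can all be scraped. The best part is that a single command installs it into Codex/龙虾 ("Lobster") or Claude Code, so you can set up an automated workflow directly and save 90% of the time spent on research and reposting. The data source is also stable and secure, with nothing to configure yourself. 🔗https://t.co/35Zyr2s7JZ For beginners, it lowers the research barrier a lot.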

137

288

92K

374602362 retweeted

Charly Wargnier

@DataChaz

3 days ago

🚨 A SENIOR ANTHROPIC ENGINEER JUST DROPPED AN 11-PAGE PDF ON LOOP ENGINEERING.

The core shift: stop prompting the agent. Build the system that prompts it.

Inside the autonomous loop:

- Discover → Finds its own work (failing CI, open issues).
- Isolate → Uses separate git worktrees to prevent collisions.
- Verify → A second agent reviews the work. (Never let agents self-grade).
- Persist → Writes to disk, not temporary context windows.
- Schedule → Runs automatically on a timer.

This is a great framework for building more reliable agentic systems. Link to the guide below.

Read it, then check out this ace article on Loop Engineering by @akshay_pachaar 👇

515

981

115K
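The Discover, Verify, and Persist steps of the loop described above can be sketched in a few lines. This is an illustration under stated assumptions, not the guide's code: `run_cycle`, `discover`, `worker`, and `reviewer` are hypothetical names, the worker and reviewer are plain callables standing in for two separate agents, and isolation (git worktrees) and scheduling (a timer) are left out.

```python
import json
from pathlib import Path


def run_cycle(discover, worker, reviewer, log_path):
    """Run one pass of the loop and append the outcomes to a JSONL file."""
    records = []
    for task in discover():                    # Discover: the loop finds its own work
        result = worker(task)                  # the maker produces a result
        approved = reviewer(task, result)      # Verify: a separate checker grades it
        records.append({"task": task, "result": result, "approved": bool(approved)})
    # Persist: write outcomes to disk instead of relying on the context window
    with Path(log_path).open("a", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return records
```

Keeping `worker` and `reviewer` as separate arguments makes it hard to accidentally let the maker grade its own output. A scheduler such as cron could call `run_cycle` on a timer.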

374602362 retweeted

Hanako

@hanakoxbt

3 days ago

An MIT team just dropped a 24-page PDF on "Self-Evolving Skills" for Claude Code agents.

Anthropic's own skill-creator hits 34% pass rate. This framework hits 71%.

Generate → Test → Verify → Co-Evolve

> Generate: after every task failure, the agent writes a candidate skill for what just broke.

> Test: the new skill runs on a held-out set with the same frozen Claude model.

> Verify: if it scores higher than the current best, it gets promoted. If not, it's rejected and the failure is logged.

> Co-Evolve: a second agent learns from rejected attempts and evolves alongside the generator, so the loop keeps improving.

The result: 71.1% pass rate on Claude Opus 4.6, beating Anthropic's own skill-creator by 37 points across SkillsBench and Codex.

This is exactly why engineers stopped writing skills by hand and let the agent evolve them.

Read the paper, then grab the setup below.

243

319

25K
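The Test and Verify steps described above are a promote-if-better rule. A minimal sketch follows, assuming candidates arrive as an iterable and `score` is some held-out evaluation; `evolve_skills` is a hypothetical name, and the Co-Evolve step (a second agent learning from rejections) is only represented by the returned rejection log.

```python
def evolve_skills(candidates, score):
    """Keep the best-scoring candidate; log every candidate that fails to beat it."""
    best, best_score = None, float("-inf")
    rejected = []
    for candidate in candidates:
        s = score(candidate)                  # Test: evaluate on the held-out set
        if s > best_score:                    # Verify: promote only if strictly better
            best, best_score = candidate, s
        else:
            rejected.append((candidate, s))   # rejected attempts are logged, not discarded
    return best, best_score, rejected
```

The rejection log is returned rather than dropped so that a separate process could learn from it, as the post describes.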

374602362 retweeted

老金

@freeman1266

5 days ago

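On this Loop Engineering piece released by a Google engineer: the core loop is Act→Observe→Learn→Repeat. The LLM proposes a code transformation, the compiler runs it and feeds back the result, and then it iterates. The key is treating the compiler as the reward signal: no human labeling and no RLHF are needed; a successful compile plus a performance gain is the automatic reward. The path for agents doing low-level optimization is far clearer than for writing business code. Get the PDF link in the comments 👇 https://t.co/ykcajHL43Q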

563

126

582

70K

374602362 retweeted

QingYue

@YuLin807

4 days ago

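Every time I see the beautiful animated images that foreigners post, I wonder how they make them?! It doesn't look like a video; it looks more like a single image.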

341

547

106K

374602362 retweeted

darkzodchi

@zodchiii

3 days ago

A Stanford team just published the 16-page PDF on “How to structure an AI agent”

Structure matters more than how you prompt it, and it's backed by hard numbers.

Build → Reflect → Curate → Reuse

• Build: the agent starts with a structured context, not a clever one-off prompt.

• Reflect: it watches what actually worked during execution, no labels needed.

• Curate: it folds those wins into an evolving playbook instead of a static prompt.

• Reuse: the next run starts from that refined structure, getting stronger each time.

This is exactly why senior engineers build the structure first in Claude Code, then let the agent run.

Read the paper, then grab the setup below 👇

309

340

25K
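One way to picture the Curate and Reuse steps above is a playbook that counts which tactics worked and feeds the most successful ones into the next run's context. This is only a sketch under assumptions: the `Playbook` class and its methods are hypothetical and are not taken from the paper, and "what worked" is reduced to a boolean per tactic.

```python
from collections import Counter


class Playbook:
    """Hypothetical evolving playbook: tactics ranked by how often they worked."""

    def __init__(self):
        self.wins = Counter()

    def curate(self, observations):
        """Fold (tactic, succeeded) observations from a run into the playbook."""
        for tactic, succeeded in observations:
            if succeeded:
                self.wins[tactic] += 1

    def build_context(self, task, top_k=3):
        """Start the next run from a structured context instead of a one-off prompt."""
        tips = [tactic for tactic, _ in self.wins.most_common(top_k)]
        return {"task": task, "playbook": tips}
```

Each run calls `curate` with what it observed, and the next run calls `build_context`, so the starting structure changes over time while the task prompt stays short.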

374602362 retweeted

诺鸭船长3

@noahduck283

3 days ago

A top-tier article. Whether you read it yourself or feed it to an AI, AI memory problems will improve substantially.

194

336

55K

374602362 retweeted

darkzodchi

@zodchiii

4 days ago

A senior Anthropic engineer just published the clearest blueprint on "How to give your AI agent a real memory" and it's a 15-page PDF.

Write → Consolidate → Recall → Apply

• Write: after every attempt, the agent records what it tried and what happened.

• Consolidate: it distills those raw attempts into a few reusable lessons, not a transcript dump.

• Recall: before the next task, it reads those lessons first.

• Apply: it skips the dead ends it already learned, even on a brand new problem.

This is exactly how engineers now build agent loops in Claude Code.

Read the paper, then grab the setup below 👇

996

163

117K
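The four steps above can be sketched as a tiny in-memory store. It is an illustration only, assuming that a "lesson" is simply an approach known to fail; `LessonMemory` and its method names are hypothetical and do not come from the PDF.

```python
class LessonMemory:
    """Hypothetical memory that turns raw attempts into a short list of dead ends."""

    def __init__(self):
        self.attempts = []    # raw (approach, succeeded) records
        self.lessons = set()  # distilled lessons: approaches known to fail

    def write(self, approach, succeeded):
        """Write: record what was tried and what happened."""
        self.attempts.append((approach, succeeded))

    def consolidate(self):
        """Consolidate: keep only the failed approaches, then drop the raw log."""
        self.lessons |= {approach for approach, ok in self.attempts if not ok}
        self.attempts.clear()

    def recall(self):
        """Recall: return the lessons to read before the next task."""
        return sorted(self.lessons)

    def apply(self, candidates):
        """Apply: skip candidate approaches already known to be dead ends."""
        return [c for c in candidates if c not in self.lessons]
```

Consolidation discards the transcript and keeps only the distilled set, which mirrors the post's point about lessons rather than a transcript dump.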

374602362 retweeted

Raytar

@Raytar

4 days ago

Anthropic posted the best prompting lecture I've ever seen... and deleted it two days later. I watched the recording last night and kept pausing it. Each time I opened Claude to test what they showed. Two Anthropic engineers showed in 24 minutes how the Claude team actually uses it. Not tips. Not hacks. The way they actually talk to Claude. Every day. For real work. After 3 minutes you'll want to rewrite every prompt you've ever sent.

403

438K

374602362 retweeted

冬天

@seventhoce56019

7 days ago

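Sharing a rather dangerous project. Reverse engineering used to be an exclusive skill of security experts; now someone has packaged it as a skill pack that can be fed directly to an AI. The project is called reverse-skill, and the core idea is simple: drop in a routing.md that tells the AI which path to take for each kind of security task. Once the AI has it, it triages automatically and decides for itself which tools and methods to use. It covers 20+ sub-skill areas: APK reverse engineering, IDA static analysis, JS front-end reverse engineering, firmware security, EDR bypass, exploitation... basically the common offensive and defensive security scenarios are covered. With 20+ sub-skills, AI has flattened the barrier to security offense and defense a bit further. Whether that is good or bad is for you to weigh. Project: https://t.co/tqd95pObVT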

565

273K

374602362 retweeted

宇皓Jackson

@AGIJackson008

7 days ago

This is a classic information gap. I'd bet 99% of programmers don't know this trick either.

533

132

198K

374602362 retweeted

Miles.Ma

@ma_zhenyuan

7 days ago

Honestly, you could take this tutorial and sell it as a course priced at 399.

316

470K

374602362 retweeted

Fenng

@Fenng

6 days ago

Bookmarked and saved.

205

302K

374602362 retweeted

Ray Dalio

@RayDalio

6 days ago

I recently spent a month in Asia, including 10 days in China, where I met with senior policy makers in several countries, and I found that over the past few months, there has been a big shift in the world order. I share my perspective in my latest article. As always, I welcome your questions and thoughts.

273

844

374602362 retweeted

Akshay 🚀

@akshay_pachaar

6 days ago

https://t.co/q4DvRIfStN

215

812K

374602362 retweeted

Movez

@0xMovez

5 days ago

A senior Google engineer just dropped a 19-page PDF on "Loop Engineering" for LLM and agentic systems.

Act → Observe → Learn → Repeat

• Act: the LLM proposes a code transformation (tile this loop, parallelize that one).

• Observe: a compiler runs it and reports back - is it valid? faster? slower? by how much?

• Learn: the LLM reads that feedback and adjusts its next move.

• Repeat until it stops finding improvements.

The agent gets smarter purely from grounded feedback inside its own context window.

This 19-page PDF totally changed the way I’m building agentic systems today.

Read it now, then explore the article below.

664

661K
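The Act → Observe → Learn → Repeat cycle above can be sketched as a feedback loop in which a measurement function stands in for the compiler. Everything here is an assumption for illustration: `optimize`, `propose`, and `measure` are hypothetical names, `propose` stands in for the LLM, `measure` returns a cost (lower is better) or `None` for an invalid transformation, and the initial candidate is assumed to be valid.

```python
def optimize(propose, measure, initial, max_rounds=20):
    """Iterate until a proposal is invalid or stops improving the measured cost."""
    best, best_cost = initial, measure(initial)
    history = [(initial, best_cost)]             # feedback the proposer can read
    for _ in range(max_rounds):
        candidate = propose(best, history)       # Act: propose a transformation
        cost = measure(candidate)                # Observe: valid? faster? by how much?
        history.append((candidate, cost))        # Learn: keep the feedback in context
        if cost is None or cost >= best_cost:    # Repeat until no improvement
            break
        best, best_cost = candidate, cost
    return best, best_cost, history
```

As a toy usage, `optimize(lambda cur, hist: cur * 2, lambda t: abs(t - 32), 4)` doubles a tile size until the synthetic cost stops dropping. The measured cost is the only signal, so no human labels are involved.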

374602362 retweeted

Akshay 🚀

@akshay_pachaar

5 days ago

the four pillars of loop engineering.

the loop itself is six lines, and nobody competes on it. every serious agent framework lands on the same tiny while-loop. model reads context, calls a tool, you feed the result back, repeat until it stops asking.

so if that part is solved, what is everyone actually engineering?

the answer is everything around the model. Boris Cherny, who built Claude Code, put it plainly. he doesn't prompt Claude anymore, he writes loops and lets them run.

that shift has a name now, and it rests on four pillars that are harder than the six lines make them look. these are the parts that actually break:

→ knowing when to stop. a terminal message ends the turn, not the task. an agent will write failing code, glance around, and declare victory. "done" has to mean the tests pass, not the agent feeling good about its work.

→ keeping the context clean. long loops rot from the inside as old outputs and dead ends pile up. a worse context produces a worse decision, which adds more noise, and the agent gets dumber the longer it runs. you fight it by treating context as a budget, not a bucket.

→ tools the agent can actually use. pile on a hundred tools and it loses track of which one to reach for. writes have to be safe to repeat, because loops retry, and a retried "create customer" call leaves you with duplicate records.

→ something that can say no. left alone, an agent agrees with itself. the fix is to separate the maker from the checker so the worker never grades its own homework.

put those four together and your job changes. you stop steering the agent move by move and start designing the system that steers it.

Karpathy runs research loops overnight that tweak a script, test it, keep what works, and throw away what doesn't, with himself nowhere in the loop. he arranges it once and hits go.

the model is becoming a commodity. the loop around it is where the real engineering lives now.

the best builders stopped asking what they should tell the agent to do. they started asking what system would do this without them.

I wrote the full breakdown. the article is quoted below.

stay tuned for more on this!

314

171K
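A rough sketch of the "tiny while-loop" and of a write that is safe to repeat, as described above. This is an illustration, not any framework's API: `run_agent` and `make_create_customer` are hypothetical names, the model is any callable that returns either `None` (it has stopped asking for tools) or a `(tool_name, args)` pair, and the idempotency key is an assumed design choice for making retries harmless.

```python
def run_agent(model, tools, context, max_steps=10):
    """Model reads context, calls a tool, the result is fed back, repeat."""
    for _ in range(max_steps):
        action = model(context)                  # model reads the context
        if action is None:                       # it stopped asking for tools
            break
        name, args = action
        result = tools[name](**args)             # call the requested tool
        context.append((name, args, result))     # feed the result back
    return context


def make_create_customer(store):
    """Build a 'create customer' tool that is safe to retry."""
    def create_customer(key, name):
        # Same key returns the existing record instead of creating a duplicate
        return store.setdefault(key, {"name": name})
    return create_customer
```

The `max_steps` cap is a crude stand-in for "knowing when to stop". A real stopping rule would check something external, such as whether the tests pass.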

374602362 retweeted

Yanhua

@yanhua1010

7 days ago

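The most complete introduction to the "Agentic Engineering Workflow" I have seen so far 👇 I spent an hour going through the whole thing, and it could easily be turned into a paid tutorial. It covers tmux, agent memory, skills, voice input, long-running task execution, parallel worktree management, and multi-agent scheduling. It also includes Lavish, a visual HTML editor that caught my eye, and a pipeline for validating code changes: no-mistakes. Thanks to the author @kunchenguid for sharing; worth bookmarking and studying for everyone who uses AI agents.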

219

107K
