LOVE•YOU•JIN•HAO

@ihsenots

A high tech professional. Strong curiosity, reading, thinking

Joined July 2020

3.6K Following

186 Followers

3.6K Posts

ihsenots retweeted

Deli Chen

@victor207755822

2 days ago

🧵 Deli AutoResearch SKILL is now officially open source! 🎉 https://t.co/V3lwwdyQm8 Alongside it, we’re dropping our 4th survey paper — this time on Self-play. https://t.co/SEb2qoKCI6 Inspired by AlphaZero, we got a powerful insight: prior knowledge doesn’t always lift the ceiling. Models can discover more globally optimal solutions just by playing against themselves. The biggest change in this paper? For the first time, the AutoResearch Agent autonomously planned GPU experiments — and submitted actual RL runs on the DeepSeek 285B model. The entire RL pipeline — experiment design, code writing, running, debugging, and conclusion summarization — was 100% automated, with zero human intervention from me. This was incredibly difficult, but an incredibly important step. https://t.co/kuZZNux5RH GRPO is the tool being called by the AutoResearch Agent here. We see this as the beginning of our Continual Learning research journey. 🚀 As always, this is my personal research project, unaffiliated with any organization. All views are my own. #AI #ReinforcementLearning #SelfPlay #OpenSource #AutoML #ContinualLearning #DeepSeek

186

310K

ihsenots retweeted

Phoenix Yin

@Phoenixyin13

about 22 hours ago

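💥 Human energy will be completely freed from trivial chores! The Deli AutoResearch SKILL project open-sourced by Deli Chen, a senior researcher at DeepSeek, shows a fully automated research loop and AI self-play, and it left me stunned. By standardizing the logic of long-horizon tasks and setting meta-rules such as Anti-Loop and Heartbeat Watchdog, the large model itself can act as both compiler and executor. The programmers of the future will mostly play the role of legislators and process architects. According to Deli, through self-play and with no human intervention, the AI autonomously planned GPU experiments and ran reinforcement learning with the GRPO algorithm, eventually scoring 8.6/10 in a simulated peer review. This suggests AI is moving from learning existing human knowledge to exploring, through its own trial and error, the boundary of what humans do not yet know. The efficiency of scientific research may be about to grow exponentially. In recent months many people have joked that today's AI agents can only handle a few simple steps and get lost or fall into an infinite loop once a task runs a little longer. Deli's solution has real value as an industrial reference. He brought concepts from traditional distributed systems and operating systems, such as watchdogs, persistence, and multi-role simulation, into the agent protocol. To get an agent to do serious work, especially something like running continuously for 10 hours across 60 iterations, you have to introduce mature engineering defenses against the AI's randomness and hallucinations. The AI-native research path is not only feasible, it already works end to end.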

181

269

31K

ihsenots retweeted

LinearUncle

@LinearUncle

2 days ago

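Found an interesting project 💡 When your Codex quota runs out but GPT-5.5 Thinking in the ChatGPT web app still works, this project's idea is to squeeze the web quota dry too and keep coding. The core principle is simple: wrap an MCP Server around pi agent, then connect the ChatGPT web app to that MCP Server, and the web app can directly call the pi agent on your local machine to keep programming. To let the public internet reach the local service, you need a tunnel. I used ngrok, following the official setup; Cloudflare Tunnel is the better recommendation. In essence it is an MCP Server with pi SDK agent capabilities built in, supporting create, read, update, and delete on local files; once it is exposed to the public internet, a little configuration in the ChatGPT web app is enough to call it. Recommended for: geeks / anyone who wants to get the most out of a ChatGPT subscription. https://t.co/KEnWunNZ3M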

776

124K

ihsenots retweeted

Jamin Ball

@jaminball

1 day ago

Great read!

924

365K

ihsenots retweeted

1 day ago

Anthropic research lead: "99% of our engineers are running swarms of 300+ self-improving agents. Close the agent loop. Give the model a way to verify its own output." In a 20-minute session, an Anthropic team member explains how to build a model that improves itself. Claude + loops + plan mode + dynamic workflows: that's the secret. Watch the talk, then save the playbook below.

221

331K

ihsenots retweeted

Austin

@austinit

3 days ago

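Hey friends! Strongly recommending CodeGraph: a tool that turns an entire codebase into a structured knowledge graph. It uses Tree-sitter to parse ASTs precisely, supports 20+ languages, and feeds the result directly to AI agents such as Claude and Cursor. You can check the blast radius of a change before editing code, with far more accurate context. Measured results: 16% fewer tokens and 58% fewer tool calls, all running locally for security. Start it with one command: npx @colbymchenry/codegraph A must-have for heavy AI-assisted coding! https://t.co/OUIrHciTQ8 🚀 Worth a try!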
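To make "check the blast radius before editing" concrete, here is a minimal sketch of the idea. It is not CodeGraph's implementation: it swaps Tree-sitter for Python's standard `ast` module, handles only top-level functions called by bare name, and the names `call_graph`, `impacted_by`, and `SAMPLE` are made up for illustration.

```python
import ast
from collections import defaultdict


def call_graph(source: str) -> dict:
    """Map each top-level function name to the set of bare names it calls."""
    graph = defaultdict(set)
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            graph[node.name]  # touch the key so call-free functions still appear
            for inner in ast.walk(node):
                # Only plain calls like f(x); method calls such as a.b() are skipped.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    graph[node.name].add(inner.func.id)
    return dict(graph)


def impacted_by(graph: dict, changed: str) -> set:
    """All functions that reach `changed` through one or more calls."""
    impacted, frontier = set(), {changed}
    while frontier:
        target = frontier.pop()
        for caller, callees in graph.items():
            if target in callees and caller not in impacted:
                impacted.add(caller)
                frontier.add(caller)
    return impacted


# Hypothetical three-function module used only to exercise the sketch.
SAMPLE = '''
def parse(text):
    return text.split()

def load(path):
    return parse(open(path).read())

def main():
    print(load("data.txt"))
'''

graph = call_graph(SAMPLE)
affected = impacted_by(graph, "parse")
```

Here `affected` holds the callers that would need re-checking if `parse` changed. A real tool would also resolve methods, imports, and cross-file references, which this sketch deliberately ignores.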

170

206

14K

ihsenots retweeted

𝚁𝚎𝚋𝚎𝚕

@rebel0x0

1 day ago

@0x404page Here you go

ihsenots retweeted

莱特卡卡

@litekakacom

2 days ago

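That was fast. We have barely figured out Skills and a loops collection site is already out; from now on you may not need to install skills at all, just install loops (https://t.co/V28t4dogph). The site has already collected 40 loops that you can copy as closed-loop workflows for coding agents. Each loop includes triggers, feedback gates, and exit conditions, so the agent can pace itself until the task is done.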
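The three parts named in the post (trigger, feedback gate, exit condition) can be sketched as a small Python class. This is an illustration under assumptions, not the format the site uses: the `Loop` class, its field names, and the `max_iterations` safety cap are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Loop:
    """Hypothetical closed-loop workflow; names are illustrative only."""
    trigger: Callable[[dict], bool]         # should the loop start at all?
    step: Callable[[dict], dict]            # one unit of agent work
    feedback_gate: Callable[[dict], bool]   # did the last step pass its check?
    exit_condition: Callable[[dict], bool]  # is the task finished?
    max_iterations: int = 10                # hard cap so the loop cannot run forever

    def run(self, state: dict) -> dict:
        if not self.trigger(state):
            return {**state, "status": "not_triggered"}
        for i in range(1, self.max_iterations + 1):
            candidate = self.step(state)
            # Accept the step's result only if it passes the gate;
            # otherwise keep the old state and try again.
            if self.feedback_gate(candidate):
                state = candidate
            if self.exit_condition(state):
                return {**state, "status": "done", "iterations": i}
        return {**state, "status": "gave_up", "iterations": self.max_iterations}


# Toy usage: "fix" one failing test per step until none remain.
loop = Loop(
    trigger=lambda s: s["failing_tests"] > 0,
    step=lambda s: {**s, "failing_tests": s["failing_tests"] - 1},
    feedback_gate=lambda s: s["failing_tests"] >= 0,
    exit_condition=lambda s: s["failing_tests"] == 0,
)
result = loop.run({"failing_tests": 3})
```

In a real agent the `step` would call a model and the gate would run tests or a linter; the iteration cap is the piece that keeps a stuck agent from looping indefinitely.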

650

127

49K

ihsenots retweeted

思维怪怪

@0xLogicrw

1 day ago

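Deli Chen, a senior researcher at DeepSeek, has open-sourced his personal project Deli AutoResearch SKILL and released a fourth survey paper written entirely autonomously by agents. The project takes the form of a single SKILL.md protocol file and contains no executable code itself. By specifying state persistence, Anti-Loop protection, and a Heartbeat Watchdog for long-horizon tasks, the protocol guides AI agents to invoke subagents and a multi-role simulation mechanism, achieving fully automated collaboration across the whole research process. The self-play survey paper published alongside it shows a breakthrough at the experiment stage: with zero human intervention, the agent autonomously planned GPU experiments for the first time and ran reinforcement learning (RL) training jobs on the 285B-parameter DeepSeek model. Using the GRPO algorithm, the agent completed the full research loop, from experiment design and code writing to running, debugging, and summarizing conclusions, and scored 8.6/10 in a simulated peer review. Chen had previously used the same method to autonomously produce three academic surveys. The first, on autonomous research agents, went through about 60 rounds of agent iteration over roughly 10 hours, evolving from V1 to V5. The workflow not only lowers the human operating cost of long-horizon research but also demonstrates that an AI-native research path is feasible.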
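The post says the project itself is a SKILL.md protocol with no executable code, so the following is only a sketch of what the three named mechanisms (state persistence, Anti-Loop, Heartbeat Watchdog) could look like if written in Python. Every class and function name here is an assumption, as are the repeat threshold and the JSON state file.

```python
import hashlib
import json
import time
from collections import Counter
from pathlib import Path


class AntiLoop:
    """Flags an agent that keeps repeating the same action (hypothetical helper)."""

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats
        self.seen = Counter()

    def is_stuck(self, action: str) -> bool:
        key = hashlib.sha256(action.encode("utf-8")).hexdigest()
        self.seen[key] += 1
        return self.seen[key] > self.max_repeats


class HeartbeatWatchdog:
    """Treats the agent as hung if no heartbeat arrives within `timeout` seconds."""

    def __init__(self, timeout: float, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self.last_beat = clock()

    def beat(self) -> None:
        self.last_beat = self.clock()

    def expired(self) -> bool:
        return self.clock() - self.last_beat > self.timeout


def save_state(path: Path, state: dict) -> None:
    """Persist progress so a restarted agent can resume instead of starting over."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state), encoding="utf-8")
    tmp.replace(path)  # swap the finished file into place in one step


def load_state(path: Path) -> dict:
    """Return the saved state, or an empty dict on the first run."""
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
```

A supervising process would call `beat()` after each agent step, check `expired()` on a timer, and consult `is_stuck()` before executing a proposed action.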

926

153

160K

ihsenots retweeted

huangserva

@servasyy_ai

2 days ago

https://t.co/Ted77vDmKl

119

267

18K

ihsenots retweeted

陈成

@chenchengpro

2 days ago

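Factory 2 looks like it has already done a solid job of engineering the loop. - Factory founder Matan Grinberg announced Factory 2.0 and set the tone in one sentence: raising the productivity of individual engineers is no longer enough. What really needs to be unlocked is organization-level productivity, and that takes not faster code completion but a system that is connected, agent-native, end-to-end, and able to keep improving by observing itself, with the AI agent as its smallest unit of increment. He calls this system the "software factory". A software factory is less a new tool than a system topology: start from external signals (bug reports, internal conversations, customer feedback, business requirements) → triage them into planned changes → build, test, review, harden, release, and monitor → monitoring produces new signals. The whole thing is one continuous feedback loop, and the loop itself is the product. The author's judgment: almost no one has made this loop fully AI-driven yet; it is still early, but it will spread quickly. He sets out three hard pillars for a "robust software factory". First, Model Independence: no single model fits all of an enterprise's needs, so you must be able to choose models deliberately for different tasks, or use a Router to pick the "best" model automatically or by rule based on cost/performance/speed, hedging against the cost and capability shifts that come with model commoditization. Second, Sovereign Intelligence: deployment options range from fully managed cloud, bring-your-own-key, self-hosted data plane, and EU-only, all the way to fully air-gapped with no external network; but the point of sovereignty is not "where it runs" but owning a system that learns from itself, where every agent session, code review, and resolved incident flows back into the loop and the capability always stays inside your walls. Third, Continual Learning: every stage of the SDLC must be instrumented; code review, security analysis, documentation, QA, and incident response run on the same platform and share the same agent core, router, and organizational context, so security findings can feed back into code review, deployments can trigger documentation updates, and incidents can be linked back to the PR that caused them. None of this stops at the concept stage; it is already running in production at NVIDIA, EY, Adobe, Palo Alto Networks, Adyen, Blackstone, Wipro, Comarch, and other organizations. Autonomy is built as a spectrum rather than a switch: well-defined tasks use a simple Droid agent or skill, recurring workflows use Automations with shared goals and memory, remote persistent execution uses Droid Computers, and complex tasks use Missions to split work into parallel tracks solved over hours or days. It ends with people: engineers are no longer the sole guardians of building software; they have to build the factory that builds software and, with it, take ownership of governance, security, and business outcomes. The next era is engineering-led.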
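The "Router" idea (pick a model by rule on cost, performance, or speed) can be illustrated with a small rule-based selector. This is a sketch under assumptions and says nothing about how Factory's Router works: the `ModelProfile` fields, the model names, and every number in `CATALOG` are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    cost_per_mtok: float   # USD per million tokens (made-up numbers below)
    quality: float         # 0..1, higher is better (made-up scale)
    tokens_per_sec: float  # throughput


def route(models, min_quality=0.0, prefer="cost"):
    """Pick a model by rule: apply a quality floor, then optimise one axis."""
    eligible = [m for m in models if m.quality >= min_quality]
    if not eligible:
        raise ValueError("no model meets the quality floor")
    if prefer == "cost":
        return min(eligible, key=lambda m: m.cost_per_mtok)
    if prefer == "speed":
        return max(eligible, key=lambda m: m.tokens_per_sec)
    if prefer == "performance":
        return max(eligible, key=lambda m: m.quality)
    raise ValueError(f"unknown preference: {prefer}")


# Hypothetical catalog; none of these are real models or real prices.
CATALOG = [
    ModelProfile("small-fast", cost_per_mtok=0.5, quality=0.60, tokens_per_sec=200),
    ModelProfile("mid", cost_per_mtok=3.0, quality=0.80, tokens_per_sec=90),
    ModelProfile("large", cost_per_mtok=15.0, quality=0.95, tokens_per_sec=40),
]

cheap_but_good_enough = route(CATALOG, min_quality=0.75, prefer="cost")
```

Because the catalog is plain data, a price change or a new model only requires editing `CATALOG`, which is one way to read the post's point about hedging against model commoditization.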

ihsenots retweeted

陈成

@chenchengpro

2 days ago

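Everyone has been talking about the agent "loop" lately, but few explain what it actually is. Warp CEO Zach Lloyd offers a version you can actually implement: a two-level loop that lets a Skill evolve from feedback, using three-way GitHub issue triage as the example. Inner loop: for each new issue, a GitHub Action triggers a cloud agent to run the triage Skill, which sorts the issue into one of three buckets (ready-to-implement / needs-info / duplicate), applies a label, and posts a comment carrying the hidden marker oz-triage v:N that asks for a 👍/👎. Outer loop: once a day a scheduled agent pulls all issues triaged in the last 14 days and collects three kinds of signal: thumbs up/down on the comment, human correction replies, and label drift such as "a person changed the label from ready to needs-info" (the strongest ground truth). It then distills the signals into generalizable rules; for example, instead of patching for a single issue, it writes "crash reports missing an OS version always go to needs-info", adds that to the Skill's Learned guidelines section, bumps the version number by 1, and opens a PR for a human to review and merge, never changing main automatically. The key point in one sentence: a Skill is a file, and improvement = a diff on that file; the feedback already lives in issue labels and comments, at zero extra labeling cost. The same approach applies to code review, bug fixing, and incident response; when the goal is well defined, an automatic grader can replace human feedback. Warp already uses it to manage its own open-source repositories and has open-sourced the framework (oz-for-oss).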
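The outer loop described above can be sketched in a few functions. This is not the oz-for-oss code: the issue dictionary shape, the function names, the `## Learned guidelines` heading format, and the assumption that this section sits at the end of the Skill file are all hypothetical. Only the `oz-triage v:N` marker text comes from the post.

```python
import re
from collections import Counter

MARKER = re.compile(r"oz-triage v:(\d+)")  # marker text taken from the post


def parse_version(comment_body: str) -> int:
    """Read the Skill version that produced a triage comment."""
    match = MARKER.search(comment_body)
    if match is None:
        raise ValueError("no oz-triage marker found")
    return int(match.group(1))


def collect_signals(issues) -> Counter:
    """Summarise feedback on triaged issues (issue dict shape is assumed)."""
    signals = Counter()
    for issue in issues:
        signals["thumbs_up"] += issue.get("thumbs_up", 0)
        signals["thumbs_down"] += issue.get("thumbs_down", 0)
        signals["corrections"] += len(issue.get("correction_replies", []))
        # Label drift: a human changed the label the agent applied.
        if issue["agent_label"] != issue["current_label"]:
            signals[f"drift:{issue['agent_label']}->{issue['current_label']}"] += 1
    return signals


def add_guideline(skill_text: str, rule: str) -> str:
    """Append a rule under 'Learned guidelines', assumed to be the last section."""
    heading = "## Learned guidelines"
    if heading not in skill_text:
        skill_text = skill_text.rstrip("\n") + f"\n\n{heading}\n"
    return skill_text.rstrip("\n") + f"\n- {rule}\n"
```

The output of `add_guideline` is what would go into a pull request for human review; nothing in the sketch writes to a repository, in line with the post's rule of never changing main automatically.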

547

115K

ihsenots retweeted

AlphaSignal AI

@AlphaSignalAI

4 days ago

https://t.co/O9xexVXVps

123

222

11K

ihsenots retweeted

恒星

@vintcessun

7 days ago

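So this is possible: unify the APIs of a dozen or so LLM providers, including OpenAI, Anthropic, and Google, behind one interface, so that switching models means changing a single string. The core is just provider:model routing plus adapters, no black magic, yet the developer experience goes straight from "digging through docs" to "changing a prefix". The abstraction-layer idea is more worth studying than the tool itself. https://t.co/4Aiv87bYFm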
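The "provider:model routing plus adapters" pattern fits in a few lines. The sketch below is not the linked tool's API: `ADAPTERS`, `adapter`, and `complete` are hypothetical names, the model names are placeholders, and the adapter bodies return stand-in strings where a real adapter would call that provider's SDK.

```python
from typing import Callable, Dict

# Registry of adapters keyed by provider prefix. Each adapter takes
# (model, prompt) and returns text.
ADAPTERS: Dict[str, Callable[[str, str], str]] = {}


def adapter(provider: str):
    """Decorator that registers a function as the adapter for one provider."""
    def register(fn):
        ADAPTERS[provider] = fn
        return fn
    return register


@adapter("openai")
def _openai(model: str, prompt: str) -> str:
    return f"[openai/{model}] {prompt}"  # stand-in for a real SDK call


@adapter("anthropic")
def _anthropic(model: str, prompt: str) -> str:
    return f"[anthropic/{model}] {prompt}"  # stand-in for a real SDK call


def complete(target: str, prompt: str) -> str:
    """Route a 'provider:model' string to the matching adapter."""
    provider, sep, model = target.partition(":")
    if not sep or not model:
        raise ValueError(f"expected 'provider:model', got {target!r}")
    fn = ADAPTERS.get(provider)
    if fn is None:
        raise ValueError(f"unknown provider: {provider}")
    return fn(model, prompt)


# Switching providers is a one-string change at the call site.
reply_a = complete("openai:model-a", "hello")
reply_b = complete("anthropic:model-b", "hello")
```

The design choice worth noting is that callers depend only on `complete`; adding a provider means registering one more adapter, with no change to calling code.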

104

ihsenots retweeted

余温

@gkxspace

7 days ago

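If your Codex /goal runs all night and you find it half-finished the next day, it may really not be Codex's fault... Nine times out of ten the goal was written too poorly, for example "build me an app" or "fix the bug". A person can understand that, but an agent does not know how to verify it, how many times to retry, or when to stop. Qiaomu turned his 40,000-character Goal document into a skill: you state a one-sentence requirement and it translates it into a complete task contract, filling in the verification method, constraints, write boundaries, iteration strategy, completion conditions, and pause conditions, ready to copy and run. npx skills add joeseesun/qiaomu-goal-meta-skill Once it is installed, you can sleep soundly 💤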
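As a rough picture of what a "task contract" adds to a one-line goal, here is a sketch that models the six parts listed in the post as a data structure. It is not the skill's actual output format: the `GoalContract` class, its field names, the default iteration strategy, and the rendered layout are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GoalContract:
    """Hypothetical task contract covering the parts named in the post."""
    objective: str
    verification: List[str]
    constraints: List[str] = field(default_factory=list)
    write_boundaries: List[str] = field(default_factory=list)
    iteration_strategy: str = "retry up to 3 times, then pause"
    done_when: List[str] = field(default_factory=list)
    pause_when: List[str] = field(default_factory=list)

    def missing(self) -> List[str]:
        """Names of fields an agent would need that are still empty."""
        required = ["verification", "write_boundaries", "done_when", "pause_when"]
        return [name for name in required if not getattr(self, name)]

    def render(self) -> str:
        """Format the contract as plain text that can be pasted into a goal."""
        def block(title, items):
            lines = [f"  - {item}" for item in items] or ["  - (unspecified)"]
            return "\n".join([f"{title}:"] + lines)

        return "\n".join([
            f"Objective: {self.objective}",
            block("Verification", self.verification),
            block("Constraints", self.constraints),
            block("Write boundaries", self.write_boundaries),
            f"Iteration strategy: {self.iteration_strategy}",
            block("Done when", self.done_when),
            block("Pause when", self.pause_when),
        ])


# A vague goal leaves every required field empty.
vague = GoalContract(objective="fix the bug", verification=[])
gaps = vague.missing()
```

Running `missing()` before launching an overnight job is a cheap check that the goal says how to verify success and when to stop.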

497

121

660

82K

ihsenots retweeted

小盖

@xiaogaifun

8 days ago

https://t.co/ppmgcLVPzD

116

181

16K

ihsenots retweeted

Bohu

@BohuTANG

10 days ago

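Last Saturday I gave a talk titled "Trace as Evals" about one question: when an agent's prompt is changed, its model swapped, or a tool added, did it actually get better or worse? A few key points:
- An agent is a chain reaction; one step off and everything after it is off, so looking only at pass/fail is useless
- Same task, same model, different harness: token consumption differs by 3x and cost by 67%
- Attribution is only possible if trajectories are stored; which step picked the wrong tool, which step blew up the context, you can see it by expanding the trace
- Leading model companies such as Anthropic and OpenAI iterate on agents with exactly this kind of trace-driven quantitative loop, and the method should not be available only to big companies
Slides 👉 https://t.co/KIcq8KuVTD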
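The points about attribution and harness comparison can be illustrated with a stored trace and two small queries over it. This is a sketch under assumptions, not the tooling from the talk: the `Step` record, the idea of an `expected_tool` label supplied by a reviewer or grader, and the token counts are all made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    tool: str
    expected_tool: Optional[str]  # label added by a reviewer or grader, if any
    tokens: int


def first_divergence(trace: List[Step]) -> Optional[int]:
    """Index of the first step whose tool differs from the labelled expectation."""
    for i, step in enumerate(trace):
        if step.expected_tool is not None and step.tool != step.expected_tool:
            return i
    return None


def total_tokens(trace: List[Step]) -> int:
    return sum(step.tokens for step in trace)


def token_ratio(trace_a: List[Step], trace_b: List[Step]) -> float:
    """How many times more tokens run A used than run B."""
    return total_tokens(trace_a) / total_tokens(trace_b)


# Two hypothetical runs of the same task under different harnesses.
run_a = [
    Step("search", "search", 1200),
    Step("write_file", "read_file", 3000),  # wrong tool chosen here
    Step("run_tests", None, 1800),
]
run_b = [
    Step("search", "search", 800),
    Step("read_file", "read_file", 700),
    Step("run_tests", None, 500),
]

bad_step = first_divergence(run_a)
ratio = token_ratio(run_a, run_b)
```

A pass/fail score would only say that run A did worse; the stored trace shows where it first went wrong and how much extra it cost.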

108

130

19K

ihsenots retweeted

Cander

@Cander_zhu

10 days ago

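Just yesterday I was writing that upgrading from Prompt Engineer to Loop Architect is essentially about turning yourself from "the person who types" into "the person who designs systems". Today I saw @PandaTalk8's piece "LOOP ENGINEERING: When Engineering Returns to Philosophy", which reads like it supplies the "underlying philosophy" for this line of thinking: a while loop + a natural-language prompt is enough to hold up an agent. The loop is the structure of intelligence; the prompt is the objective function + value judgment. The hard part is not the code but answering three questions: What counts as done? (teleology) What is good? (axiology) Which decisions are handed to the machine to define, and which must still be controlled by a person with independent character and thought? (boundaries and responsibility) From a front-line engineering perspective, putting Loop Engineering into practice may simply mean: build the loop with the simplest code, and spend the most time iterating until "goals, constraints, values" are thought through. The code is getting simpler; the thinking is getting more important. On this road, engineers and philosophers may really need to sit down at the same table again.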

130

22K

ihsenots retweeted

rody

@0x_rody

9 days ago

https://t.co/LWQbDMXIdK

570

247K

ihsenots retweeted

JUMPERZ

@jumperz

9 days ago

this is how to run claude fable 5 as your architect ( 20$ sub only ) + gpt 5.5 codex as your builder..

full system below:

the loop is : fable thinks... codex builds , the repo remembers and you judge, that simple..

the point of all this is that we are taking advantage that 5.5 is on a sub and it's fast enough, especially with /goal, and we using latest Anthropic model to be the judge/guidance..

step 1

>create the memory (one time): make docs/HANDOFF.md in your repo.

>codex updates it after every work session: what was built, what was decided + why, open disagreements, next slice. this file is why 30 min of fable is enough ..it reads state instead of asking you questions.

step 2 paste this to fable (every session)

>you are the ARCHITECT for [project]

>gpt 5.5 codex is the BUILDER
>you never write implementation code.
>your jobs:

(1) read the handoff below
(2) rule on every disagreement the builder raised: accept/reject/modify + one line why
(3) judge any results RAW against the gates in the docs and ignore the builder's narrative
(4) write the next slice spec: small enough for one PR, hard acceptance criteria, explicit out-of-scope, and force the builder to verify APIs/formats against reality before coding
(5) flag scope creep and goalpost-moving.. be blunt. disagree with me. end with a paste-ready block for the builder.

step 3 paste fable's block to codex with this /goal

/goal: execute the architect spec. rules:

PHASE 0 before any code, reply with your plan + every disagreement you have, with reasons, citing real files in the repo. silent compliance = failure. silent scope additions = failure.

PHASE 1 freeze shared contracts (schemas/interfaces) in docs/ first; after freeze they're read-only for everyone including you.

PHASE 2 spawn max 3-4 lane agents on modules that don't import each other, plus ONE reviewer agent that never writes feature code: it checks every lane against the spec + tests + frozen docs and returns APPROVE or a numbered defect list. nothing merges without approve. then: commit + push each slice, update docs/HANDOFF.md with raw results only tables and numbers, no interpretation, no 'promising'. verdicts belong to the architect and the human.

step 4 repeat codex works hours.. you spend fable minutes on judgment only: arbitration, evidence review, next specs, kill/continue calls. one fable session per work block.

the 5 rules that make it actually work

>repo docs are the memory not in HANDOFF.md = didn't happen

>the builder never grades its own work

>disagreement is mandatory

>freeze success criteria BEFORE results exist, never edit after

>spend architect time on judgment, builder time on typing

>the architect is the edge and the builder is the hands. the repo is the brain.. think of it that way..

bookmark this. you will need it.. you really wont need to pay hundreds in API tokens if you do this way
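Step 1 of the post (a docs/HANDOFF.md file updated after every work session) is easy to script. The sketch below is one possible shape, not something the post specifies: the entry layout, the `format_entry` and `append_handoff` names, and the per-session heading are assumptions. Only the file path and the four recorded fields come from the post.

```python
from datetime import date
from pathlib import Path


def format_entry(day: date, built: str, decided: str,
                 disagreements: str, next_slice: str) -> str:
    """Render one work-session entry with the four fields the post lists."""
    return (
        f"## Session {day.isoformat()}\n"
        f"- Built: {built}\n"
        f"- Decided (and why): {decided}\n"
        f"- Open disagreements: {disagreements}\n"
        f"- Next slice: {next_slice}\n"
    )


def append_handoff(repo_root: Path, entry: str) -> Path:
    """Append an entry to docs/HANDOFF.md, creating the file on first use."""
    handoff = repo_root / "docs" / "HANDOFF.md"
    handoff.parent.mkdir(parents=True, exist_ok=True)
    with handoff.open("a", encoding="utf-8") as fh:
        fh.write(entry + "\n")
    return handoff
```

Appending rather than overwriting keeps earlier sessions in the file, which matches the post's rule that anything not in HANDOFF.md did not happen.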

182

230K
