Johnny He

@hejohnnyu

science

Beijing

Joined September 2016

281 Following

8 Followers

96 Posts

hejohnnyu retweeted

Xudong Han

@Xudong07452910

3 days ago

This paper is a good read for anyone who relies heavily on Claude Code, Codex, or other AI agents.

It does not study how agents fail on benchmarks. It asks a more realistic question:

In real development work, how exactly do AI coding agents frustrate developers?

The paper analyzes 20,574 real coding-agent sessions covering both IDE and CLI workflows. Its definition of "failure" is interesting: rather than only checking whether the code runs, it looks at when developers start correcting, interrupting, or pushing back on the agent.

The results ring true. The most common problem turns out not to be "wrong code" but the agent violating constraints the developer stated explicitly.

For example, you say "don't modify this file", "don't touch the code yet", or "make only the minimal change", and it still can't resist doing a bit more. You ask it to explain a problem and it starts editing code along the way. You ask it to verify before declaring completion and it reports victory before the run has finished.

The paper also notes an interesting difference: CLI agents are more prone to constraint violations because they are often delegated longer, more open-ended tasks; IDE agents are more prone to local implementation errors because they act more like a close-at-hand copilot, editing code in high-frequency interaction.

The most painful part is that many failures do not cause immediate catastrophe; they drain the developer's time and trust. You have to keep judging whether it understood, whether it overstepped, and whether it really verified anything.

That matches my own experience: this constant checking is what actually wears me out in AI coding.

So the progress I hope for in coding agents cannot be measured by coding ability alone; it also depends on whether they can stay aligned with developer intent, respect boundaries, and report progress accurately.

The hard part of AI coding may not be "writing faster" but "not making me clean up after it over and over".

https://t.co/MXcgS43Kwz

212

257

19K

hejohnnyu retweeted

qiyan | Crypto

@0xQiYan

4 days ago

Still drawing architecture diagrams by hand, dragging boxes around and tweaking for ages?

Bookmark this! I have to recommend this skill today. I recently installed `drawio-skill`, which generates professional diagrams from a single sentence, so I no longer have to draw them myself.

Its logic is very simple: you just speak plainly (e.g. "draw a trading system architecture diagram") and it directly generates a `https://t.co/MtN94b7q0U` file.

It supports 6 common diagram types: architecture diagrams, flowcharts, ER diagrams, UML, ML model diagrams, and sequence diagrams.

It can also export PNG/SVG/PDF automatically, check and fix problems automatically, and iterate over multiple rounds until you are satisfied.

I tried it myself: I asked for a trading system architecture diagram and it came out in a few seconds, at a quality good enough to use directly.

From now on, when explaining a complex system to others, there is no need to sketch while gesturing: send one sentence, get the diagram, and save at least 90% of the drawing time.

Oh, and the model diagram for my graduation project was drawn with this skill too; it is the one below.

The GitHub repo is here, go bookmark and install it:
https://t.co/1uPH6fMqDs

489

119

518

30K

hejohnnyu retweeted

Amto

@XAMTO_AI

5 days ago

Folks, while casually scrolling today I came across something that left me stunned.

There is an open-source role library on GitHub called agency-agents. Do you know how many stars it has now? 810,000. Not 81,000, but 810,000. What does that number mean? It means hundreds of thousands of developers and AI enthusiasts worldwide are already using it, and you still haven't heard of it?

🔗 : https://t.co/7oTbpRfg0o

First, what it does.

Put simply, it is a huge library of role prompts packed with more than 140 expert-level agent roles: CEO, lawyer, programmer, product manager, growth hacker, financial advisor, market strategist... Almost any job you can name is in there. Each role is carefully tuned. These are not junk prompts that stop at "you are a lawyer"; they are role definitions with real depth and a professional framework.

Usage is extremely simple, three steps:

1️⃣ Open the library and find the role you need
2️⃣ Feed the role description directly to Claude Code or Cursor
3️⃣ Start chatting, and your AI instantly becomes a dedicated advisor in that field

That's all. Working on a startup and unsure how to make the business model work? Call up the CEO role and ask. Can't make sense of contract clauses? Have the lawyer role break them down. Stuck on code? Have the senior programmer role review it.

But the coolest way to use it is not a single role; it is a team of agents.

You can open several chat windows at once, one running the CEO perspective, one the marketing director perspective, and one the finance perspective, and have them give different angles on the same question. Isn't that like opening your own virtual company? And these "employees" are online 24/7, need no salary, don't slack off, and don't get emotional with you.

I tested several roles myself, and honestly the quality was much higher than I expected.

The business roles in particular offer frameworks and ideas with real substance: not vague generalities, but advice you can act on directly. Of course, AI is still AI, and you need to vet professional judgments yourself, but for sparking ideas and producing first drafts, this library is seriously underrated.

One more thing I have to mention: it is completely free.

Open source, free, use it and modify it however you like. In an AI era where everything charges and subscriptions are everywhere, this is increasingly rare.

The core competitive edge of AI tools is no longer the model itself. The gap in underlying capability between Claude, GPT, and Gemini is narrowing; what really sets people apart is how you use them, which roles you use, and what context you give them. This library essentially solves the "how to use it" problem by doing the hardest prompt-engineering work for you, so you can just pick it up and use it.

810,000 stars is not luck; it is merit.

If you haven't bookmarked it yet, go now.

223

460K

hejohnnyu retweeted

Amto

@XAMTO_AI

17 days ago

Open-source TTS has become this competitive; certain groups are probably going to be thrilled.

Tsinghua's OpenBMB just released something called VoxCPM2, and after looking at it I was honestly speechless.

2B parameters, trained on 2 million hours of multilingual audio, with 48kHz studio-grade output. With numbers like these, traditional TTS can more or less bow out.

But that is not the most alarming part.

It does not use a tokenizer.

The traditional approach slices audio into discrete tokens before generating, which loses a lot of information, so the voice always sounds slightly off. VoxCPM2 performs diffusion autoregression directly in a continuous latent space, preserving timbre, emotion, and breathing rhythm.

Here are the specs:

① Supports 30 languages plus 9 Chinese dialects; switch freely among Mandarin, Cantonese, and Hokkien
② Real-time factor of 0.13 on an RTX 4090; streaming output has almost no perceptible latency
③ No reference audio needed; a natural-language description is enough to generate a voice
④ Voice cloning lets you adjust emotion, speaking rate, and verbal tics, even make it stutter
⑤ Ultimate cloning mode: given a reference clip plus text, it can reproduce even the breathing rhythm

The license is Apache 2.0, which is commercial-friendly. The GitHub repo has passed 10,000 stars and keeps topping Trending.

For podcasts, audiobooks, game voice-overs, and short-video narration, open-source options are now fully adequate, and even stronger than many paid ones.

Honestly, this is a double-edged sword. On one side, the barrier for creators is completely leveled; on the other, scammers get another, sharper knife.

From now on, voices really can't be trusted casually.

🔗 https://t.co/fqvO78bhf1
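
As a worked example of the real-time factor quoted above: assuming the usual definition (synthesis time divided by the duration of the audio produced, which the post does not spell out), an RTF of 0.13 means one minute of speech takes roughly 7.8 seconds to generate. The function names below are illustrative only and are not part of VoxCPM2.

```python
def synthesis_seconds(audio_seconds: float, rtf: float = 0.13) -> float:
    """Wall-clock time needed to synthesize `audio_seconds` of speech.

    Assumes RTF = synthesis time / audio duration (an assumption; the
    post only quotes the 0.13 figure).
    """
    return audio_seconds * rtf


def speedup_over_realtime(rtf: float = 0.13) -> float:
    """How many seconds of audio are produced per second of compute."""
    return 1.0 / rtf


if __name__ == "__main__":
    # One minute of audio at RTF 0.13: about 7.8 s of compute.
    print(f"{synthesis_seconds(60.0):.1f} s to synthesize 60 s of audio")
    # Roughly 7.7x faster than real time.
    print(f"{speedup_over_realtime():.1f}x faster than real time")
```

Any RTF below 1.0 means audio is generated faster than it plays back, which is the condition streaming output depends on.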

234

89K

hejohnnyu retweeted

Muratcan Koylan

@koylanai

18 days ago

'Agent Harness Engineering: A Survey' just cited my Agent Skills for Context Engineering project in its Context & Memory Management section.

It’s a new paper on OpenReview (authors from CMU, Yale, Johns Hopkins, Amazon + others). They reviewed 170+ open-source projects and pulled real production lessons from OpenAI, Anthropic, and LangChain.

Agent performance in the real world = Model capability + Harness quality

For long-horizon, multi-step, production tasks, the harness has become the main bottleneck. Simple harness tweaks (better tool formats, sandbox changes, automated verification loops) deliver significant gains on benchmarks.

This is the second time my open-source work has been cited in academic research (first was Peking University’s State Key Lab paper on meta context engineering).

I’m genuinely proud of that, but more than anything it reminds me why I love open source. I’m not from academia. I learned this field by building, shipping, writing...

Open source lets your experiments enter the research papers. That is still one of the best parts of this field.

The paper is worth reading. We're moving from “build one agent” to “operate a fleet of long-running agents” and the paper repeatedly shows that the biggest improvements come from turning production traces into regression tests and automated harness fixes.

Paper & Repo: https://t.co/PAjqvOXedL

718

145

836

39K

hejohnnyu retweeted

GOLD

@Honcia13

17 days ago

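The barrier to doing research is being completely redefined!

Doing research used to mean:
staying up late skimming papers, rerunning code over and over, and spending a week writing a survey.

Now:
a one-sentence instruction is enough.
The open-source AI agent Feynman compresses a PhD-level research workflow into fully automated execution:
One sentence gets you an arXiv deep dive, a literature review, and code verification
Four agents working together (Researcher + Reviewer + Writer + Verifier), with almost zero hallucination
Generates research briefs with complete citations at reviewer-level quality
Supports paper auditing, code reproduction, and continuous topic tracking

Local-first, fully open source, supports local models through Ollama and similar tools, and your data never leaves your computer.
Whether you are a researcher, a developer, or a student,
this tool can hand the repetitive work to AI so you can focus on the truly creative parts.
The era of research productivity taking off has arrived!
https://t.co/N5vJYBPhu5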

798

176

867

67K

hejohnnyu retweeted

Xudong Han

@Xudong07452910

19 days ago

The AutoResearch AI paper is well worth reading.

It is not about a single capability like "AI summarizes papers for you" but about a bigger trend: research is moving from task-level AI to workflow-level AI.

In other words, AI will not just help you search literature, write code, and polish papers; it may take part in the entire research process: reading literature, finding problems, proposing hypotheses, designing experiments, calling tools to run experiments, verifying results, writing reports, and revising based on feedback.

The paper has a concept called Vibe Research, which I find vivid: many researchers are in fact already doing it. Humans set the direction, AI helps search, write, run, and revise, and humans make the final judgment and verification.

But the authors are also clear-eyed: a true AI scientist has not arrived yet. The biggest problem with current systems is not whether they can generate ideas, but whether evidence can be preserved, experiments reproduced, weak directions rejected in time, and conclusions traced back to their sources.

My biggest takeaway from this paper: future competition in research capability may not just be about "who can use AI to write papers" but about who can build a reliable AI research workflow.

The next step for AI for Science is not a chattier research assistant but a research workflow that is more verifiable, more reproducible, and better at closing the loop.

https://t.co/prnPUiBckS
#AIforScience #AutoResearch #Codex #claudecode

526

116

457

31K

hejohnnyu retweeted

淘沙者(TheSandPicker)

@Etudecn

20 days ago

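This 45-minute closed-door talk Musk gave at Stanford in 2003 really has substance. It is not empty startup motivational fluff. He personally breaks down, layer by layer, how to build a company from zero. Better still, he does not talk in concepts; he lays out the whole story of how the three companies he had at the time actually survived. This kind of content still holds up today. Many listeners react the same way: these 45 minutes are worth ten startup books. Truly usable insight is often hidden in old material like this: not flashy, but every sentence comes from real practice.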

173

16K

24K

hejohnnyu retweeted

elvis

@omarsar0

20 days ago

New research from Microsoft Research

I see a lot of AI engineers handwriting agent skill docs and hoping they generalize.

Probably not optimal. This work shows why.

It treats the skill doc as a trainable external state of a frozen agent instead.

It introduces SkillOpt, where an optimizer model makes validation-gated edits to the skill file. It adds, deletes, or replaces instructions, with a textual learning rate that controls how aggressively each round rewrites the doc. The agent itself never changes.
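
A minimal sketch of what a validation-gated edit loop of this kind might look like. This is not the paper's implementation: `propose` (standing in for the optimizer model) and `validate` (a held-out scoring function) are hypothetical callables, and the "textual learning rate" is approximated here by a simple cap on edits tried per round, which is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Edit:
    op: str          # "add", "delete", or "replace"
    index: int       # position in the list of instructions
    text: str = ""   # new instruction text (unused for "delete")


def apply_edit(skill: Sequence[str], edit: Edit) -> List[str]:
    """Return a new instruction list with one edit applied."""
    out = list(skill)
    if edit.op == "add":
        out.insert(edit.index, edit.text)
    elif edit.op == "delete":
        del out[edit.index]
    elif edit.op == "replace":
        out[edit.index] = edit.text
    else:
        raise ValueError(f"unknown op: {edit.op}")
    return out


def optimize_skill(
    skill: Sequence[str],
    propose: Callable[[Sequence[str]], Sequence[Edit]],  # hypothetical optimizer
    validate: Callable[[Sequence[str]], float],          # hypothetical scorer
    rounds: int = 3,
    max_edits_per_round: int = 2,  # crude stand-in for the "learning rate"
) -> Tuple[List[str], float]:
    """Keep an edit only if it strictly improves the validation score."""
    best, best_score = list(skill), validate(skill)
    for _ in range(rounds):
        for edit in propose(best)[:max_edits_per_round]:
            try:
                candidate = apply_edit(best, edit)
            except (IndexError, ValueError):
                continue  # skip malformed proposals
            score = validate(candidate)
            if score > best_score:  # the validation gate
                best, best_score = candidate, score
                break  # re-propose against the updated skill doc
    return best, best_score
```

Only the skill text changes between rounds; the agent that consumes it is never touched, which mirrors the frozen-agent framing in the post.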

SkillOpt is best or tied on all 52 (model, benchmark, harness) cells.

On GPT-5.5 it adds 23.5 points in direct chat, 24.8 with Codex, and 19.1 with Claude Code over no skill. It beats human-written skills, TextGrad, GEPA, and EvoSkill, carries zero extra inference-time cost, and the learned skills transfer across models and harnesses.

Paper: https://t.co/mNgTmmT32U

Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX

197

173K

hejohnnyu retweeted

Roan

@RohOnChain

about 1 month ago

Anthropic pays $750,000+ a year for engineers who can build LLM architectures from scratch. Stanford taught the entire thing in a 1-hour lecture & released it for free. Bookmark & watch this today before someone takes it down.

116

10K

21K

hejohnnyu retweeted

Joruno

@wsl8297

2 months ago

Dug up an e-book resource hub on GitHub: EBOOK ETC. Book lovers will find it hard not to bookmark it.

It gathers e-book entry points from various platforms in one place; books commonly found in apps such as WeChat Reading, JD Reading, and Ximalaya can mostly be tracked down here.

GitHub: https://t.co/mgff73DEJX

Whether you like classic literature, business and self-help, lifelong learning, career and entrepreneurship, or technical manuals, you can quickly find the right book by interest.

Even more convenient, many books come in all three formats (epub, mobi, azw3), so Kindle and other e-readers can open them without trouble.

The repository is also clearly organized by tag: jump straight in through the directory tags, or search with Ctrl+F, for maximum efficiency.

790

230K

hejohnnyu retweeted

Garry Tan

@garrytan

2 months ago

Just launched GBrain v0.8.0

If you have it installed, you can just ask your Claw/Hermes to upgrade to the latest GBrain and we'll automatically ask if you want to install your Voice WebRTC endpoint and Twilio number

It's a true mega brain-trip to talk to your agent directly. https://t.co/EuxWIK07V4

760

711

121K

hejohnnyu retweeted

AYi

@AYi_AInotes

2 months ago

Honestly, after seeing this today I stopped using every other AI memory solution I had 🤩🤩🤩

YC president Garry Tan has fully open-sourced the production-grade AI agent memory system he uses every day.

This is the real setup he has been running for a long time, managing 10,000+ Markdown files, 3,000+ people profiles, 13 years of calendar data, 5,800 Apple Notes, plus all his meeting notes and original ideas.

He has now packaged it as GBrain under the MIT license, so anyone can copy his homework for free.

As usual, the GitHub link is in the comments 👇

551

395K

2 months ago

hejohnnyu retweeted

2 months ago

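@AlchainHust

Your next employee doesn't have to be a colleague.

同事.skill ("colleague.skill") has gone viral over the past few days; it distills a colleague into an AI skill. The project proves one thing: distilling a person's abilities is feasible.

But I want to ask a question: now that we can distill people, why distill the colleagues around us?

Distill the best people in every field.

Fortunately, these people left behind a large amount of material that can be distilled. Munger has Poor Charlie's Almanack and decades of shareholder-meeting speeches. Feynman has his complete lectures and autobiography. Naval has hundreds of tweets and long-form podcasts. Taleb has five books and countless public debates. These are all high-purity raw ore of thinking, waiting to be refined.

I have been doing exactly this for a while. Today I am open-sourcing the methodology; it is called Nuwa (女娲).

Usage is extremely simple: just enter a name. Nuwa automatically launches 6 agents to find books, podcasts, tweets, and critics, then researches, distills, and verifies on its own. You don't need to prepare any material.

The 6 agents collect in parallel: writings, podcasts, social media, critics' perspectives, decision records, and timelines. Then comes a three-way check: for an idea to be recorded as a mental model, it must have appeared across two or more domains, be able to predict the person's stance on new problems, and not be something every smart person would think. It is kept only if it passes all three.

Ask Munger before a decision, ask Feynman when writing a tutorial, ask Taleb when assessing risk. Distill whoever you want.

GitHub: https://t.co/MNTlj88uww
Install: `npx skills add alchaincyf/nuwa-skill`

MIT license: use it, modify it, build with it however you like.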
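
The three-way check described above can be pictured as a simple filter. This is only a sketch under stated assumptions: the `Candidate` fields and function names are hypothetical, the project's real data model is not shown in the post, and in practice each boolean would come from an agent's judgment rather than a hand-set flag.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Candidate:
    idea: str
    domains: FrozenSet[str]      # domains in which the idea was observed
    predicts_new_stances: bool   # can it predict positions on new problems?
    is_common_wisdom: bool       # would every smart person think this?


def passes_triple_check(c: Candidate) -> bool:
    """All three conditions must hold for the idea to be kept."""
    return (
        len(c.domains) >= 2          # appears across two or more domains
        and c.predicts_new_stances   # has predictive power
        and not c.is_common_wisdom   # is distinctive to this person
    )


def select_mental_models(candidates: Iterable[Candidate]) -> List[str]:
    """Return the ideas that survive the filter, in input order."""
    return [c.idea for c in candidates if passes_triple_check(c)]
```

The third condition is what separates a person's characteristic way of thinking from generic good advice.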

394

544K

hejohnnyu retweeted

Ivan Burazin

@ivanburazin

2 months ago

After the Claude Code source code leak, a former PM extracted its multi-agent orchestration system into an open source model agnostic framework.

He studied the architecture, focused on the multi-agent orchestration layer (the coordinator that breaks goals into tasks, team system, message bus, task scheduler with dependency resolution), and reimplemented these patterns from scratch as a standalone open source framework without infringing on Anthropic's code.
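
To make "task scheduler with dependency resolution" concrete, here is a minimal sketch using Python's standard `graphlib`. It is not the framework's code; the task names and the `run` callback are hypothetical, and tasks within a wave are run sequentially here although they could be dispatched in parallel.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_in_waves(
    deps: Dict[str, Set[str]],     # task -> tasks it depends on
    run: Callable[[str], None],    # hypothetical per-task executor
) -> List[List[str]]:
    """Run tasks so that each starts only after its dependencies finish.

    Returns the "waves" of tasks that became ready together.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose deps are done
        for task in ready:
            run(task)
        sorter.done(*ready)  # unlocks tasks that were waiting on these
        waves.append(ready)
    return waves


if __name__ == "__main__":
    plan = {
        "plan": set(),
        "code": {"plan"},
        "docs": {"plan"},
        "review": {"code", "docs"},
    }
    print(run_in_waves(plan, run=lambda name: None))
```

In this picture a coordinator would produce the `deps` mapping when it breaks a goal into tasks, and the scheduler decides which agents can start at each step.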

The result is what @JackChen_x calls an "open-multi-agent." Unlike claude-agent-sdk, which spawns a CLI process per agent, this runs entirely in-process and can be deployed anywhere (serverless, Docker, CI/CD)

Check it out: https://t.co/w3XjnZEk92

103

547

552K

hejohnnyu retweeted

0xMarioNawfal

@RoundtableSpace

2 months ago

Send this article to your agent and thank me later https://t.co/mHc2hDCYEK

252

847K

hejohnnyu retweeted

Andrej Karpathy

@karpathy

2 months ago

LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images to local so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki, I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searchers), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web ui), but more often I want to hand it off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually, it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
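
For a sense of how small a "small and naive search engine over the wiki" can be, here is a sketch under stated assumptions. It is not the author's tool: the function names, the plain term-count scoring, and the flat in-memory index are all illustrative choices, and the only thing taken from the post is that the wiki is a directory tree of .md files.

```python
import re
from collections import Counter
from pathlib import Path
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(wiki_dir: Path) -> Dict[str, Counter]:
    """Map each .md file (path relative to the wiki root) to its term counts."""
    return {
        path.relative_to(wiki_dir).as_posix(): Counter(
            tokenize(path.read_text(encoding="utf-8"))
        )
        for path in sorted(wiki_dir.rglob("*.md"))
    }


def search(
    index: Dict[str, Counter], query: str, top_k: int = 5
) -> List[Tuple[str, int]]:
    """Rank files by how often the query terms occur in them."""
    terms = tokenize(query)
    hits = []
    for path, counts in index.items():
        score = sum(counts[t] for t in terms)
        if score > 0:
            hits.append((path, score))
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(hits, key=lambda h: (-h[1], h[0]))[:top_k]
```

Wrapped in a small command-line entry point, a function like `search` could be handed to an LLM agent as a tool, which is the usage pattern the post describes.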

60K

107K

21M

hejohnnyu retweeted

Turing Post

@TheTuringPost

3 months ago

Must-read AI research of the week:

▪️ OpenClaw-RL
▪️ Meta-Reinforcement Learning with Self-Reflection for Agentic Search
▪️ Agentic Critical Training
▪️ Video-Based Reward Modeling for Computer-Use Agents
▪️ AutoResearch-RL
▪️ Neural Thickets
▪️ Training Language Models via Neural Cellular Automata
▪️ The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training
▪️ Lost in Backpropagation: The LM Head is a Gradient Bottleneck
▪️ IndexCache
▪️ Attention Residuals
▪️ REMIX: Reinforcement Routing for Mixtures of LoRAs in LLM Finetuning
▪️ Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections
▪️ Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs
▪️ How Far Can Unsupervised RLVR Scale LLM Training?
▪️ Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training
▪️ Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs
▪️ Scale Space Diffusion

Find the full list and the main AI news and updates from NVIDIA GTC here: https://t.co/T985DbaCvR

553

118

517

29K

hejohnnyu retweeted

elvis

@omarsar0

11 months ago

A Survey of Context Engineering

160+ pages covering the most important research around context engineering for LLMs.

This is a must-read!

Here are my notes: https://t.co/e85As8o7a9

315

204K
