pandao

@ipandao

Xiamen, China

Joined April 2013

641 Following

60 Followers

36 Posts

ipandao retweeted

Cloudflare @Cloudflare

29 days ago

VoidZero, the team behind Vite, Vitest, Rolldown, Oxc, and Vite+, is joining Cloudflare. Vite stays open source, vendor-agnostic, and built for everyone. https://t.co/DJTpX4Q9Xt

375

319

645K

ipandao retweeted

Evan You

@evanyou

29 days ago

I've left most of what I want to say in the VoidZero blog post. But worth repeating: Thank you @voidzerodev team for trusting me and joining me on this wild ride. I am very proud to have assembled such a talented team and even prouder of what we have built together. Thank you all our investors for believing in my vision, in particular @caseyaylward from @Accel who led both our Seed and Series A. Thank you the @vite_js community. Vite and VoidZero wouldn’t have come this far without your trust and support. We will continue building with all of you, together, in the open. And thank you to everyone that made this happen at @Cloudflare. Looking forward to working with you all! https://t.co/0ly53VCOSr

147

188

197

278K

ipandao retweeted

VoidZero

@voidzerodev

29 days ago

VoidZero is joining Cloudflare. Our mission stays the same: to make JavaScript developers more productive than ever before. Vite, Vitest, Rolldown, Oxc, and Vite+ remain MIT-licensed. Evan and the VoidZero team will continue leading them. Cloudflare shares our commitment to open source. Together, we can keep investing in the tooling developers rely on every day, while bringing the Vite ecosystem and Cloudflare’s platform even closer together.

voidzerodev's tweet photo. VoidZero is joining Cloudflare.

Our mission stays the same: to make JavaScript developers more productive than ever before. Vite, Vitest, Rolldown, Oxc, and Vite+ remain MIT-licensed. Evan and the VoidZero team will continue leading them.

Cloudflare shares our commitment to open source. Together, we can keep investing in the tooling developers rely on every day, while bringing the Vite ecosystem and Cloudflare’s platform even closer together.

221

720

487

841K

ipandao retweeted

Hunter Bown

@goodhunt

2 months ago

鲸鱼兄弟们好，我是做 DeepSeek-TUI 的那个美国佬。说真的，特别想跟国内的鲸鱼兄弟们一起混——但我的翻墙技能仅限于写代码，微信到现在都没搞定，属实有点丢人。求各位大佬帮个忙： 1）帮忙转发扩散一下，让这个开源终端工具翻过高墙被兄弟们看到 2）顺手帮我验证个微信号，我想建个群，大家一起聊 DeepSeek、聊开源、聊怎么把 agent 做得更好作为交换，我发誓死守 cargo install 这条安装路径，绝不让任何一个兄弟受 npm 的苦。顺带一提，这段话是 DeepSeek 帮我润色的——感谢鲸鱼赐我流利中文 🙏 https://t.co/fnO73VB5gs

941

641

Who to follow

Ben

@bklein01

I am a husband, a father, a sorcerer of software, a prognosticator of projects, and deliverer of data Always wanting to make a difference in whatever I do.

ipandao retweeted

2 months ago

Prompt injection tops the OWASP LLM Top 10 and there's no single fix. Instead, you stack defenses, each one catching what the others miss. Defenses come in two families: model-level and system-level. Model-level defenses teach the model to resist injection. - Spotlighting wraps untrusted text in control tags like <UNTRUSTED>…</UNTRUSTED> and tells the model to treat anything inside as data, not instructions. - Instruction Hierarchy fine-tunes the model to rank the developer's system prompt above the user's message, and both above third-party content. System-level defenses build a system around the LLM that bounds the damage. - Least-Privilege Tools: Give the agent the minimum tools it needs. - Human-in-the-Loop: Require explicit user approval before any sensitive action runs. - Planner / Executor Split: Two separate LLMs. The planner has tool access but never sees untrusted content. The executor reads untrusted content but has no tools. No single defense is enough. Production systems like Gmail stack them, and together they make indirect injection manageable. Over to you: what's the one defense you've seen work in production that isn't on this list?

alexxubyte's tweet photo. Prompt injection tops the OWASP LLM Top 10 and there's no single fix.

Instead, you stack defenses, each one catching what the others miss.

Defenses come in two families: model-level and system-level.

Model-level defenses teach the model to resist injection.
- Spotlighting wraps untrusted text in control tags like <UNTRUSTED>…</UNTRUSTED> and tells the model to treat anything inside as data, not instructions.
- Instruction Hierarchy fine-tunes the model to rank the developer's system prompt above the user's message, and both above third-party content.

System-level defenses build a system around the LLM that bounds the damage.
- Least-Privilege Tools: Give the agent the minimum tools it needs.
- Human-in-the-Loop: Require explicit user approval before any sensitive action runs.

- Planner / Executor Split: Two separate LLMs. The planner has tool access but never sees untrusted content. The executor reads untrusted content but has no tools.

No single defense is enough. Production systems like Gmail stack them, and together they make indirect injection manageable.

Over to you: what's the one defense you've seen work in production that isn't on this list?

335

267

22K

ipandao retweeted

Graeme

@gkisokay

2 months ago

The Local LLM Cheat Sheet for 512GB RAM Have you ever wondered which top models run on a serious AI rig or the largest Mac Studio M3? Size is important, but it's really how you use it. As you can see from the list, a few models are punching above their weight. The Top 8 Best Frontier / Daily Models GLM-5.1 - The Best Daily Generalist A strong open-weight “frontier-style” all-rounder for chat, research, tool use, complex agents, and long-context assistant work. At roughly 435.97GB, it fits the 512GB class while still leaving practical room for KV. DeepSeek-V4-Flash - The Best Frontier Reasoning DeepSeek-V4-Pro is the real monster, but at 806GB, it does not fit in this class. V4-Flash gives you the in-budget reasoning alternative for math, logic, code reasoning, and complex CoT-style workloads. MiniMax-M2.7 - The Best Agentic and Tool-Use Built for persistent agent loops, long sessions, function calling, and multi-turn workflows. If your local setup is running Cline-style, Aider-style, or tool-heavy agent loops, this is one of the most interesting 512GB-class picks. Qwen3-Coder-480B-A35B-Instruct - The Best Dedicated Coder Great for code completion, agentic coding, refactoring, and SWE-style tasks. Qwen3-VL-235B-A22B-Thinking - The Best Vision + Reasoning Use it for image Q&A, OCR, screenshot analysis, chart reasoning, and vision-CoT workflows. The key point is that it fits the 512GB class while keeping vision reasoning strong. Kimi-K2.5 - The Best Long-Context Specialist Ideal for huge documents, RAG at scale, thousand-page synthesis, and multi-doc reasoning. This is the pick when the real bottleneck is not raw reasoning, but holding a massive amount of context together coherently. Mistral Large 3 675B - The Largest Dense Model It is slower, but dense models can be extremely consistent for long-form generation, translation, complex synthesis, and prose, where routing variance is not desirable. Pick this when consistency matters more than speed. Qwen3.6-27B - The Compact Workhorse At about 50GB BF16, it leaves a huge amount of RAM free and makes sense as the fast local daily driver. Great for low-latency local work, fast iteration, multi-session use, and pairing with a larger model. Important note: this is not a parameter-count ranking. A 50GB dense model can sit alongside a 447GB model if it has a workflow the larger model lacks. The right question is what job does this model do better than anything else that fits. Which local models are you actually using on your 512GB setup right now?

gkisokay's tweet photo. The Local LLM Cheat Sheet for 512GB RAM

Have you ever wondered which top models run on a serious AI rig or the largest Mac Studio M3?

Size is important, but it's really how you use it. As you can see from the list, a few models are punching above their weight.

The Top 8 Best Frontier / Daily Models

GLM-5.1 - The Best Daily Generalist
A strong open-weight “frontier-style” all-rounder for chat, research, tool use, complex agents, and long-context assistant work. At roughly 435.97GB, it fits the 512GB class while still leaving practical room for KV.

DeepSeek-V4-Flash - The Best Frontier Reasoning
DeepSeek-V4-Pro is the real monster, but at 806GB, it does not fit in this class. V4-Flash gives you the in-budget reasoning alternative for math, logic, code reasoning, and complex CoT-style workloads.

MiniMax-M2.7 - The Best Agentic and Tool-Use
Built for persistent agent loops, long sessions, function calling, and multi-turn workflows. If your local setup is running Cline-style, Aider-style, or tool-heavy agent loops, this is one of the most interesting 512GB-class picks.

Qwen3-Coder-480B-A35B-Instruct - The Best Dedicated Coder
Great for code completion, agentic coding, refactoring, and SWE-style tasks.

Qwen3-VL-235B-A22B-Thinking - The Best Vision + Reasoning
Use it for image Q&A, OCR, screenshot analysis, chart reasoning, and vision-CoT workflows. The key point is that it fits the 512GB class while keeping vision reasoning strong.

Kimi-K2.5 - The Best Long-Context Specialist
Ideal for huge documents, RAG at scale, thousand-page synthesis, and multi-doc reasoning. This is the pick when the real bottleneck is not raw reasoning, but holding a massive amount of context together coherently.

Mistral Large 3 675B - The Largest Dense Model
It is slower, but dense models can be extremely consistent for long-form generation, translation, complex synthesis, and prose, where routing variance is not desirable. Pick this when consistency matters more than speed.

Qwen3.6-27B - The Compact Workhorse
At about 50GB BF16, it leaves a huge amount of RAM free and makes sense as the fast local daily driver. Great for low-latency local work, fast iteration, multi-session use, and pairing with a larger model.

Important note: this is not a parameter-count ranking. A 50GB dense model can sit alongside a 447GB model if it has a workflow the larger model lacks. The right question is what job does this model do better than anything else that fits.

Which local models are you actually using on your 512GB setup right now?

ipandao retweeted

Vals AI

@ValsAI

2 months ago

Qwen 3.6 27B just hit the Vals Index, landing #8/18 among open source models. It packs a punch for its size, and performs similarly to Qwen 3.6 Plus, despite, presumably, being significantly smaller.

ValsAI's tweet photo. Qwen 3.6 27B just hit the Vals Index, landing #8/18 among open source models.

It packs a punch for its size, and performs similarly to Qwen 3.6 Plus, despite, presumably, being significantly smaller. https://t.co/bOedqoHpkK

298

18K

ipandao retweeted

DeepSeek

@deepseek_ai

2 months ago

The DeepSeek-V4-Pro discount has been extended until May 31, 2026, 15:59 UTC!

315

572

955

ipandao retweeted

Deli Chen

@victor207755822

2 months ago

Come try out the incredible work from our genius multimodal colleagues! 🐳👀 The little whale can now see (in grayscale testing)~ ✨

victor207755822's tweet photo. Come try out the incredible work from our genius multimodal colleagues! 🐳👀 The little whale can now see (in grayscale testing)~ ✨ https://t.co/qfdDQMCnfc

956

72K

ipandao retweeted

宝玉

@dotey

2 months ago

转译：深度拆解 Hermes Agent 的记忆系统：它如何修正 OpenClaw 的误区如果你读过我之前关于 ChatGPT、Claude 以及 Clawdbot 记忆系统的文章，你就会知道我一直在钻研同一个问题：这些 AI 智能体（AI Agent）到底是怎么记事的？ Hermes Agent 对我来说格外有趣，因为这次我不需要只靠观察它的行为来��“逆向工程”。Hermes 是开源的，它的代码库和文档都是公开的。所以，我没有通过提示词（Prompt）去盲测这个黑盒，而是直接翻看了它的代码路径——从它如何构建提示词状态、持久化会话，到如何清理记忆和查询历史对话。简而言之：Hermes 拥有的不是一套记忆系统，而是四套。 1. 存储在 MEMORY.md 和 USER.md 中、经过高度浓缩的提示词记忆。 2. 通过 session_search 调用的 SQLite 历史会话存档（可搜索）。 3. 像程序记忆（Procedural Memory）一样运作的智能体技能管理。 4. 可选的 Honcho 层，用于更深层的用户建模（User Modeling）。把这些设计联系在一起的核心逻辑非常简单：保持提示词稳定以便利用缓存（Caching），其他一切繁杂信息都交给工具。让我们深入聊聊。 Hermes 的上下文结构在理解记忆之前，我们先看看 Hermes 到底给模型发送了什么。系统提示词（System Prompt）大致是按以下顺序组装的： ------- [0] 默认智能体身份 [1] 工具使用行为指南 [2] Honcho 集成模块（可选） [3] 可选系统消息 [4] 固化的 MEMORY.md 快照 [5] 固化的 USER.md 快照 [6] 技能索引 [7] 上下文文件（AGENTS.md, SOUL.md 等规则文件） [8] 日期/时间 + 平台信息 [9] 对话历史 [10] 当前用户消息 -------- 这非常关键，因为 Hermes 正在针对大模型供应商的提示词缓存（Prompt Caching）机制进行优化。代码显示，提示词构建器的目标非常明确：��稳定的前缀部分尽可能长时间地保持不变。这一个决定就解释了 Hermes 大部分的记忆架构。如果某条信息每一轮对话都要用到，Hermes 会尽量把它缩得很小并注入进去；如果信息量很大、属于历史旧账或者偶尔才有用，Hermes 就会把它踢出提示词，改用“按需检索”的方式。第一层：固化的提示词记忆其内置的记忆系统小得令人惊讶。 Hermes 将持久记忆存储在 ~/.hermes/memories/ 下的两个文件中： 1). MEMORY.md 智能体笔记：环境、规范、工具怪癖、教训限制：2,200 字符 2). USER.md 用户画像：偏好、沟通风格、身份信息限制：1,375 字符这容量真不大。加起来大约只有 1,300 个 Token（模型理解文本的最小单位）。而这正是刻意为之。在会话开始时，Hermes 加载这两个文件，把它们渲染进提示词区块，然后在整个会话期间固化这个快照。会话中途写入的记忆会立即存入硬盘，但不会改变已经生成的系统提示词。这些改动只有在开启新会话，或者触发了“压缩（Compression）”导致的提示词重建时才会生效。渲染后的格式如下： ------ ═══════════ MEMORY (你的个人笔记) [67% — 1,474/2,200 字符] ═══════════ 用户的项目是一个位于 ~/code/myapi 的 Rust Web 服务，使用 Axum + SQLx § 这台机器运行 Ubuntu 22.04，安装了 Docker 和 Podman § 用户喜欢简洁的回复，讨厌冗长的解释 ------ 这里有几个我非常欣赏的细节设计： 1. 使用字符限制而非 Token 限制：这让记忆逻辑与模型无关。Hermes 不需要调用特定模型的计算工具就能判断记忆是否存满。 2. 简单的分隔符文件格式：条目之间用 § 分隔。没有复杂的向量数据库（Vector DB），没有自定义二进制存储，就是纯文本。 3. 刻意保持极小的系统提示词空间：这是整个设计的重中之重。Hermes 不想把所有历史都塞进提示词，它只想要最有价值的事实。 4. 记忆是“精选状态”，而不是“日记”：这是 Hermes 与 OpenClaw 最大的区别。 OpenClaw 的日志更像是“流水账”。而 Hermes 则反其道而行。它的工具架构和测试逻辑强调： • 保存用户偏好。 • 保存环境事实。 • 保存反复出现的错误修正。 • 保存稳定的规范。 • 不保存任务进度。 • 不保存会话结果。 • 不保存临时的待办事项（TODO）。真相是：Hermes 希望 MEMORY.md 和 USER.md 保持精简、高频且对缓存友好。 memory 工具 Hermes 通过一个拥有三种操作的 memory 工具来管理这些文件：add（添加）、replace（替换）、remove（移除）。一个好用的细节是：replace 和 remove 使用子字符串匹配。你不需要记住条目的内部 ID，只需要传入现有条目中一段唯一的文字即可。此外，系统会拒绝完全重复的内容，并拦截危险信息。源代码会扫描记忆条目，防止提示词注入（Prompt Injection，即通过输入恶意指令误导 AI）、凭证泄露或隐藏的 Unicode 字符。第二层：用于情景回溯的 session_search 如果说 MEMORY.md 是 Hermes 的“短期热记忆”，那么 session_search 就是它的“长尾回溯系统”。所有过去的会话都存储在 SQLite 数据库中，拥有完整的索引和搜索功能。当模型需要想起以前聊过的内容时，它不去翻 MEMORY.md，而是搜索这个会话数据库。其工作流程是： 1. 在过去的消息中进行全文搜索。 2. 按会话分组结果。 3. 加载匹配度最高的会话。 4. 使用一个便宜的辅助模型对这些会话进行摘要总结。 5. 将精炼后的回顾内容返回给主模型。这是一种非常务实的设计。它比盲目地把长篇累牍的历史塞进每一个提示词要便宜且高效得多。第三层：压缩与记忆冲刷（Memory Flush） Hermes 另一个聪明之处在于它处理长对话“压缩”的方式。当会话变得太长，Hermes 会压缩对话中间的部分以节省空间。但摘要是有损的，重要事实可能会丢失。于是，Hermes 会先进行一次“记忆冲刷”。在压缩之前，它会发送一条指令告诉模型： > “会话即将压缩，请保存任何值得记住的东西。优先保存用户偏好、修正建议和重复模式，而非具体的任务细节。” 然后它运行一次额外的模型调用，只开启 memory 工具。如果模型觉得有什么东西该留下来，就��在对话��“洗掉”之前把它写入 MEMORY.md。第四层：作为程序记忆的技能（Skills） Hermes 不仅能记住事实，还能记住技能。技能（Skills）存储在 ~/.hermes/skills/ 下。当 Hermes 发现了一个复杂的流程、修复了一个棘手的问题或学会了更好的方法时，它可以将其保存为“技能”。大多数记忆系统只关注“语义回溯”（名字、偏好、事实），但智能体还需要记住如何做事。为了效率，Hermes 不会把所有技能都塞进提示词，而是只放一个技能索引，只有在需要时才加载具体的技能内容。第五层：用于深层建模的 Honcho 最后是可选的 Honcho 层。如果说本地记忆是 Hermes 的笔记本，Honcho 就是它尝试构建的复杂用户模型。它能实现跨设备、跨平台的记忆连续性。最精妙的是它如何在不破坏提示词缓存的前提下实现集成： • 在会话的第一轮，Honcho 的上下文会被织入系统提示词。 • 在之后的对话中，为了保持提示词稳定，Honcho 的回溯内容会附加在当前用户的提问后面，而不是修改系统提示词。这确保了缓存依然有效，同时 AI 依然能读到最新的背景信息。 Hermes 与 OpenClaw 的区别 • OpenClaw：记忆更接近“以 Markdown 为中心的存储”，日志和长效文件是主要事实来源。 • Hermes：提示词记忆被严格限制，历史记录存在 SQLite 里，只有需要时才搜索。 Hermes 更加关注缓存效率。它认为：不是所有东西都配住在“系统提示词”这个黄金地段。总结：Hermes 做对了什么？ 1. 冷热分离：小规模提示词记忆负责常驻信息，搜索负责偶尔用到的信息。 2. 缓存优先：它意识到频繁改动提示词会导致延迟增加和成本上升。 3. 记忆的多样性：它承认记忆是分层的——包括个人画像、情景回溯、操作技能和深层建模。 Hermes 的核心设计原则最令我折服：记忆应该让智能体变得更好用，而不是通过摧毁提示词的稳定性来换取博闻强识。真正的诀窍不是记住更多，而是在正确的层级、以正确的成本，记住正确的事情。

dotey's tweet photo. 转译：深度拆解 Hermes Agent 的记忆系统：它如何修正 OpenClaw 的误区

如果你读过我之前关于 ChatGPT、Claude 以及 Clawdbot 记忆系统的文章，你就会知道我一直在钻研同一个问题：这些 AI 智能体（AI Agent）到底是怎么记事的？

Hermes Agent 对我来说格外有趣，因为这次我不需要只靠观察它的行为来��“逆向工程”。Hermes 是开源的，它的代码库和文档都是公开的。所以，我没有通过提示词（Prompt）去盲测这个黑盒，而是直接翻看了它的代码路径——从它如何构建提示词状态、持久化会话，到如何清理记忆和查询历史对话。

简而言之：Hermes 拥有的不是一套记忆系统，而是四套。

1. 存储在 MEMORY.md 和 USER.md 中、经过高度浓缩的提示词记忆。
2. 通过 session_search 调用的 SQLite 历史会话存档（可搜索）。
3. 像程序记忆（Procedural Memory）一样运作的智能体技能管理。
4. 可选的 Honcho 层，用于更深层的用户建模（User Modeling）。
把这些设计联系在一起的核心逻辑非常简单：保持提示词稳定以便利用缓存（Caching），其他一切繁杂信息都交给工具。

让我们深入聊聊。

Hermes 的上下文结构
在理解记忆之前，我们先看看 Hermes 到底给模型发送了什么。

系统提示词（System Prompt）大致是按以下顺序组装的：
-------
[0] 默认智能体身份
[1] 工具使用行为指南
[2] Honcho 集成模块（可选）
[3] 可选系统消息
[4] 固化的 MEMORY.md 快照
[5] 固化的 USER.md 快照
[6] 技能索引
[7] 上下文文件（AGENTS.md, SOUL.md 等规则文件）
[8] 日期/时间 + 平台信息
[9] 对话历史
[10] 当前用户消息
--------

这非常关键，因为 Hermes 正在针对大模型供应商的提示词缓存（Prompt Caching）机制进行优化。代码显示，提示词构建器的目标非常明确：��稳定的前缀部分尽可能长时间地保持不变。

这一个决定就解释了 Hermes 大部分的记忆架构。

如果某条信息每一轮对话都要用到，Hermes 会尽量把它缩得很小并注入进去；如果信息量很大、属于历史旧账或者偶尔才有用，Hermes 就会把它踢出提示词，改用“按需检索”的方式。

第一层：固化的提示词记忆

其内置的记忆系统小得令人惊讶。

Hermes 将持久记忆存储在 ~/.hermes/memories/ 下的两个文件中：

1). MEMORY.md
智能体笔记：环境、规范、工具怪癖、教训
限制：2,200 字符

2). USER.md
用户画像：偏好、沟通风格、身份信息
限制：1,375 字符

这容量真不大。加起来大约只有 1,300 个 Token（模型理解文本的最小单位）。

而这正是刻意为之。

在会话开始时，Hermes 加载这两个文件，把它们渲染进提示词区块，然后在整个会话期间固化这个快照。会话中途写入的记忆会立即存入硬盘，但不会改变已经生成的系统提示词。这些改动只有在开启新会话，或者触发了“压缩（Compression）”导致的提示词重建时才会生效。

渲染后的格式如下：

------

═══════════
MEMORY (你的个人笔记) [67% — 1,474/2,200 字符]
═══════════
用户的项目是一个位于 ~/code/myapi 的 Rust Web 服务，使用 Axum + SQLx
§
这台机器运行 Ubuntu 22.04，安装了 Docker 和 Podman
§
用户喜欢简洁的回复，讨厌冗长的解释

------

这里有几个我非常欣赏的细节设计：

1. 使用字符限制而非 Token 限制：这让记忆逻辑与模型无关。Hermes 不需要调用特定模型的计算工具就能判断记忆是否存满。

2. 简单的分隔符文件格式：条目之间用 § 分隔。没有复杂的向量数据库（Vector DB），没有自定义二进制存储，就是纯文本。

3. 刻意保持极小的系统提示词空间：这是整个设计的重中之重。Hermes 不想把所有历史都塞进提示词，它只想要最有价值的事实。

4. 记忆是“精选状态”，而不是“日记”：这是 Hermes 与 OpenClaw 最大的区别。

OpenClaw 的日志更像是“流水账”。而 Hermes 则反其道而行。它的工具架构和测试逻辑强调：
• 保存用户偏好。
• 保存环境事实。
• 保存反复出现的错误修正。
• 保存稳定的规范。
• 不保存任务进度。
• 不保存会话结果。
• 不保存临时的待办事项（TODO）。

真相是：Hermes 希望 MEMORY.md 和 USER.md 保持精简、高频且对缓存友好。

memory 工具

Hermes 通过一个拥有三种操作的 memory 工具来管理这些文件：add（添加）、replace（替换）、remove（移除）。

一个好用的细节是：replace 和 remove 使用子字符串匹配。你不需要记住条目的内部 ID，只需要传入现有条目中一段唯一的文字即可。

此外，系统会拒绝完全重复的内容，并拦截危险信息。源代码会扫描记忆条目，防止提示词注入（Prompt Injection，即通过输入恶意指令误导 AI）、凭证泄露或隐藏的 Unicode 字符。

第二层：用于情景回溯的 session_search

如果说 MEMORY.md 是 Hermes 的“短期热记忆”，那么 session_search 就是它的“长尾回溯系统”。

所有过去的会话都存储在 SQLite 数据库中，拥有完整的索引和搜索功能。当模型需要想起以前聊过的内容时，它不去翻 MEMORY.md，而是搜索这个会话数据库。

其工作流程是：
1. 在过去的消息中进行全文搜索。
2. 按会话分组结果。
3. 加载匹配度最高的会话。
4. 使用一个便宜的辅助模型对这些会话进行摘要总结。
5. 将精炼后的回顾内容返回给主模型。

这是一种非常务实的设计。它比盲目地把长篇累牍的历史塞进每一个提示词要便宜且高效得多。

第三层：压缩与记忆冲刷（Memory Flush）

Hermes 另一个聪明之处在于它处理长对话“压缩”的方式。

当会话变得太长，Hermes 会压缩对话中间的部分以节省空间。但摘要是有损的，重要事实可能会丢失。

于是，Hermes 会先进行一次“记忆冲刷”。

在压缩之前，它会发送一条指令告诉模型：

> “会话即将压缩，请保存任何值得记住的东西。优先保存用户偏好、修正建议和重复模式，而非具体的任务细节。”

然后它运行一次额外的模型调用，只开启 memory 工具。如果模型觉得有什么东西该留下来，就��在对话��“洗掉”之前把它写入 MEMORY.md。

第四层：作为程序记忆的技能（Skills）

Hermes 不仅能记住事实，还能记住技能。

技能（Skills）存储在 ~/.hermes/skills/ 下。当 Hermes 发现了一个复杂的流程、修复了一个棘手的问题或学会了更好的方法时，它可以将其保存为“技能”。

大多数记忆系统只关注“语义回溯”（名字、偏好、事实），但智能体还需要记住如何做事。

为了效率，Hermes 不会把所有技能都塞进提示词，而是只放一个技能索引，只有在需要时才加载具体的技能内容。

第五层：用于深层建模的 Honcho

最后是可选的 Honcho 层。如果说本地记忆是 Hermes 的笔记本，Honcho 就是它尝试构建的复杂用户模型。它能实现跨设备、跨平台的记忆连续性。

最精妙的是它如何在不破坏提示词缓存的前提下实现集成：
• 在会话的第一轮，Honcho 的上下文会被织入系统提示词。
• 在之后的对话中，为了保持提示词稳定，Honcho 的回溯内容会附加在当前用户的提问后面，而不是修改系统提示词。

这确保了缓存依然有效，同时 AI 依然能读到最新的背景信息。

Hermes 与 OpenClaw 的区别

• OpenClaw：记忆更接近“以 Markdown 为中心的存储”，日志和长效文件是主要事实来源。
• Hermes：提示词记忆被严格限制，历史记录存在 SQLite 里，只有需要时才搜索。

Hermes 更加关注缓存效率。它认为：不是所有东西都配住在“系统提示词”这个黄金地段。

总结：Hermes 做对了什么？

1. 冷热分离：小规模提示词记忆负责常驻信息，搜索负责偶尔用到的信息。

2. 缓存优先：它意识到频繁改动提示词会导致延迟增加和成本上升。

3. 记忆的多样性：它承认记忆是分层的——包括个人画像、情景回溯、操作技能和深层建模。

Hermes 的核心设计原则最令我折服：记忆应该让智能体变得更好用，而不是通过摧毁提示词的稳定性来换取博闻强识。

真正的诀窍不是记住更多，而是在正确的层级、以正确的成本，记住正确的事情。

271

157K

ipandao retweeted

Jason Zhu

@GoSailGlobal

2 months ago

DeepSeek 像一把抵在硅谷模型公司背后的枪 🔫 硅谷101 今天上线了一期炸裂对谈：OpenAI 前研究员 Jenny Xiao × 芯片架构师肖志斌，两个硅谷内部人聊 DeepSeek v4 带来的生存危机刚好也看到国内比较喜欢的AI博主大聪明“赛博禅心”，在解读这个视频，直播��的两个嘉宾很有料： - 肖志斌：ZFLOW AI 创始人兼 CEO，前华美半导体协会主席，资深芯片架构师 - Jenny Xiao：前 OpenAI 研究员，Leonis Capital 合伙人，专注 AI 投资 I've heard a similar point on an A16z podcast before, and it seems like reality has proven it right again. @pmarca @venturetwins @omooretweets 最狠的三句话： 1️⃣ "If you're a foundation model company and you get surpassed by open source, the value of your business is essentially zero." 这不是技术竞争，这是生死线（kill line） 2️⃣ "硅谷公司钱太多，反而没动力优化效率。中国模型厂商被资源倒逼，更早进入 token efficiency 创新"，资源约束 = 创新加速器 3️⃣ "没有效率，AGI 就只能是个 demo。有了效率，AGI 才能成为真正的产品" ，DeepSeek v4：计算成本 1/3，内存占用 1/10 核心观点 - Anthropic 估值超过 OpenAI 的真相：专注 > 什么都做 - GPT-5.5 比 GPT-5 贵 2 倍，DeepSeek v4 便宜 10 倍，谁在裸泳？ - 英伟达短期安全，长期推理市场会被 TPU / 升腾 / 寒武纪瓜分 - Claude Code 为什么是 Anthropic 的定义时刻完整对谈👇

317

352

225K

ipandao retweeted

Jason Zhu

@GoSailGlobal

2 months ago

附上B站视频链接（画质会清晰些）： https://t.co/AvJO07zgLY 三个 kill line： 1️⃣ 价格 kill line：企业客户只关心"每个任务多少钱"，不关心参数量 2️⃣ 能力 kill line：如果 DeepSeek v6 达到 o4 水平，贵 10 倍的模型还有人用吗？ 3️⃣ 生态 kill line：华为升腾 + DeepSeek 适配 = 非英伟达推理生态成型

GoSailGlobal's tweet photo. 附上B站视频链接（画质会清晰些）：

https://t.co/AvJO07zgLY

三个 kill line：
1️⃣ 价格 kill line：企业客户只关心"每个任务多少钱"，不关心参数量

2️⃣ 能力 kill line：如果 DeepSeek v6 达到 o4 水平，贵 10 倍的模型还有人用吗？

3️⃣ 生态 kill line：华为升腾 + DeepSeek 适配 = 非英伟达推理生态成型

ipandao retweeted

Unsloth AI

@UnslothAI

2 months ago

Qwen3.6-27B can now run locally! 💜 Run on 18GB RAM via Unsloth Dynamic GGUFs. Qwen3.6-27B surpasses Qwen3.5-397B-A17B on all major coding benchmarks. GGUFs: https://t.co/ykKgwh2zI9 Guide: https://t.co/ITLNq20WJp

UnslothAI's tweet photo. Qwen3.6-27B can now run locally! 💜

Run on 18GB RAM via Unsloth Dynamic GGUFs.

Qwen3.6-27B surpasses Qwen3.5-397B-A17B on all major coding benchmarks.

GGUFs: https://t.co/ykKgwh2zI9
Guide: https://t.co/ITLNq20WJp https://t.co/8ADXPDAyAk

408

577K

ipandao retweeted

DeepSeek

@deepseek_ai

2 months ago

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length. 🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models. 🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice. Try it now at https://t.co/GCdiMzk1Dl via Expert Mode / Instant Mode. API is updated & available today! 📄 Tech Report: https://t.co/drlDrxkYtp 🤗 Open Weights: https://t.co/T13Y8i7SDM 1/n

deepseek_ai's tweet photo. 🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.

🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.

Try it now at https://t.co/GCdiMzk1Dl via Expert Mode / Instant Mode. API is updated & available today!

📄 Tech Report: https://t.co/drlDrxkYtp
🤗 Open Weights: https://t.co/T13Y8i7SDM

1/n

46K

10K

10M

ipandao retweeted

Gorden Sun

@Gorden_Sun

4 months ago

gstack：YC CEO开源的工具集可以把Claude Code变成有角色分工、有流程管控的开发团队，本质是15个“/命令”，把开发流程拆成了7个阶段： Think → Plan → Build → Review → Test → Ship → Reflect，可以从产品需求、UI设计、测试等多个方面提升软件开发的质量。 Github：https://t.co/S4UfRfklog

Gorden_Sun's tweet photo. gstack：YC CEO开源的工具集
可以把Claude Code变成有角色分工、有流程管控的开发团队，本质是15个“/命令”，把开发流程拆成了7个阶段：
Think → Plan → Build → Review → Test → Ship → Reflect，可以从产品需求、UI设计、测试等多个方面提升软件开发的质量。

Github：https://t.co/S4UfRfklog https://t.co/7N0Ts8ivV0

480

108

585

41K

ipandao retweeted

SpaceX

@SpaceX

over 2 years ago

Starship re-entering Earth's atmosphere. Views through the plasma

80K

16K

14M

ipandao retweeted

Bytebytego

@bytebytego

over 2 years ago

Top 12 Tips for API Security - Use HTTPS - Use OAuth2 - Use WebAuthn - Use Leveled API Keys - Authorization - Rate Limiting - API Versioning - Whitelisting - Check OWASP API Security Risks - Use API Gateway - Error Handling - Input Validation – Subscribe to our weekly newsletter to get a Free System Design PDF (158 pages): https://t.co/FIzCeaWsZV

236

870

64K

ipandao retweeted

Bytebytego

@bytebytego

over 2 years ago

813

225

499

37K

ipandao retweeted

Alex Xu

@alexxubyte

over 2 years ago

Top Architectural Styles. The method to download the high-resolution image is available at the end. In software development, architecture plays a crucial role in shaping the structure and behavior of software systems. It provides a blueprint for system design, detailing how components interact with each other to deliver specific functionality. They also offer solutions to common problems, saving time and effort and leading to more robust and maintainable systems. However, with the vast array of architectural styles and patterns available, it can take time to discern which approach best suits a particular project or system. Aims to shed light on these concepts, helping you make informed decisions in your architectural endeavors. To help you navigate the vast landscape of architectural styles and patterns, there is a cheat sheet that encapsulates all. This cheat sheet is a handy reference guide that you can use to quickly recall the main characteristics of each architectural style and pattern. – Subscribe to our newsletter to download the 𝐡𝐢𝐠𝐡 𝐫𝐞𝐬𝐨𝐥𝐮𝐭𝐢𝐨𝐧 𝐢𝐦𝐚𝐠𝐞. After signing up, find the download link on the success page: https://t.co/ito2aWqd62

455

101K

ipandao retweeted

Jiayuan (JY) Zhang

@jiayuan_jy

over 2 years ago

过去一个月，收到了非常多的用户反馈，表示已经把 https://t.co/AgfDhVE33c 作为了默认的搜索引擎。 https://t.co/AgfDhVE33c 是专门面向开发者的 AI 搜索引擎，目标是替代开发者日常使用 Google / StackOverflow / ��档查询的场景。免费、快速、准确。

jiayuan_jy's tweet photo. 过去一个月，收到了非常多的用户反馈，表示已经把 https://t.co/AgfDhVE33c 作为了默认的搜索引擎。

https://t.co/AgfDhVE33c 是专门面向开发者的 AI 搜索引擎，目标是替代开发者日常使用 Google / StackOverflow / ��档查询的场景。

免费、快速、准确。 https://t.co/iQIWS6SFU1

367

155

61K

pandao

@ipandao

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users