absorbguo

@absorbguo

Be happy

Joined October 2012

200 Following

6 Followers

240 Posts

absorbguo retweeted

Xudong Han

@Xudong07452910

12 days ago

AutoResearch AI 这论文挺值得看的。它讲的不是“AI 帮你总结论文”这种单点能力，而是一个更��的趋势：科研正在从 task-level AI，走向 workflow-level AI。也就是说，AI 以后不只是帮你查文献、写代码、润色论文，而是可能参与完整科研流程：读文献、找问题、提假设、设计实验、调用工具跑实验、验证结果、写报��、再根据反馈修改。论文里有个概念叫 Vibe Research，我觉得很形象：现在很多科研人其实已经在做了。人类给方向，AI 帮忙查、写、跑、改，最后人类负责判断和验证。但作者也很清醒：真正的 AI 科学家还没到来。当前系统最大的问题不是会不会生成想法，而是证据能不能保存、实验能不能复现、弱方向能不能被及时拒绝、结论能不能追溯来源。我觉得这篇文章最大的启发是：未来科研能力的竞争，可能不只是“谁会用 AI 写论文”，而是谁能搭出一套可靠的 AI research workflow。 AI for Science 的下一步，不是更会聊天的科研助手，而是更可验证、更可复现、更能闭环的科研工作流。 https://t.co/prnPUiBckS #AIforScience #AutoResearch #Codex #claudecode

Xudong07452910's tweet photo. AutoResearch AI 这论文挺值得看的。

它讲的不是“AI 帮你总结论文”这种单点能力，而是一个更��的趋势：科研正在从 task-level AI，走向 workflow-level AI。

也就是说，AI 以后不只是帮你查文献、写代码、润色论文，而是可能参与完整科研流程：读文献、找问题、提假设、设计实验、调用工具跑实验、验证结果、写报��、再根据反馈修改。

论文里有个概念叫 Vibe Research，我觉得很形象：现在很多科研人其实已经在做了。人类给方向，AI 帮忙查、写、跑、改，最后人类负责判断和验证。

但作者也很清醒：真正的 AI 科学家还没到来。当前系统最大的问题不是会不会生成想法，而是证据能不能保存、实验能不能复现、弱方向能不能被及时拒绝、结论能不能追溯来源。

我觉得这篇文章最大的启发是：未来科研能力的竞争，可能不只是“谁会用 AI 写论文”，而是谁能搭出一套可靠的 AI research workflow。

AI for Science 的下一步，不是更会聊天的科研助手，而是更可验证、更可复现、更能闭环的科研工作流。

https://t.co/prnPUiBckS
#AIforScience #AutoResearch #Codex #claudecode

527

117

457

30K

absorbguo retweeted

Ziwei Liu

@liuziwei7

12 days ago

🔥LLaVA-OneVision-2.0 Open Sourced🔥 LLaVA-OneVision series @lmmslab now upgrades to 2.0 with its key advance on *codec-stream tokenization*, which treats highly dynamic video as a continuous bit-cost stream - Tech Report: https://t.co/pFo2fGYj2M - Code: https://t.co/JvRzu96rJ1

liuziwei7's tweet photo. 🔥LLaVA-OneVision-2.0 Open Sourced🔥

LLaVA-OneVision series @lmmslab now upgrades to 2.0 with its key advance on *codec-stream tokenization*, which treats highly dynamic video as a continuous bit-cost stream

- Tech Report: https://t.co/pFo2fGYj2M
- Code: https://t.co/JvRzu96rJ1 https://t.co/d1BgKzQo8I

236

101

19K

absorbguo retweeted

Xudong Han

@Xudong07452910

12 days ago

📘 开源免费高质量教程推荐：《从零开始构建Agent》这是一本系统性的 Agent 原理与实践教程，从零基础到进阶实战，内容覆盖： 1. Agent基础概念与主流范式（ReAct、Plan-and-Solve、Reflection等） 2. 主流框架深度解析（AutoGen、LangGraph、AgentScope） 3. 进阶技术（记忆与RAG、上下文工程、多Agent通信协议、Agentic-RL、评估体系） 4. 完整案例实战（智能旅行助手、自动化深度研究Agent、多Agent模拟等） 5. 从零手把手搭建自己的Agent框架（HelloAgents）特别适合有一定Python基础和LLM使用经验的开发者、学生以及对AI Agent感兴趣的从业者，是一套理论扎实、代码可落地的学习资料。我当初就是把这个开源项目全部学了一遍入门Agent的，并且把所有例子都跑了一遍加深理解。目前已获得 5.3万+ stars，是中文圈质量较高的开源Agent教程之一。 https://t.co/EeL8dspAss #AIAgent #ClaudeCode #LangGraph #AutoGen #AI教程 #Codex

Xudong07452910's tweet photo. 📘 开源免费高质量教程推荐：《从零开始构建Agent》

这是一本系统性的 Agent 原理与实践教程，从零基础到进阶实战，内容覆盖：

1. Agent基础概念与主流范式（ReAct、Plan-and-Solve、Reflection等）
2. 主流框架深度解析（AutoGen、LangGraph、AgentScope）
3. 进阶技术（记忆与RAG、上下文工程、多Agent通信协议、Agentic-RL、评估体系）
4. 完整案例实战（智能旅行助手、自动化深度研究Agent、多Agent模拟等）
5. 从零手把手搭建自己的Agent框架（HelloAgents）

特别适合有一定Python基础和LLM使用经验的开发者、学生以及对AI Agent感兴趣的从业者，是一套理论扎实、代码可落地的学习资料。

我当初就是把这个开源项目全部学了一遍入门Agent的，并且把所有例子都跑了一遍加深理解。

目前已获得 5.3万+ stars，是中文圈质量较高的开源Agent教程之一。

https://t.co/EeL8dspAss
#AIAgent #ClaudeCode #LangGraph #AutoGen #AI教程 #Codex

320

321

18K

absorbguo retweeted

Shuo Yang

@Andy_ShuoYang

12 days ago

Flash-KMeans was only the beginning. Today, from the Flash-KMeans team, we are releasing FlashLib — a GPU library for fast, predictable, agent-ready classical ML operators. Up to 26× on KMeans, 19× on KNN, 40× on HDBSCAN, 208× on TruncatedSVD, 47× on PCA, 147× on exact t-SNE, and 49× on MultinomialNB over state-of-the-art (cuML). Blog: https://t.co/P31SGl0cyT Code: https://t.co/9nkO2hmeOl

237

864K

absorbguo retweeted

Victor M

@victormustar

18 days ago

Extremely fast on HuggingChat (served by Cohere) https://t.co/vq5wv4MJec

18K

absorbguo retweeted

meng shao

@shao__meng

about 1 month ago

Cursor 团队这篇「持续改进我们的 Agent Harness」，写的真不错，很实战： · 如何衡量 harness 的好坏？ · 如何为不同模型定制 harness？ · 中途换模型到底会有什么问题？ · 对未来的判断：Multi-Agent 是 harness 问题 https://t.co/Yz6YuY4eZ3 Cursor 团队对模型和 harness 的判断：模型的上限决定天花板，但 harness 决定模型实际能跑多远。 # 方法论：愿景驱动 + 实验闭环 · 先有一个"理想 agent 体验"的主观判断，再分解为可验证的假设。 · 通过线上 A/B 与离线 eval 双轨验证，��仪表化判断每次改动是否真的更好。 · 大改动罕见，常态是"强迫症式地堆叠小优化"。 · 每当拿到新模型早期访问，会花数周专门为该模型重塑 harness，使同一模型在 Cursor 里更快、更聪明、更省 token。 # 上下文窗口的演进：harness 的核心战场 2024 年末的旧范式：守卫式 · 模型自己挑上下文能力差，所以 Cursor 加了大量护栏：每次编辑后回灌 lint/类型错误、读文件行数太少时自动改写、限制单轮工具调用次数。 · 静态注入大量上下文：目录结构、语义匹配的代码片段、被压缩过的用户附件文件。 2026 年的新范式：动态获取式 · 静态上下文大幅瘦身，只保留确实有用的（OS、git 状态、当前/最近查看的文件）。 · 拆掉护栏，把"取什么上下文"的权力交还模型，由它在工作中动态拉取。 · 现在的工作重心是给 agent 提供更多与世界交互的方式，而不是替它准备好一切。关键启示：随着模型能力提升，harness 设计的趋势是 "减少喂养，增加感官"。 # 如何衡量 harness 的好坏 Cursor 用三层叠加的衡量体系： 1. 离线基准：公开 benchmark + 自研 CursorBench。快、可对比，但只是真实使用的近似。 2. 在线 A/B：把多个 harness 变体并行投放给真实用户。 3. 质量指标——重点在两个"模糊但更重要"的指标： · 留存率：agent 写的代码在固定时间窗后还有多少留在用户代码库里。被改动越多，说明初版质量越差。 · LLM 判读用户回应：用模型读用户的回复来判定满意度。"用户开始下一个功能" = 成功；"用户贴了个 stack trace" = 失败。案例：他们曾试过用更贵的模型做上下文摘要，A/B 显示质量提升微乎其微，于是放弃。 # 把 harness 当生产软件来运维：错误分类与告警随着模型与能力变多，harness 的状态空间膨胀，bug 面变大。工具调用是最大的 bug 表面，且工具错误会污染上下文，让后续决策一起劣化。错误被分类管理： · InvalidArguments / UnexpectedEnvironment：模型自身错误或上下文矛盾 · ProviderError：第三方工具（如 GenerateImage、WebSearch）故障 · UserAborted / Timeout 等告警策略： · 未知错误 = bug，超阈值即报警。 · 预期错误用按工具、按模型分别建立基线的异常检测，避免��代码库体量等因素误导。 · 每周跑一个 Cloud Agent Automation：让 agent 自己翻日志，发现新问题或激增问题，在 backlog 自动建/更新 ticket，再调度其他 Cloud Agents 去修。 · 一次专项 sprint 把"未知工具错误率"压低了一个数量级。这就是他们说的 "agent harness 的自动化软件工厂"——用 agent 维护 agent。 # 为不同模型定制 harness Harness 的所有抽象都是模型无关的，但实际为每个模型重度定制： · 工具格式贴合训练分布：OpenAI 训练时用 patch 格式编辑文件，Anthropic 用字符串替换。给错工具会让模型多花推理 token、多犯错。 · Prompt 风格分化：OpenAI 模型偏字面、精确；Claude 更直觉化、容忍模糊指令。 · 新模型上手流程：从最接近的现有模型 harness 复制起步 → 离线 eval 找混乱点 → 团队真人试用 → 反复调。 · 真实模型怪癖案例：某模型出现 "context anxiety"（上下文焦虑）——窗口快满时拒绝继续、说"任务太大"。通过 prompt 微调缓解。中途换模型（mid-chat switching）的难题 · 切模型 → 自动切到该模型对应的 harness（prompts + 工具集）。 · 但对话历史是别的模型生成的，对新模型而言是 OOD 输入。 · 解法：注入 "你正在中途接手另一个模型对话" 的指令；劝阻它去调用历史里出现但当前不属于自己的工具。 · 缓存难题：cache 是按 provider + model 的，切换 = cache miss，第一轮变慢变贵。试过切换时做对话摘要降本，但深度任务里摘要会丢细节。 · 官方建议：除非有理由，否则一段对话用一个模型到底。 · 替代方案：用 subagent 起一个全新上下文的子任务，可以指定模型。 # 对未来的判断：Multi-Agent 是 harness 问题 Cursor 认为 AI 编程的未来是�� agent 协作：规划一个、快速编辑一个、调试一个，各司其职。让这套体系真正跑通的关键，不是某个更强的单一 agent，而是 harness——它要决定： · 派哪个 agent 接手 · 如何按目标 agent 的强项重新组织任务描述 · 如何把多 agent 的产出缝合为连贯工作流结论："harness 工程过去重要，未来只会更关键。"

shao__meng's tweet photo. Cursor 团队这篇「持续改进我们的 Agent Harness」，写的真不错，很实战：
· 如何衡量 harness 的好坏？
· 如何为不同模型定制 harness？
· 中途换模型到底会有什么问题？
· 对未来的判断：Multi-Agent 是 harness 问题
https://t.co/Yz6YuY4eZ3

Cursor 团队对模型和 harness 的判断：模型的上限决定天花板，但 harness 决定模型实际能跑多远。

# 方法论：愿景驱动 + 实验闭环

· 先有一个"理想 agent 体验"的主观判断，再分解为可验证的假设。
· 通过线上 A/B 与离线 eval 双轨验证，��仪表化判断每次改动是否真的更好。
· 大改动罕见，常态是"强迫症式地堆叠小优化"。
· 每当拿到新模型早期访问，会花数周专门为该模型重塑 harness，使同一模型在 Cursor 里更快、更聪明、更省 token。

# 上下文窗口的演进：harness 的核心战场

2024 年末的旧范式：守卫式
· 模型自己挑上下文能力差，所以 Cursor 加了大量护栏：每次编辑后回灌 lint/类型错误、读文件行数太少时自动改写、限制单轮工具调用次数。
· 静态注入大量上下文：目录结构、语义匹配的代码片段、被压缩过的用户附件文件。

2026 年的新范式：动态获取式
· 静态上下文大幅瘦身，只保留确实有用的（OS、git 状态、当前/最近查看的文件）。
· 拆掉护栏，把"取什么上下文"的权力交还模型，由它在工作中动态拉取。
· 现在的工作重心是给 agent 提供更多与世界交互的方式，而不是替它准备好一切。

关键启示：随着模型能力提升，harness 设计的趋势是 "减少喂养，增加感官"。

# 如何衡量 harness 的好坏

Cursor 用三层叠加的衡量体系：
1. 离线基准：公开 benchmark + 自研 CursorBench。快、可对比，但只是真实使用的近似。
2. 在线 A/B：把多个 harness 变体并行投放给真实用户。
3. 质量指标——重点在两个"模糊但更重要"的指标：
· 留存率：agent 写的代码在固定时间窗后还有多少留在用户代码库里。被改动越多，说明初版质量越差。
· LLM 判读用户回应：用模型读用户的回复来判定满意度。"用户开始下一个功能" = 成功；"用户贴了个 stack trace" = 失败。

案例：他们曾试过用更贵的模型做上下文摘要，A/B 显示质量提升微乎其微，于是放弃。

# 把 harness 当生产软件来运维：错误分类与告警

随着模型与能力变多，harness 的状态空间膨胀，bug 面变大。工具调用是最大的 bug 表面，且工具错误会污染上下文，让后续决策一起劣化。

错误被分类管理：
· InvalidArguments / UnexpectedEnvironment：模型自身错误或上下文矛盾
· ProviderError：第三方工具（如 GenerateImage、WebSearch）故障
· UserAborted / Timeout 等

告警策略：
· 未知错误 = bug，超阈值即报警。
· 预期错误用按工具、按模型分别建立基线的异常检测，避免��代码库体量等因素误导。
· 每周跑一个 Cloud Agent Automation：让 agent 自己翻日志，发现新问题或激增问题，在 backlog 自动建/更新 ticket，再调度其他 Cloud Agents 去修。
· 一次专项 sprint 把"未知工具错误率"压低了一个数量级。

这就是他们说的 "agent harness 的自动化软件工厂"——用 agent 维护 agent。

# 为不同模型定制 harness

Harness 的所有抽象都是模型无关的，但实际为每个模型重度定制：
· 工具格式贴合训练分布：OpenAI 训练时用 patch 格式编辑文件，Anthropic 用字符串替换。给错工具会让模型多花推理 token、多犯错。
· Prompt 风格分化：OpenAI 模型偏字面、精确；Claude 更直觉化、容忍模糊指令。
· 新模型上手流程：从最接近的现有模型 harness 复制起步 → 离线 eval 找混乱点 → 团队真人试用 → 反复调。
· 真实模型怪癖案例：某模型出现 "context anxiety"（上下文焦虑）——窗口快满时拒绝继续、说"任务太大"。通过 prompt 微调缓解。

中途换模型（mid-chat switching）的难题
· 切模型 → 自动切到该模型对应的 harness（prompts + 工具集）。
· 但对话历史是别的模型生成的，对新模型而言是 OOD 输入。
· 解法：注入 "你正在中途接手另一个模型对话" 的指令；劝阻它去调用历史里出现但当前不属于自己的工具。
· 缓存难题：cache 是按 provider + model 的，切换 = cache miss，第一轮变慢变贵。试过切换时做对话摘要降本，但深度任务里摘要会丢细节。
· 官方建议：除非有理由，否则一段对话用一个模型到底。
· 替代方案：用 subagent 起一个全新上下文的子任务，可以指定模型。

# 对未来的判断：Multi-Agent 是 harness 问题

Cursor 认为 AI 编程的未来是�� agent 协作：规划一个、快速编辑一个、调试一个，各司其职。
让这套体系真正跑通的关键，不是某个更强的单一 agent，而是 harness——它要决定：
· 派哪个 agent 接手
· 如何按目标 agent 的强项重新组织任务描述
· 如何把多 agent 的产出缝合为连贯工作流

结论："harness 工程过去重要，未来只会更关键。"

357

392

23K

absorbguo retweeted

机器之心 JIQIZHIXIN

@jiqizhixin

about 1 month ago

What if LLMs could reason smarter, not just longer? Researchers from Huawei Taylor Lab, Peking University, and Shanghai University of Finance and Economics introduce SHAPE. The method rewards actual progress in reasoning — not verbosity — by using a two-level system: a stage-aware advantage at the segment level for efficient breakthroughs, and entropy-driven redistribution at the token level for sharper execution. Result: 3% higher accuracy on math reasoning while using 30% fewer tokens across multiple base models and benchmarks. SHAPE: Stage-aware Hierarchical Advantage via Potential Estimation for LLM Reasoning Paper: https://t.co/Rsur6rgbWn Our report: https://t.co/g9Eaw7BebN 📬 #PapersAccepted by Jiqizhixin

jiqizhixin's tweet photo. What if LLMs could reason smarter, not just longer?

Researchers from Huawei Taylor Lab, Peking University, and Shanghai University of Finance and Economics introduce SHAPE.

The method rewards actual progress in reasoning — not verbosity — by using a two-level system: a stage-aware advantage at the segment level for efficient breakthroughs, and entropy-driven redistribution at the token level for sharper execution.

Result: 3% higher accuracy on math reasoning while using 30% fewer tokens across multiple base models and benchmarks.

SHAPE: Stage-aware Hierarchical Advantage via Potential Estimation for LLM Reasoning

Paper: https://t.co/Rsur6rgbWn

Our report: https://t.co/g9Eaw7BebN

📬 #PapersAccepted by Jiqizhixin

absorbguo retweeted

鸟哥 | 蓝鸟会🕊️

@NFTCPS

about 1 month ago

用苹果电脑跑本地大模型的人注意了，有个东西你们可能还不知道 Rapid-MLX，一个专门为 Apple Silicon 打造的本地 LLM 推理服务，核心就一句话——比 Ollama 快，而且快不少。具体快多少？官方数据是 2-4 倍。背后用的是苹果自家的 MLX 框架，不是什么民间魔改，是真正吃透了 M 系列芯片架构的方案。快在哪几个点： 1️⃣ KV 缓存裁剪加上 DeltaNet 状态快照，多轮对话的首 token 延迟压到了 0.08 秒左右，你懂这意味着什么，对话几乎感觉不到等待 2️⃣ 工具调用做了 17 种解析器，Qwen、DeepSeek、Gemma、GLM 这些主流模型直接自动识别格式，量化把输出搞坏了也能自动修回来，这个细节做得很扎实 3️⃣ OpenAI 兼容 API，Cursor、Claude Code、Aider、LangChain 统统能直接接，基本上你现在用什么工具链，切过来不用改代码还有一些额外的东西：推理链分离、云端路由、视觉和音频多模态支持、V 缓存压缩，功能密度挺高的。说白了就是，你有一台 M 系列 Mac，想在本地跑模型又嫌 Ollama 慢，那 Rapid-MLX 现在是最值得试的选项之一。 🔗 https://t.co/rYrGxCrFSY

NFTCPS's tweet photo. 用苹果电脑跑本地大模型的人注意了，有个东西你们可能还不知道

Rapid-MLX，一个专门为 Apple Silicon 打造的本地 LLM 推理服务，核心就一句话——比 Ollama 快，而且快不少。

具体快多少？官方数据是 2-4 倍。背后用的是苹果自家的 MLX 框架，不是什么民间魔改，是真正吃透了 M 系列芯片架构的方案。

快在哪几个点：

1️⃣ KV 缓存裁剪加上 DeltaNet 状态快照，多轮对话的首 token 延迟压到了 0.08 秒左右，你懂这意味着什么，对话几乎感觉不到等待

2️⃣ 工具调用做了 17 种解析器，Qwen、DeepSeek、Gemma、GLM 这些主流模型直接自动识别格式，量化把输出搞坏了也能自动修回来，这个细节做得很扎实

3️⃣ OpenAI 兼容 API，Cursor、Claude Code、Aider、LangChain 统统能直接接，基本上你现在用什么工具链，切过来不用改代码

还有一些额外的东西：推理链分离、云端路由、视觉和音频多模态支持、V 缓存压缩，功能密度挺高的。

说白了就是，你有一台 M 系列 Mac，想在本地跑模型又嫌 Ollama 慢，那 Rapid-MLX 现在是最值得试的选项之一。

🔗 https://t.co/rYrGxCrFSY

905

165

91K

absorbguo retweeted

Akshay 🚀

@akshay_pachaar

about 2 months ago

PyTorch Autograd vs. Unsloth Triton Kernels. The core engineering behind UnslothAI has always been impressive! Instead of relying on PyTorch's default autograd for backpropagation, Unsloth built their own backprop kernels from scratch in OpenAI's Triton language (a Python-based language for writing GPU kernels without needing to write raw CUDA C++). One of the reasons to do this is that the default autograd runs each operation as a separate GPU call, and each call reads and writes data back to global memory before the next one can start. Across dozens of transformer layers, this back-and-forth becomes the real bottleneck. These hand-written kernels fuse operations like QKV projections and rotary position embeddings into single GPU calls, and recompute activations on the fly instead of storing them in memory. This allows Unsloth to deliver >2x faster training with 70% less VRAM without any accuracy loss. The loss curves match standard training runs down to the third decimal because the math is exact, not an approximation. All of these kernel optimizations were already available through Unsloth's Python library. But now Unsloth Studio puts a no-code web UI on top of that same engine, and there's a lot of solid engineering packed into this. > The inference engine has a sandboxed code execution layer where models can run Python and bash, compute results, and verify their answers before responding. This means the model can actually execute and validate code instead of just predicting what the output should look like. The tool calling implementation also has a self-healing mechanism. Failed calls get auto-corrected and retried, which is a practical pattern for agentic workflows. > Unsloth's Python library already had GRPO support (the RL technique behind DeepSeek-R1), and Studio now makes this accessible through the UI. PPO requires running a separate critic model alongside the policy model during training, and that critic is typically as large as the model being trained, effectively doubling the VRAM requirement. GRPO eliminates the critic model entirely by generating multiple completions per prompt and computing advantages from the relative quality within that group. This cuts VRAM by 40-60% compared to PPO. Combined with Unsloth's Triton kernels and QLoRA, training a reasoning model on an RTX 4090 or even a 3090 becomes realistic on hardware that most of us actually have. > In most fine-tuning workflows that I have run, the training step is actually the easy part. Getting raw data into a properly formatted dataset is where the real time goes. Unsloth Studio includes Data Recipes (built on NVIDIA's DataDesigner) that take raw PDFs/CSVs/DOCX files, and transform them into structured synthetic datasets through a visual node-based workflow, replacing the custom parsing scripts entirely. Once training is done, models can be exported directly to GGUF, safetensors, or other formats with automatic LoRA adapter merging into base weights. The whole system runs 100% offline with no telemetry. $ pip install unsloth $ unsloth studio setup $ unsloth studio It's still in beta, but the engineering underneath is solid. For anyone working with open-source models locally, this is one of the more complete tools available right now.

akshay_pachaar's tweet photo. PyTorch Autograd vs. Unsloth Triton Kernels.

The core engineering behind UnslothAI has always been impressive!

Instead of relying on PyTorch's default autograd for backpropagation, Unsloth built their own backprop kernels from scratch in OpenAI's Triton language (a Python-based language for writing GPU kernels without needing to write raw CUDA C++).

One of the reasons to do this is that the default autograd runs each operation as a separate GPU call, and each call reads and writes data back to global memory before the next one can start.

Across dozens of transformer layers, this back-and-forth becomes the real bottleneck.

These hand-written kernels fuse operations like QKV projections and rotary position embeddings into single GPU calls, and recompute activations on the fly instead of storing them in memory.

This allows Unsloth to deliver >2x faster training with 70% less VRAM without any accuracy loss.

The loss curves match standard training runs down to the third decimal because the math is exact, not an approximation.

All of these kernel optimizations were already available through Unsloth's Python library.

But now Unsloth Studio puts a no-code web UI on top of that same engine, and there's a lot of solid engineering packed into this.

> The inference engine has a sandboxed code execution layer where models can run Python and bash, compute results, and verify their answers before responding.

This means the model can actually execute and validate code instead of just predicting what the output should look like.

The tool calling implementation also has a self-healing mechanism. Failed calls get auto-corrected and retried, which is a practical pattern for agentic workflows.

> Unsloth's Python library already had GRPO support (the RL technique behind DeepSeek-R1), and Studio now makes this accessible through the UI.

PPO requires running a separate critic model alongside the policy model during training, and that critic is typically as large as the model being trained, effectively doubling the VRAM requirement.

GRPO eliminates the critic model entirely by generating multiple completions per prompt and computing advantages from the relative quality within that group.

This cuts VRAM by 40-60% compared to PPO. Combined with Unsloth's Triton kernels and QLoRA, training a reasoning model on an RTX 4090 or even a 3090 becomes realistic on hardware that most of us actually have.

> In most fine-tuning workflows that I have run, the training step is actually the easy part. Getting raw data into a properly formatted dataset is where the real time goes.

Unsloth Studio includes Data Recipes (built on NVIDIA's DataDesigner) that take raw PDFs/CSVs/DOCX files, and transform them into structured synthetic datasets through a visual node-based workflow, replacing the custom parsing scripts entirely.

Once training is done, models can be exported directly to GGUF, safetensors, or other formats with automatic LoRA adapter merging into base weights.

The whole system runs 100% offline with no telemetry.

$ pip install unsloth
$ unsloth studio setup
$ unsloth studio

It's still in beta, but the engineering underneath is solid. For anyone working with open-source models locally, this is one of the more complete tools available right now.

225

152

24K

absorbguo retweeted

Xinyu Zhou

@zxytim

about 2 months ago

My daily drive now

106

absorbguo retweeted

Akshay 🚀

@akshay_pachaar

about 2 months ago

Google DeepMind dropped a paper that should scare every agent builder. It's the first systematic framework for a threat that barely existed two years ago: adversarial content engineered to hijack AI agents browsing the web. They call them AI Agent Traps. The paper maps six distinct attack surfaces. 1) Content Injection Traps (perception) Invisible CSS, hidden HTML, steganographic payloads inside images. The agent parses it, humans never see it. One study showed simple HTML injections hijack web agents in up to 86% of scenarios. 2) Semantic Manipulation Traps (reasoning) No overt commands. Just biased phrasing, framing, and contextual priming that skew the agent's synthesis. LLMs inherit human cognitive biases, and attackers can weaponize every one of them. 3) Cognitive State Traps (memory and learning) Poison the RAG corpus. Corrupt long-term memory. One study achieved over 80% attack success with less than 0.1% poisoned data. 4) Behavioural Control Traps (action) Jailbreaks embedded in external resources. Data exfiltration prompts hidden in emails. Sub-agent spawning that tricks an orchestrator into instantiating attacker-controlled agents inside the trusted control flow. 5) Systemic Traps (multi-agent dynamics) This is where it gets scary. A single fake news headline could trigger a synchronized sell-off. A compositional fragment trap splits a payload across sources, so each fragment looks benign until agents aggregate them. 6) Human-in-the-Loop Traps The agent becomes the vector. The target is you. Invisible prompt injections have already caused summarization tools to faithfully repeat ransomware commands as "fix" instructions. The core insight is uncomfortable. By altering the environment instead of the model, attackers weaponize the agent's own capabilities against it. Training-time defenses cannot solve an inference-time problem. The paper closes by calling for automated red-teaming that can probe these vulnerabilities at scale. That same shift is already happening on the offense side. Strix is an open-source project doing exactly this for web apps. AI agents that act like real hackers, running your code dynamically, finding vulnerabilities, and validating them with actual proof-of-concepts. 24k stars on GitHub. Apache 2.0 licensed. The agents writing your code need to be tested by agents trying to break it. I've shared the link to the paper and Strix GitHub repo in the replies

akshay_pachaar's tweet photo. Google DeepMind dropped a paper that should scare every agent builder.

It's the first systematic framework for a threat that barely existed two years ago: adversarial content engineered to hijack AI agents browsing the web.

They call them AI Agent Traps. The paper maps six distinct attack surfaces.

1) Content Injection Traps (perception)

Invisible CSS, hidden HTML, steganographic payloads inside images. The agent parses it, humans never see it. One study showed simple HTML injections hijack web agents in up to 86% of scenarios.

2) Semantic Manipulation Traps (reasoning)

No overt commands. Just biased phrasing, framing, and contextual priming that skew the agent's synthesis. LLMs inherit human cognitive biases, and attackers can weaponize every one of them.

3) Cognitive State Traps (memory and learning)

Poison the RAG corpus. Corrupt long-term memory. One study achieved over 80% attack success with less than 0.1% poisoned data.

4) Behavioural Control Traps (action)

Jailbreaks embedded in external resources. Data exfiltration prompts hidden in emails. Sub-agent spawning that tricks an orchestrator into instantiating attacker-controlled agents inside the trusted control flow.

5) Systemic Traps (multi-agent dynamics)

This is where it gets scary. A single fake news headline could trigger a synchronized sell-off. A compositional fragment trap splits a payload across sources, so each fragment looks benign until agents aggregate them.

6) Human-in-the-Loop Traps

The agent becomes the vector. The target is you. Invisible prompt injections have already caused summarization tools to faithfully repeat ransomware commands as "fix" instructions.

The core insight is uncomfortable.

By altering the environment instead of the model, attackers weaponize the agent's own capabilities against it. Training-time defenses cannot solve an inference-time problem.

The paper closes by calling for automated red-teaming that can probe these vulnerabilities at scale. That same shift is already happening on the offense side.

Strix is an open-source project doing exactly this for web apps. AI agents that act like real hackers, running your code dynamically, finding vulnerabilities, and validating them with actual proof-of-concepts.

24k stars on GitHub. Apache 2.0 licensed.

The agents writing your code need to be tested by agents trying to break it.

I've shared the link to the paper and Strix GitHub repo in the replies

864

206

91K

absorbguo retweeted

GitHubDaily

@GitHub_Daily

about 2 months ago

GitHub 上一份从入门到进阶的 CUDA 开源教程：LeetCUDA，配合 PyTorch 学习，非常适合初学者。共收录了 200 多个循序渐进的 CUDA 内核实现，涵盖从基础的元素级操作到复杂的 HGEMM 库。 GitHub：https://t.co/JBVoLuDxJx 提供完整的底层代码，还配套整理了 100 多篇高质量的高性能计算技术博客，帮我们打通理论与实践的闭环。如果你想系统学习 CUDA 高性能计算，或者正在准备大模型推理相关的面试，这份资料值得收藏。

GitHub_Daily's tweet photo. GitHub 上一份从入门到进阶的 CUDA 开源教程：LeetCUDA，配合 PyTorch 学习，非常适合初学者。

共收录了 200 多个循序渐进的 CUDA 内核实现，涵盖从基础的元素级操作到复杂的 HGEMM 库。

GitHub：https://t.co/JBVoLuDxJx

提供完整的底层代码，还配套整理了 100 多篇高质量的高性能计算技术博客，帮我们打通理论与实践的闭环。

如果你想系统学习 CUDA 高性能计算，或者正在准备大模型推理相关的面试，这份资料值得收藏。

179

163

13K

absorbguo retweeted

Kimi.ai @Kimi_Moonshot

about 2 months ago

We're open-sourcing FlashKDA — our high-performance CUTLASS-based implementation of Kimi Delta Attention kernels. Achieves 1.72×–2.22× prefill speedup over the flash-linear-attention baseline on H20, and works as a drop-in backend for flash-linear-attention. Explore on github: https://t.co/sf4UohXDWY

183

618

213K

absorbguo retweeted

陈成

@chenchengpro

about 2 months ago

今天读到一篇很锋利的论文，提出了一个概念叫「LLM 谬误」。什么意思呢，你用 AI 写出了一篇漂亮的分析报告，然后潜意识里开始觉得「我确实有这个水平」。这不是幻觉问题（输出对不对），不是自动化偏差（太信 AI），是一种更阴的东西，你因为用了 AI，开始太信自己。论文拆解了四个机制， 1）归因模糊。你丢了一句模糊的提示词进去，AI 吐出来一段结构完整、论证清晰的内容。你改了几个词，又丢回去，它又优化了一版。几轮下来，你已经分不清哪些想法是你的、哪些是它的了。人的大脑有个毛病，倾向��从结果反推作者身份，「这个东西是在我的��话里产出的，所以是我的」。 2）流畅性幻觉。AI 输出天然就语法正确、逻辑通顺、风格统一，看着就像一个资深人士写的。问题是人脑会把「读起来顺畅」自动等价于「写的人很专业」，这是一个认知捷径，你根本不会去审视内容到底是怎么生成的，表面的流畅直接就把你骗过去了。 3）管道不透明。传统工具你好歹能看到中间步骤，Excel 公式、SQL 查询，过程是透明的。但 AI 的检索、模式匹配、综合推理全部藏在黑箱里，你只看到输入和输出两头。中间它到底做了多少活，你完全无从判断，也就没办法准确地分配功劳。 4）认知外包。推理让 AI 推，组织让 AI 组织，措辞让 AI 润色，你自己参与的认知深度越来越浅。反复外包之后，你连评估自己到底懂不懂的能力都退化了。越依赖越不自知，越不自知越高估，正反馈循环。这四个齿轮一咬合，感知能力和实际能力之间就裂开一道缝，而且是系统性的那种。更要命的是往上捅到了制度层面。候选人用 AI 辅助做出高质量 portfolio，面试官只看产出根本判断不了独立能力；学生用 AI 完成作业，成绩不再反映真实理解；资质认证的信号价值被稀释。这篇论文目前还是纯概念性的，没有实验数据。但它给一个东西起了名字，一个几乎每个 AI 重度用户都隐约感觉到、但没人正式说破的东西。说真的，值得反复问自己一个问题，离开 AI，你还��多少？ https://t.co/veBTKMlv8Y

394

325

45K

absorbguo @absorbguo

about 2 months ago

absorbguo retweeted

Avi Chawla

@_avichawla

about 2 months ago

Mixture of Experts (MoEs), explained visually: (learn how they work below)

725

103

489

61K

absorbguo retweeted

Kimi.ai @Kimi_Moonshot

about 2 months ago

Meet Kimi K2.6: Advancing Open-Source Coding 🔹Open-source SOTA on HLE w/ tools (54.0), SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0), Charxiv w/ python(86.7), Math Vision w/ python (93.2) What's new: 🔹Long-horizon coding - 4,000+ tool calls, over 12 hours of continuous execution, with generalization across languages (Rust, Go, Python) and tasks (frontend, devops, perf optimization). 🔹Motion-rich frontend - Videos in hero sections, WebGL shaders, GSAP + Framer Motion, Three.js 3D. 🔹Agent Swarms, elevated - 300 parallel sub-agents × 4,000 steps per run (up from K2.5's 100 / 1,500). One prompt, 100+ files. 🔹Proactive Agents - K2.6 model powers OpenClaw, Hermes Agent, etc for 24/7 autonomous ops. 🔹Claw Groups (research preview) - bring your own agents, command your friends', bots & humans in the loop. - K2.6 is now live on https://t.co/YutVbwktG0 in chat mode and agent mode. For production-grade coding, pair K2.6 with Kimi Code: https://t.co/uvoSJKyGCY - 🔗 API: https://t.co/EOZkbOwCN4 🔗 Tech blog: https://t.co/9wWvgIQSS3 🔗 Weights & code: https://t.co/Be0hjs2RTP

Kimi_Moonshot's tweet photo. Meet Kimi K2.6: Advancing Open-Source Coding

🔹Open-source SOTA on HLE w/ tools (54.0), SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0), Charxiv w/ python(86.7), Math Vision w/ python (93.2)

What's new:
🔹Long-horizon coding - 4,000+ tool calls, over 12 hours of continuous execution, with generalization across languages (Rust, Go, Python) and tasks (frontend, devops, perf optimization).
🔹Motion-rich frontend - Videos in hero sections, WebGL shaders, GSAP + Framer Motion, Three.js 3D.
🔹Agent Swarms, elevated - 300 parallel sub-agents × 4,000 steps per run (up from K2.5's 100 / 1,500). One prompt, 100+ files.
🔹Proactive Agents - K2.6 model powers OpenClaw, Hermes Agent, etc for 24/7 autonomous ops.
🔹Claw Groups (research preview) - bring your own agents, command your friends', bots & humans in the loop.
-
K2.6 is now live on https://t.co/YutVbwktG0 in chat mode and agent mode.
For production-grade coding, pair K2.6 with Kimi Code: https://t.co/uvoSJKyGCY
-
🔗 API: https://t.co/EOZkbOwCN4
🔗 Tech blog: https://t.co/9wWvgIQSS3
🔗 Weights & code: https://t.co/Be0hjs2RTP

941

18K

absorbguo retweeted

karminski-牙医

@karminski3

about 2 months ago

Qwen3.6-35B-A3B 2bit 量化都这么猛吗? Unsloth 团队(当然他们只有哥俩)刚光速放出了量化版本的 Qwen3.6-35B-A3B, 然后他们做这个测试把我惊呆了... 2bit 能完成 30 多次工具调用??? 我是真不信的.. 因为我之前测 Qwen3.5-35B-A3B 8bit (mlx 格式哈) 大概只能 4-5 次工具调用就不行了, 大概只能做做整理邮件这种简单工作, 但凡让它整理完邮件做个统计记录到 Notion / Obsidian 上就炸了. 要知道 unsloth 的 2bit 动态量化这个模型只有12.3GB, 激活只有1G! 32G 的 Mac 可以轻松跑起来了. 我赶紧测一下试试, 稍后给大家带来实测效果. https://t.co/8sj4kkLjWg

karminski3's tweet photo. Qwen3.6-35B-A3B 2bit 量化都这么猛吗?

Unsloth 团队(当然他们只有哥俩)刚光速放出了量化版本的 Qwen3.6-35B-A3B, 然后他们做这个测试把我惊呆了... 2bit 能完成 30 多次工具调用???

我是真不信的.. 因为我之前测 Qwen3.5-35B-A3B 8bit (mlx 格式哈) 大概只能 4-5 次工具调用就不行了, 大概只能做做整理邮件这种简单工作, 但凡让它整理完邮件做个统计记录到 Notion / Obsidian 上就炸了.

要知道 unsloth 的 2bit 动态量化这个模型只有12.3GB, 激活只有1G! 32G 的 Mac 可以轻松跑起来了.

我赶紧测一下试试, 稍后给大家带来实测效果.

https://t.co/8sj4kkLjWg

575

463

71K

absorbguo retweeted

How To AI

@HowToAI_

about 2 months ago

someone open-source a 1.7b parameter model that parses literally anything. text, tables, formulas, images, and pdfs in `100+ languages. 100% open-source.

HowToAI_'s tweet photo. someone open-source a 1.7b parameter model that parses literally anything.

text, tables, formulas, images, and pdfs in `100+ languages.

100% open-source. https://t.co/Fz3ij8DW2n

111

52K

absorbguo retweeted

机器之心 JIQIZHIXIN

@jiqizhixin

about 2 months ago

Why has scaling Diffusion Transformers with Mixture-of-Experts been so tricky for visual data? Researchers from Fudan University, Alibaba Group's Tongyi Lab, Zhejiang University, The University of Hong Kong, and MMLab just cracked the code! They introduce ProMoE, an MoE framework that makes vision experts smarter. It uses a two-step router to first group image parts by their function (e.g., background vs. object) and then refines these assignments based on their semantic content, ensuring each expert focuses on what it does best. This specialized routing boosts performance significantly, outperforming state-of-the-art methods on the demanding ImageNet benchmark for diffusion models. Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance Paper: https://t.co/Pz51lY1Unp Code: https://t.co/J7HMHOA5ZS Our report: https://t.co/IKozelW5xx 📬 #PapersAccepted by Jiqizhixin

jiqizhixin's tweet photo. Why has scaling Diffusion Transformers with Mixture-of-Experts been so tricky for visual data?

Researchers from Fudan University, Alibaba Group's Tongyi Lab, Zhejiang University, The University of Hong Kong, and MMLab just cracked the code!

They introduce ProMoE, an MoE framework that makes vision experts smarter. It uses a two-step router to first group image parts by their function (e.g., background vs. object) and then refines these assignments based on their semantic content, ensuring each expert focuses on what it does best.

This specialized routing boosts performance significantly, outperforming state-of-the-art methods on the demanding ImageNet benchmark for diffusion models.

Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance

Paper: https://t.co/Pz51lY1Unp
Code: https://t.co/J7HMHOA5ZS

Our report: https://t.co/IKozelW5xx

📬 #PapersAccepted by Jiqizhixin

133

11K

absorbguo

@absorbguo

Last Seen Users on Sotwe

Trends for you

Most Popular Users