whthomas

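Uncle Bob's numbers are honest. Agents really do boost productivity a lot, but 30%-40% of it goes to tuning agents and hardening. Many people see that as waste; I think it is exactly the opposite. The real value of an AI agent is not that you write less code, but that it frees up capacity for the quality engineering you could never get through before. He says the surrounding tests are vastly better than before, more solid even than hand-done TDD. That is the crux. You used to have to weigh whether your tests were sufficient before refactoring the architecture; now you can make sweeping changes while the robust tests keep running. That 30%-40% is not overhead; it is an infrastructure budget you could never afford before. It is like switching from a manual to an automatic transmission: fuel consumption looks higher, but the roads you dare to take are completely different.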

whthomas_unit retweeted

Uncle Bob Martin

@unclebobmartin

14 days ago

I am absolutely more productive using agents. I don't know the factor but it's large. However much of that productivity is spent tuning the agents and hardening the product. I'm guessing 30%-40%. Some might consider that a waste; but I don't. The software I'm creating nowadays is vastly more robust than I'd ever been able to create manually. I don't mean that the code is better. I mean the surrounding tests are vastly better. I have a higher degree of confidence than I ever had manually -- even when I used very disciplined TDD and Acceptance testing. And then there's the ability to quickly reorganize the modules and the architecture while keeping those robust tests running. That is a tremendous boon.

119

187

780

302K

whthomas_unit retweeted

jietang

@jietang

24 days ago

Recent thoughts:

The Shift to Long-Horizon Tasks

The most likely breakthrough this year will be in long-horizon tasks. We are moving toward a stage where Large Language Models (LLMs) learn to complete extended, complex missions by interacting with Agent environments. This is perhaps where the true value of LLMs lies.

Take cybersecurity as an example: imagine a model that continuously hunts for software bugs and vulnerabilities. While it sounds like a search process, it's actually the model learning the high-level intuition and methodology of a professional hacker. Unlike humans, AI can run 24/7 without fatigue. It could potentially find exploits at a much higher frequency and claim bounties on platforms like HackerOne or BugCrowd. It sounds fun, but fundamentally, it's a revolution that displaces the hacker. If even hackers are being "disrupted," one can only imagine the impact on general programmers.

From One-Person to None-Person Companies

Building on long-horizon capabilities, Autonomous Agent Systems (AAS) will inevitably become the next frontier. Last year, we were discussing the rise of the "One Person Company" (OPC). I didn't expect us to move so quickly toward the "None Person Company" (NPC). It's an ironic twist: we might all end up as NPCs in this new ecosystem.

Engineering the Impossible: Memory and Learning

To realize the vision above, we must solve three technical pillars: Memory, Continual Learning, and Self-Judging. I used to think these would require massive paradigm shifts and years of research. However, the pressure from both the technical and application sides is so intense that we are seeing these capabilities emerge through ingenious engineering "tricks":

- Memory: Long context windows (1M+) and RAG have significantly bridged the gap.
- Continual Learning: While true continual learning remains difficult, the release cycles are shrinking. Global models are updated monthly; domestic models are catching up. If we reach weekly updates by next year, it will effectively function as continual learning.
- Self-Judging: This remains the most elusive, yet models like Opus 4.7 are already demonstrating early self-correction and judgment capabilities.

The Self-Evolving Endgame

The most difficult, and most promising, path is Self-Evolution. The current wave is incredibly fierce. I suspect that models like Claude may have already achieved a baseline for self-training: writing their own code, cleaning their own data, generating synthetic data, and then training on it. It might "waste" some compute, but it saves the most precious resources: human labor and time. In the LLM era, speed is everything. Rapid iteration is what creates the cognitive gap between leaders and followers. Claude's rumored 2-million-chip cluster for next year is likely dedicated to exactly this: autonomous model self-training.

Technical Summary:

- 1M Context: Necessary baseline.
- Memory & Continual Learning: Prerequisites, likely solved first via "tricky" engineering.
- Harnessing Environments: The breakthrough point.
- Self-Judging: The tipping point.
- Full Self-Training: The endgame.

Redefining AGI and the Industry

If this is the road to AGI, then AGI's definition should be the sum of all human collective intelligence, not just an individual's intelligence. It must possess the creative capacity to produce something as profound as the "Theory of Relativity", meeting the bar set by Hassabis.

During this transition, every APP will need to be reconstructed as AI-native. In fact, we might move past the concept of APPs entirely. The most significant challenge will be the reconstruction of the operating system itself. In the future, you won't see a traditional desktop; you will see an LLM OS, where applications are "generated on demand." This challenges the 80-year-old Von Neumann architecture and represents a total upheaval of the computer science industry.

The Irreversible Wave

From completing long-horizon tasks to fully autonomous operations, every sector (Security, Finance, Law, E-commerce) will be reshaped. Many friends have reached out lately, asking how to transform their enterprises to keep pace with AI. But few truly realize that this irreversible process has already begun. As this massive technical wave hits, we must be prepared to act, but we must also start thinking seriously about how to regulate it.

724

148

523

188K

Making audio knowledge searchable https://t.co/gs8FLSu5hM 🎙️🔮

whthomas

@whthomas_unit

about 2 months ago

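I've been coding with kilocode for the past few months and find it much better than Cline, especially the debug mode, which fixes a lot of bugs effectively. But since the V7 update it has felt quite bad to use: every action is more of a black box. If every team wants to make its product look like Claude Code, why wouldn't I just use Claude Code? https://t.co/38n1Vl2RHx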

whthomas

@whthomas_unit

about 2 months ago

@ixiaowenz Agreed. When instability hits, it drains far too much mental energy.

whthomas

@whthomas_unit

about 2 months ago

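AI is moving too fast. I had barely finished installing the "lobster" when it went out of season, and now "Hermès" has arrived. Software will soon go from being a standardized product to a fast-moving consumer good. For the moment I can't see where the new opportunities are.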

whthomas_unit retweeted

Guillermo Rauch

@rauchg

about 2 months ago

Today we're open sourcing https://t.co/p76KVdY7dG, a reference platform for cloud coding agents.

You've heard that companies like Stripe (Minions), Ramp (Inspect), Spotify (Honk), Block (Goose), and others are building their own "AI software factories". Why?

1️⃣ On a technical level, off-the-shelf coding agents don't perform well with huge monorepos, don't have your institutional knowledge, integrations, and custom workflows.

2️⃣ On a business level, the moat of software companies will shift from 'the code they wrote', to the 'means of production' of that code. The alpha is in your factory.

Open Agents deploys to our agentic infrastructure: Fluid for running the agent's brain, Workflow for its long-running durability, Sandbox for secure code execution, AI Gateway for multi-model tokens.

(Because of our focus on Open SDKs and runtimes, this codebase is a gem even if you're not hosting on Vercel.)

TL;DR: if you're building an internal or user-facing agentic coding platform, deploy this: https://t.co/xdsc42nbDN

162

353

678K

whthomas

@whthomas_unit

about 2 months ago

@alexatallah OpenSandbox!

whthomas

@whthomas_unit

about 2 months ago

openrouter has started getting into the opensandbox business.

Alex Atallah

@alexatallah

about 2 months ago

Easiest way to deploy openclaw and other agents on the cloud of your choice: https://t.co/WfeIdfOxa5 All agents work with all models right out of the box! It's in beta. Let us know if you have feedback:

132

11K

whthomas

@whthomas_unit

about 2 months ago

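Tonight was a clash of the AI titans:
- anthropic announced the release of a super model that will not be open to the public
- Zhipu open-sourced its latest model, glm5.1
- deepseek updated its website and app; it looks like a new model is on the way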

130

whthomas

@whthomas_unit

2 months ago

Agent Board!

Cline

@cline

2 months ago

Introducing Cline Kanban: A standalone app for CLI-agnostic multi-agent orchestration. Claude and Codex compatible.

npm i -g cline

Tasks run in worktrees, click to review diffs, & link cards together to create dependency chains that complete large amounts of work autonomously.

232

378

whthomas

@whthomas_unit

3 months ago

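An interesting analogy occurred to me: Claude Code is actually very similar to a browser.
- In the past, people went online through a browser. In the web era, the logic is written in HTML, the interaction flow is predefined by humans, and operations are completed by calling APIs.
- In the future, people will go online through an Agent. In the Agent era, the logic is written in `SKILL.md` (or some other kind of file in the future), and the Agent understands it on its own and calls APIs to complete the task.
In short, the browser is the "rendering layer" between people and the network, while the Agent is the "execution layer" between people and the network: the flow goes from being driven by people to being driven by the Agent.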
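A toy Python sketch of the contrast in this analogy, purely illustrative. Everything in it is an assumption made up for the example: the two "API" functions, the skill text, and the `toy_agent` planner, which is a simple keyword matcher standing in for a real model. It does not describe how Claude Code or any actual `SKILL.md` format works.

```python
from typing import Callable, Dict, List

# Hypothetical "APIs" shared by both styles; each just records that it ran.
def search_flights(log: List[str]) -> None:
    log.append("search_flights")

def book_flight(log: List[str]) -> None:
    log.append("book_flight")

API: Dict[str, Callable[[List[str]], None]] = {
    "search_flights": search_flights,
    "book_flight": book_flight,
}

# Web era: a person predefines the interaction flow in page logic.
def web_page_flow() -> List[str]:
    log: List[str] = []
    search_flights(log)   # step 1, fixed by the page author
    book_flight(log)      # step 2, fixed by the page author
    return log

# Agent era: a skill file only describes what is available (made-up format).
SKILL_MD = """\
# Skill: travel (hypothetical)
- search_flights: find flights for a route
- book_flight: book the chosen flight
"""

def parse_skill(text: str) -> List[str]:
    """Collect the action names listed as '- name: description' bullets."""
    names: List[str] = []
    for line in text.splitlines():
        if line.startswith("- ") and ":" in line:
            names.append(line[2:].split(":", 1)[0].strip())
    return names

def toy_agent(goal: str, skill_text: str) -> List[str]:
    """Stand-in for a model: run each action whose verb appears in the goal."""
    log: List[str] = []
    for name in parse_skill(skill_text):
        verb = name.split("_", 1)[0]
        if verb in goal.lower() and name in API:
            API[name](log)
    return log
```

In the first style the sequence of calls is fixed in advance by whoever wrote the page; in the second, the same calls are selected at run time from the goal and the skill description, which is the shift from "rendering layer" to "execution layer" described above.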

whthomas

@whthomas_unit

3 months ago

Shocking.

Shanaka Anslem Perera ⚡

@shanaka86

3 months ago

BREAKING: Bahrain just got hit. Not a base. Not a military installation. The refinery. The Bahrain Petroleum Company, BAPCO, is burning. Iranian missiles and drones reached the facility on March 5 despite Bahraini air defenses intercepting 75 missiles and 123 drones in the same wave. The intercept count is the highest single-day figure for any Gulf state in this war. The fires are confirmed by video. The fires are real. Understand what BAPCO is. It is not an abstraction. It is the refinery that processes virtually all of Bahrain’s domestic petroleum output, sitting on an island of 800 square kilometers that also hosts Naval Support Activity Bahrain, the headquarters of the United States Fifth Fleet. The Fifth Fleet commands all US naval operations across the Persian Gulf, the Red Sea, the Arabian Sea, and the Indian Ocean. The commanding officer of the Fifth Fleet wakes up every morning approximately twelve kilometers from the refinery that Iran just struck. If you wanted to design a single target that communicates simultaneously to the global energy market, the US Navy, and every Gulf monarchy watching this war, you would design BAPCO. Iran did not need to destroy the facility to win the targeting decision. The mechanism of the strike is verification cost inversion applied to oil infrastructure rather than shipping insurance. An oil refinery that has been struck once by Iranian missiles in an active war is a refinery that no insurance underwriter, no shipping counterparty, and no downstream buyer can treat as a reliable facility without repricing every contract that touches it. The fires that Al Jazeera is broadcasting tonight are doing more work in the oil derivatives market than in the actual refinery. The facility will be repaired. The actuarial fact of its vulnerability cannot be unrepaired. Bahrain has absorbed attacks since February 28. The kingdom hosts the only US naval headquarters in the Gulf theater. 
It is a country of 1.5 million people on an island that cannot be defended in depth because it has no depth. Its air defense systems are among the most capable in the region and they intercepted nearly 200 incoming projectiles in a single day. Six of those projectiles reached the refinery anyway. The interceptors cost orders of magnitude more per round than the drones they are stopping. The attrition arithmetic that has governed every Gulf state’s position in this war has now been demonstrated at the petroleum infrastructure level. Iran does not need to shut BAPCO down. It needs to establish that BAPCO can be reached whenever Iran chooses to reach it. That threshold was crossed on March 5. Bahrain now knows it. The Fifth Fleet headquarters twelve kilometers away knows it. Every energy trading desk that has been pricing Gulf risk since February 28 is repricing it again tonight. The fires at BAPCO are the visible part. The invisible part is the number that moved in the oil options market in the thirty minutes after the first video was posted. https://t.co/ULBgEzZ3A8

604

11K

whthomas

@whthomas_unit

3 months ago

Isn't this the sandbox war kicking off?

Docker

@Docker

3 months ago

Running OpenClaw locally? Do it safely. This walkthrough shows how to run it inside Docker Sandboxes with Docker Model Runner:
- Isolated microVM
- No exposed API keys
- Controlled network access
- Fully private, local AI setup
Secure agent workflows in ~2 commands. Read → https://t.co/RZh2qp7eSi

406

417K

whthomas

@whthomas_unit

4 months ago

Openrouter appears to be down. https://t.co/xWbP7B6ByO

124

whthomas

@whthomas_unit

4 months ago

@theHazarika @openrouter It seems I'm not the only one who noticed something was wrong

whthomas

@whthomas_unit

4 months ago

Redeeming one month of "lobster" on GLM takes a full 31 clicks???

whthomas

@whthomas_unit

5 months ago

Why has Claude been so slow these past few days?

whthomas_unit retweeted

Gabi Wall

@gabi_wall6

5 months ago

LONDON: A 15-year-old boy stunned a London courtroom after refusing to live with every family member the judge suggested, claiming his parents, aunt, and grandparents had all beaten him.

With no relatives left to place him with, the judge asked who he wanted custody from.

The boy calmly replied: “Chelsea FC. They can’t beat anyone.”

After checking legal guidelines, the judge granted temporary custody to the team.

That’s it, ladies and gentlemen.

186K

17K

20K

21M
