I am absolutely more productive using agents. I don't know the factor, but it's large. However, much of that productivity is spent tuning the agents and hardening the product. I'm guessing 30%-40%.
Some might consider that a waste, but I don't. The software I'm creating nowadays is vastly more robust than anything I was ever able to create manually.
I don't mean that the code is better. I mean the surrounding tests are vastly better. I have a higher degree of confidence than I ever had manually -- even when I used very disciplined TDD and Acceptance testing.
And then there's the ability to quickly reorganize the modules and the architecture while keeping those robust tests running. That is a tremendous boon.
Recent thoughts:
The Shift to Long-Horizon Tasks
The most likely breakthrough this year will be in long-horizon tasks. We are moving toward a stage where Large Language Models (LLMs) learn to complete extended, complex missions by interacting with Agent environments. This is perhaps where the true value of LLMs lies. Take cybersecurity as an example: imagine a model that continuously hunts for software bugs and vulnerabilities. While it sounds like a search process, it’s actually the model learning the high-level intuition and methodology of a professional hacker. Unlike humans, AI can run 24/7 without fatigue. It could potentially find exploits at a much higher frequency and claim bounties on platforms like HackerOne or BugCrowd. It sounds fun, but fundamentally, it's a revolution that displaces the hacker. If even hackers are being "disrupted," one can only imagine the impact on general programmers.
From One-Person to None-Person Companies
Building on long-horizon capabilities, Autonomous Agent Systems (AAS) will inevitably become the next frontier. Last year, we were discussing the rise of the "One Person Company" (OPC). I didn't expect us to move so quickly toward the "None Person Company" (NPC). It’s an ironic twist—we might all end up as NPCs in this new ecosystem.
Engineering the Impossible: Memory and Learning
To realize the vision above, we must solve three technical pillars: Memory, Continual Learning, and Self-Judging.
I used to think these would require massive paradigm shifts and years of research. However, the pressure from both the technical and application sides is so intense that we are seeing these capabilities emerge through ingenious engineering "tricks":
Memory: Long context windows (1M+) and RAG have significantly bridged the gap.
Continual Learning: While true continual learning remains difficult, the release cycles are shrinking. Global models are updated monthly; domestic models are catching up. If we reach weekly updates by next year, it will effectively function as continual learning.
Self-Judging: This remains the most elusive, yet models like Opus 4.7 are already demonstrating early self-correction and judgment capabilities.
The Self-Evolving Endgame
The most difficult—and most promising—path is Self-Evolution. The current wave is incredibly fierce. I suspect that models like Claude may have already achieved a baseline for self-training: writing their own code, cleaning their own data, generating synthetic data, and then training on it. It might "waste" some compute, but it saves the most precious resources: human labor and time. In the LLM era, speed is everything. Rapid iteration is what creates the cognitive gap between leaders and followers. Claude’s rumored 2-million-chip cluster for next year is likely dedicated to exactly this: autonomous model self-training.
Technical Summary:
1M Context: Necessary baseline.
Memory & Continual Learning: Prerequisites, likely solved first via "tricky" engineering.
Harnessing Environments: The breakthrough point.
Self-Judging: The tipping point.
Full Self-Training: The endgame.
Redefining AGI and the Industry
If this is the road to AGI, then AGI’s definition should be the sum of all human collective intelligence, not just an individual’s intelligence. It must possess the creative capacity to produce something as profound as the "Theory of Relativity"—meeting the bar set by Hassabis.
During this transition, every app will need to be rebuilt as AI-native. In fact, we might move past the concept of apps entirely. The most significant challenge will be the reconstruction of the operating system itself. In the future, you won’t see a traditional desktop; you will see an LLM OS, where applications are "generated on demand." This challenges the 80-year-old Von Neumann architecture and represents a total upheaval of the computer science industry.
The Irreversible Wave
From completing long-horizon tasks to fully autonomous operations, every sector—Security, Finance, Law, E-commerce—will be reshaped. Many friends have reached out lately, asking how to transform their enterprises to keep pace with AI. But few truly realize that this irreversible process has already begun. As this massive technical wave hits, we must be prepared to act, but we must also start thinking seriously about how to regulate it.
Today we're open sourcing https://t.co/p76KVdY7dG, a reference platform for cloud coding agents.
You've heard that companies like Stripe (Minions), Ramp (Inspect), Spotify (Honk), Block (Goose), and others are building their own "AI software factories". Why?
1️⃣ On a technical level, off-the-shelf coding agents don't perform well with huge monorepos and don't have your institutional knowledge, integrations, or custom workflows.
2️⃣ On a business level, the moat of software companies will shift from 'the code they wrote', to the 'means of production' of that code. The alpha is in your factory.
Open Agents deploys to our agentic infrastructure: Fluid for running the agent's brain, Workflow for its long-running durability, Sandbox for secure code execution, AI Gateway for multi-model tokens.
(Because of our focus on Open SDKs and runtimes, this codebase is a gem even if you're not hosting on Vercel.)
TL;DR: if you're building an internal or user-facing agentic coding platform, deploy this:
https://t.co/xdsc42nbDN
Easiest way to deploy openclaw and other agents on the cloud of your choice: https://t.co/WfeIdfOxa5
All agents work with all models right out of the box!
It’s in beta. Let us know if you have feedback:
Introducing Cline Kanban: A standalone app for CLI-agnostic multi-agent orchestration. Claude and Codex compatible.
npm i -g cline
Tasks run in worktrees, click to review diffs, & link cards together to create dependency chains that complete large amounts of work autonomously.
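A rough sketch of how linked cards could resolve into an execution order. This is not Cline's implementation: the card names and the run_order helper are hypothetical, and the only assumption is that each card lists the cards it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical cards: each key is a card, each value is the set of
# cards that must finish before it can start.
cards = {
    "write-migration": set(),
    "update-api": {"write-migration"},
    "update-ui": {"update-api"},
    "write-docs": {"update-api"},
}

def run_order(deps):
    """Return batches of cards; every card in a batch has all of its
    dependencies satisfied, so the batch could run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if cards depend on each other in a loop
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output stable
        batches.append(ready)
        sorter.done(*ready)  # mark the batch complete, unblocking dependents
    return batches

batches = run_order(cards)
for step, batch in enumerate(batches, start=1):
    print(step, batch)
```

Cards in the same batch don't depend on each other, which is the case where giving each task its own worktree pays off.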
BREAKING: Bahrain just got hit. Not a base. Not a military installation. The refinery.
The Bahrain Petroleum Company, BAPCO, is burning. Iranian missiles and drones reached the facility on March 5 despite Bahraini air defenses intercepting 75 missiles and 123 drones in the same wave. The intercept count is the highest single-day figure for any Gulf state in this war. The fires are confirmed by video. The fires are real.
Understand what BAPCO is. It is not an abstraction. It is the refinery that processes virtually all of Bahrain’s domestic petroleum output, sitting on an island of 800 square kilometers that also hosts Naval Support Activity Bahrain, the headquarters of the United States Fifth Fleet. The Fifth Fleet commands all US naval operations across the Persian Gulf, the Red Sea, the Arabian Sea, and the Indian Ocean. The commanding officer of the Fifth Fleet wakes up every morning approximately twelve kilometers from the refinery that Iran just struck. If you wanted to design a single target that communicates simultaneously to the global energy market, the US Navy, and every Gulf monarchy watching this war, you would design BAPCO.
Iran did not need to destroy the facility to win the targeting decision. The mechanism of the strike is verification cost inversion applied to oil infrastructure rather than shipping insurance. An oil refinery that has been struck once by Iranian missiles in an active war is a refinery that no insurance underwriter, no shipping counterparty, and no downstream buyer can treat as a reliable facility without repricing every contract that touches it. The fires that Al Jazeera is broadcasting tonight are doing more work in the oil derivatives market than in the actual refinery. The facility will be repaired. The actuarial fact of its vulnerability cannot be unrepaired.
Bahrain has absorbed attacks since February 28. The kingdom hosts the only US naval headquarters in the Gulf theater. It is a country of 1.5 million people on an island that cannot be defended in depth because it has no depth. Its air defense systems are among the most capable in the region and they intercepted nearly 200 incoming projectiles in a single day. Six of those projectiles reached the refinery anyway. The interceptors cost orders of magnitude more per round than the drones they are stopping.
The attrition arithmetic that has governed every Gulf state’s position in this war has now been demonstrated at the petroleum infrastructure level. Iran does not need to shut BAPCO down. It needs to establish that BAPCO can be reached whenever Iran chooses to reach it. That threshold was crossed on March 5. Bahrain now knows it. The Fifth Fleet headquarters twelve kilometers away knows it. Every energy trading desk that has been pricing Gulf risk since February 28 is repricing it again tonight.
The fires at BAPCO are the visible part. The invisible part is the number that moved in the oil options market in the thirty minutes after the first video was posted.
https://t.co/ULBgEzZ3A8
Running OpenClaw locally? Do it safely.
This walkthrough shows how to run it inside Docker Sandboxes with Docker Model Runner:
- Isolated microVM
- No exposed API keys
- Controlled network access
- Fully private, local AI setup
Secure agent workflows in ~2 commands.
Read → https://t.co/RZh2qp7eSi
LONDON: A 15-year-old boy stunned a London courtroom after refusing to live with every family member the judge suggested, claiming his parents, aunt, and grandparents had all beaten him.
With no relatives left to place him with, the judge asked who he wanted to have custody of him.
The boy calmly replied:
“Chelsea FC. They can’t beat anyone.”
After checking legal guidelines, the judge granted temporary custody to the team.
That’s it, ladies and gentlemen.