CEOs are uniquely prone to AI psychosis because they’re sufficiently distant from the last mile of work that still has to happen to generate most value with AI.
So when they play with AI, they see the happy path results, often not considering the next 10 or 20 things that have to happen to get sustainable results from agents.
“Look I made this awesome product prototype”. Yes but you didn’t have to review the code before it went into production and fix a bunch of issues.
“Look I generated a contract”. Yes but you didn’t verify all the terms before it goes out to the counterparty and didn’t have to wire up all the past contracts to work with.
The best thing you can do as a CEO is to use AI a *ton* to figure out the real implications of agents in the enterprise, and come out the other side with an appreciation for both the upside and the real work that goes into them.
@housecor LLM doesn’t have that kind of understanding about itself, that it could give any meaningful answer about what is good context. You would need to test it and compare results.
I strongly believe there are entire companies right now under heavy AI psychosis and its impossible to have rational conversations about it with them. I can't name any specific people because they include personal friends I deeply respect, but I worry about how this plays out.
I lived through the great MTBF vs MTTR (mean-time-between-failure vs. mean-time-to-recovery) reckoning of infrastructure during the transition to cloud and cloud automation. All those arguments are rearing their ugly heads again but now its... the whole software development industry (maybe the whole world, really).
It's frightening, because the psychosis folks operate under an almost absolute "MTTR is all you need" mentality: "its fine to ship bugs because the agents will fix them so quickly and at a scale humans can't do!" We learned in infrastructure that MTTR is great but you can't yeet resilient systems entirely.
The main issue is I don't even know how to bring this up to people I know personally, because bringing this topic up leads to immediately dismissals like "no no, it has full test coverage" or "bug reports are going down" or something, which just don't paint the whole picture.
We already learned this lesson once in infrastructure: you can automate yourself into a very resilient catastrophe machine. Systems can appear healthy by local metrics while globally becoming incomprehensible. Bug reports can go down while latent risk explodes. Test coverage can rise while semantic understanding falls. Changes happens so fast that nobody notices the underlying architecture decaying.
I worry.
It may turn out to be interesting if generating production quality software is genuinely an annoying and hard problem (low~high read, extra high write task - "errors" compound on writes - "tending a garden" problem), whereas AI research and hacking are extremely easy problems for the AI (extra high read, low write task).
Rather than decomposing what AI is good at by verticals or diffusion, thinking about problems as search / context distillation on demand vs write action space (both per turn or # of turns) may be a better framing.
Any task that is high read, low write will likely be solved very very soon, but many many real world tasks are high write tasks and framing them will be non-trivial (and frankly lot more low hanging fruits with the extra high read and low write tasks, like cyber).
In that sense, somehow, engineers and property managers may have a longer time to automation than AI researchers and dry lab academia
Whether it’s existing consulting firms, new ones that emerge, FDEs from agent vendors, or new internal agent engineering roles, the amount of work that is going to be created to implement agents in enterprises will exceed anything we imagine today.
The complexity of implementing agents in any existing organizations is very real. When I talk to large enterprises, as you move from a chat paradigm to agents that participate in meaningful workflows, there are a number of things they need to do.
First, you have to get agents to be able to talk to your data securely across your systems. In many cases, enterprises have decades of legacy infrastructure that contain the valuable context for AI agents. That’s going to take a ton of work to go modernize and move to systems that work well with agents.
Then, you need to ensure that you’ve implemented agents with the right access controls and entitlements, the right scopes to be safely used, and have ways of monitoring, logging, and securing the work that they do.
Next, you need to actually document the processes in the organization in a way that agents can utilize for doing the work. You also need to figure out what the new workflow looks like when agents and people are working together on a process, and who steps in where. Just replicating the old workflow will mute the gains. Oh and you likely need to create evals for your top new end-state processes.
Finally, you have to keep up with a rapidly changing set of best practices and architectural shifts happening in the agent space. While it’s fun for people to change their personal productivity tools on a dime, it’s 100X harder to do this in a business process. The speed of change is a blessing and a curse right now for anyone trying to keep a stable system design.
All of this means that individuals and companies that develop expertise on the above set of components (and more) are going to be needed to help organizations actually implement agents at scale. This is also the rationale for vertical AI agents right now that can go in deep on a business domain and help bring automation to it.
This is a huge opportunity right now whether you’re doing this internally or as an external business provider.
i actually don't want this "but you don't review compiler output either" meme to die.
it's the perfect signal for being immediately able to ignore someone in this space.
AI agents don’t feel the pain that humans do. 3 takeaways from Mario Zechner(@badlogicgames), creator of Pi - the minimalist, self-modifying agent that OpenClaw is built on:
#1 - Automation bias: agents quickly wow us and win false trust:
“There are moments of brilliance in agents where they spit out perfectly fine simple code.
As the steering engineer you can look at that and think: ‘Wow, this is amazing. I can just sit back and not care because it's doing the thing how I would do it.’
Two minutes later you have another agent running that spits out the worst, horrible, garbage code, but you might not notice because you have fallen into automation bias and think your agent is doing the job well.”
#2 - Humans have a different capacity for learning than agents:
“Agents don't learn. You can put as much stuff in your AGENTS.md, build a memory system, but it's not the same type of learning that a human does. Humans are failable as well, but they have some capability of learning.”
#3 - Feeling pain is inherently human and drives us to make things well:
“Humans feel pain. I think that's one of the defining things about humans. If the pain gets too big, you as a human are incentivized to fix the cause of your pain.
In a code base, the cause is usually terrible interfaces and terrible complexity that you want to get rid of because you can no longer maintain that system.”
People of https://t.co/TgG5bkXUdV. pi has supported OpenAI WebSockets mode since February. However, we did not support delta updates.
pi v0.71.1 now also supports delta updates: only the latest context additions get send, leading to a nice 66% throughput increase.
Start pi, /settings -> transport -> websocket-cached
Here's SSE vs. cached WebSocket side by side.
This will also apply to your OpenClaw instance, should @steipete & team decide to update to the latest and greatest, giving you all the benefits of the pi runtime on top of the Codex server backend.
As agents become the biggest users of software, then all software has to be available in a headless fashion. Agents won’t be using your UI, they’ll be talking to your APIs.
So the question becomes what is the business model of software and this headless approach in the future?
Here are a few thoughts on how everything plays out based on what we’re seeing and doing at Box, but also conversation with other platforms.
1) Seats don’t go away for *people*. Seats are still a convenient and efficient way to have a customer use technology predictably for a set of users within a baseline set of usage. The key, though, is that when the customer pays for a seat, it has to come with a set of usage of APIs on behalf of that user that the agent can use on their behalf.
The user will need to be able to interact with their data and the underlying tool via any agent they work with, and an embedded amount of usage will come with the seat. I would imagine most software -Box included- will enable seats to work with their data at a relatively high volume via systems like ChatGPT, Codex, Claude, Gemini, Cursor, Copilot, Perplexity, Factory, Cogniton, et al. quite seamlessly. If you don’t do this, you’re DOA.
2) Agents may have “seats” if they are doing stateful work in the system, but they will be priced very differently than people. Seats (or the equivalent) can make sense when you have an agent that has its own workspace, stores its own data, needs a different set of permissions compared to the user, and so on.
If a company wants this agent to be around for long period of time, that may very well look like another “user” in the system. Openclaw-style agents highlight what this future could look like.
The only issue on pricing here is that one customer could decide to do all their work in 1 agent, and another might split it into 1,000 agents. So pricing like a human seat is nearly impossible and impractical; each company will have a different approach for this as it gets tricky perfectly trying to capture all the value within an agent seat.
3) The dominant pricing for headless use that goes above the seat allotment, or when an agent is firmly acting on their own, will be a consumption model. Many enterprises software platforms have previously operated like this with PaaS options, and agents will look like another machine user of their system.
In some cases the APIs might get priced just as they did previously, but in other cases there may need to be new types of APIs that represent the work an agent would do in one go -more akin to an outcome- instead of a series of API calls. This is especially germane when the headless software also has an agentic use-case embedded within in, such as orchestrating the process within their own system via AI.
Overall the growth of this usage pattern is effectively unbounded as the use-cases for agents operating on data in these systems will dramatically exceed what people do with their data and tools today. Every platform that goes headless (which will be anyone that wants to take advantage of agents) will need to adopt a model like this. Some may fight it initially but it’s an inevitably as there will always be more and more agents outside your platform than people.
Overall, there’s a lot of really interesting changes left to come in software due to headless use of these systems. Early days.
A common dynamic I observe with AI: it feels most impressive when you don’t know much about the subject, don’t care or don’t have a clear idea of what the you want.
This applies across design, code, legal, and more. If I don’t know code very well, every piece of code it writes feels very impressive.
Once you know what something should feel or look like, it becomes almost impossible to guide AI there. And you definitely can’t one-shot it.
this is largely true for the vast majority of economically useful tasks agents do today
start with basic primitives, let the agent rip, use traces to examine runs and only add when it fails, add stuff to make it reliably pass like instructions, hooks, tools
some basic primitives:
- filesystem for context management
- bash
bash + skill learning is often enough for agents to fix their mistakes
but this is approach isn’t as reliable for long-horizon work (see frontier-swe)
there it’s ok to build a good harness there to squash the issues with instructions, hooks, more harness stuff
no shame in building a system that solves your problem, we don’t have to be bitter lesson pilled 24/7, just add the hook man :)
if new models dissolve those pieces then great less stuff to manage, if they don’t then also great because you have a system to do your task well
The jump from working with a chatbot to having an agent that actually helps automate a process requires a real amount of work.
Most companies will need to have dedicated people that are responsible for bringing automation to their teams, instead of leaving this up to every individual employee. Partly because the work is more technical than we imagine today, and partly because it’s just hard to do this as a side project.
The job spec is to map out new workflows with agents, implement new systems to deploy agents, make sure the agent has all the right (up to date) context to work with, wiring up internal systems to connect to the agents, creating evals for the agents, figuring out where the human is in the loop, managing the system when there are new upgrades, helping with the change management of the existing business process, and so on.
These jobs may come from IT or engineering, or live directly in the business function itself. They’ll be called different things depending on the company, and in some sense it’s the future of software engineering that you’ll see a huge growth of in non-tech companies.
Most companies will have to be hiring for this now or in the future, and it’s another example of the kind of new jobs that will be created in AI.
OVERRATED: running tons of agents in parallel; working on too many things at once; perpetual context-switching; opening lots of low-quality PRs that may never land.
UNDERRATED: using one or two agents at a time; focusing on the task in front of you; thinking deeply; finishing stuff; making your code works in prod.
It’s remarkable how often you need to be dramatically upgrading your AI architecture given the pace of progress in AI models right now.
If you’re building agents, you basically need to throw away large parts of previous work that you setup to compensate for model limitations every few quarters. The systems you built to mitigate context window limits aren’t useful anymore, and for many use-cases it’s easier just to throw more compute at a problem today in ways that wouldn’t have worked previously.
If you’re deploying agents in a workflow, you likely need to equally be rethinking your core systems at about that same frequency. The way you would deploy agents in an enterprise 18 months ago is entirely different from the best practices that you’d have today.
This is partly why everyone’s working so hard right now. Right as a best practice is solidified, models improve dramatically, and that old work is rendered obsolete. Unclear that this lets up anytime soon, which is why the it pays to be so wired in right now.
Since we open-sourced pi-autoresearch, @Shopify teams have been running it on everything.
Results so far:
Unit tests: 300x faster
React component mounting: 20% faster
CI build time: 65% reduction
Made pnpm run faster
Autoresearch never stops trying things you'd never have time to try.
Repo: https://t.co/473UFWKanV
3 months ago I started building a coding agent that runs in the cloud.
It's since written every line of code I've shipped, including itself.
Today, I'm open sourcing it. Introducing Open Agents.
Most tech companies break out product management and product marketing into two separate roles: Product management defines the product and gets it built. Product marketing wires the messaging- the facts you want to communicate to customers- and gets the product sold. But from my experience that's a grievous mistake. Those are, and should aways be, one job.
There should be no separation between what the product will be and how it will be explained- the story has to be utterly cohesive from the beginning. Your messaging is your product. The story you're telling shapes the thing you're making.
I learned story telling from Steve Jobs. I learned product management from Greg Joswiak. Joz, a fellow Wolverine, Michigander, and overall great person, has been at Apple since he left Ann Arbor in 1986 and has run product marketing for decades. And his superpower- the superpower of every truly great product manager- is empathy. He doesn't just understand the customer. He becomes the customer.
So when Joz stepped into the world with his next-gen iPod to test it out, he fiddled with it like a beginner. He set aside all the tech specs- except one: battery life.
The numbers were empty without customers, the facts meaningless without context.
And, that's why product management has to own the messaging. The spec shows the features, the details of how a product will work, but the messaging predicts people's concerns and finds way to mitigate them.
- #BUILD Chapter 5.5 The Point of PMs