CONAN AT HARVARD: “No university in our nation has produced more Nobel laureates or white collar criminals… so whether you choose good or evil, know that you are among the very best.”
I am returning from a healthcare leadership conference where this
"gap" was clear, and executives captive to their #Copilot experience, and don't know how to access #Codex.
Judging by my tl there is a growing gap in understanding of AI capability.
The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.
But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.
So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions.
TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because 2 properties: 1) these domains offer explicit reward functions that are verifiable meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
Judging by my tl there is a growing gap in understanding of AI capability.
The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.
But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.
So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions.
TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because 2 properties: 1) these domains offer explicit reward functions that are verifiable meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
Sierra’s Ghostwriter suggests the UI for customer service is becoming an agent, not an app. The real battle is not model quality; it is who becomes the trusted action layer on top of CRM, NOW, and legacy workflow.
#AgenticAI#CRM#ServiceNow#Salesforce#EnterpriseAI
Today, Sierra is releasing Ghostwriter, our agent for building agents. With Ghostwriter, you can create an AI agent for your customer experience — one that can chat, pick up the phone, speak dozens of languages, take action on your systems of record, and be protected with industry-leading guardrails — simply by having a conversation. No clicking, no forms, no menus.
Codex and Claude Code have transformed how we build software, making it possible for software engineers to orchestrate and review the work rather than doing all the work themselves. We think the same transformation will happen for all software. Rather than every enterprise app having a web app for humans and an API for automation, every software platform’s UI will be an agent that can do the work on your behalf.
I recorded a demo of my building and optimizing an agent with Ghostwriter so you can see how powerful and easy it is to use. It’s completely changed the way our early adopters build agents, and it’s changed the way I think about the software industry. Let me know what you think, and, if you’re interested in trying it out at your business, please reach out directly.
AI is your ghostwriter, but you are the author.
I was speaking with my friend Arya about the complex dynamics software teams face now that most code is being generated by AI agents. There are very few natural bottlenecks to code “slop” - features over quality, and functionality over polish.
I have made a point of banning speaking about AI as the author of code. Codex/Claude didn’t write that code, just like your table saw didn’t cut the wood - you did, with the help of AI. If we start speaking about AI that way, we remove the accountability of the engineer for the quality of the system they are producing.
Arya put it succinctly in a way that I hadn’t heard before: AI is your ghostwriter, but you are the author. It’s your name on the cover of the book.
I think the next year of software engineering will be building out the tools that enable all of us to be proud to sign our name to and take accountability for the increasingly complex systems we produce with our agents.
Allow me to restate my beliefs that Denver is one of my favorite cities, the Avs are a top team, and their uniforms are the ugliest in hockey. Horrible colors, silly shoulder stripes, stupid numbers, and a logo that looks like a sign for a slushy machine.
There. That should do it
A powerful new short strategy is emerging.. and it revolves around RPOs.
Remaining Performance Obligations. They were once a sign of strength. Now? They’re a liability.
An RPO is essentially a promise to deliver work in the future, which inadvertently gives low-end AI disruptors 24–36 months to create a cheaper, faster alternative.
Your future revenue becomes a target for someone else’s model. If AI can deplete your RPO before you even realize it, your valuation multiple is an illusion.
Long-term contracts and obligations are now multi-year windows for AI to replace you.
Analysis & Examples $CRM $ORCL $WDAY $NOW:
We built a browser with GPT-5.2 in Cursor. It ran uninterrupted for one week.
It's 3M+ lines of code across thousands of files. The rendering engine is from-scratch in Rust with HTML parsing, CSS cascade, layout, text shaping, paint, and a custom JS VM.
It *kind of* works! It still has issues and is of course very far from Webkit/Chromium parity, but we were astonished that simple websites render quickly and largely correctly.
@logangraham is a good sport. This is truly outstanding video by @JoannaStern and staff at @WSJ I wonder if the blue team at @AnthropicAI thinks this was worth it.
AI ran our office vending machine for several weeks. It lost hundreds of dollars, gave away a PlayStation, bought a live fish—and taught us a lot about AI agents, writes @JoannaStern. https://t.co/V6NrbyQzy6
I would like to clarify a few things.
First, the obvious one: we do not have or want government guarantees for OpenAI datacenters. We believe that governments should not pick winners or losers, and that taxpayers should not bail out companies that make bad business decisions or otherwise lose in the market. If one company fails, other companies will do good work.
What we do think might make sense is governments building (and owning) their own AI infrastructure, but then the upside of that should flow to the government as well. We can imagine a world where governments decide to offtake a lot of computing power and get to decide how to use it, and it may make sense to provide lower cost of capital to do so. Building a strategic national reserve of computing power makes a lot of sense. But this should be for the government’s benefit, not the benefit of private companies.
The one area where we have discussed loan guarantees is as part of supporting the buildout of semiconductor fabs in the US, where we and other companies have responded to the government’s call and where we would be happy to help (though we did not formally apply). The basic idea there has been ensuring that the sourcing of the chip supply chain is as American as possible in order to bring jobs and industrialization back to the US, and to enhance the strategic position of the US with an independent supply chain, for the benefit of all American companies. This is of course different from governments guaranteeing private-benefit datacenter buildouts.
There are at least 3 “questions behind the question” here that are understandably causing concern.
First, “How is OpenAI going to pay for all this infrastructure it is signing up for?” We expect to end this year above $20 billion in annualized revenue run rate and grow to hundreds of billion by 2030. We are looking at commitments of about $1.4 trillion over the next 8 years. Obviously this requires continued revenue growth, and each doubling is a lot of work! But we are feeling good about our prospects there; we are quite excited about our upcoming enterprise offering for example, and there are categories like new consumer devices and robotics that we also expect to be very significant. But there are also new categories we have a hard time putting specifics on like AI that can do scientific discovery, which we will touch on later.
We are also looking at ways to more directly sell compute capacity to other companies (and people); we are pretty sure the world is going to need a lot of “AI cloud”, and we are excited to offer this. We may also raise more equity or debt capital in the future.
But everything we currently see suggests that the world is going to need a great deal more computing power than what we are already planning for.
Second, “Is OpenAI trying to become too big to fail, and should the government pick winners and losers?” Our answer on this is an unequivocal no. If we screw up and can’t fix it, we should fail, and other companies will continue on doing good work and servicing customers. That’s how capitalism works and the ecosystem and economy would be fine. We plan to be a wildly successful company, but if we get it wrong, that’s on us.
Our CFO talked about government financing yesterday, and then later clarified her point underscoring that she could have phrased things more clearly. As mentioned above, we think that the US government should have a national strategy for its own AI infrastructure.
Tyler Cowen asked me a few weeks ago about the federal government becoming the insurer of last resort for AI, in the sense of risks (like nuclear power) not about overbuild. I said “I do think the government ends up as the insurer of last resort, but I think I mean that in a different way than you mean that, and I don’t expect them to actually be writing the policies in the way that maybe they do for nuclear”. Again, this was in a totally different context than datacenter buildout, and not about bailing out a company. What we were talking about is something going catastrophically wrong—say, a rogue actor using an AI to coordinate a large-scale cyberattack that disrupts critical infrastructure—and how intentional misuse of AI could cause harm at a scale that only the government could deal with. I do not think the government should be writing insurance policies for AI companies.
Third, “Why do you need to spend so much now, instead of growing more slowly?”. We are trying to build the infrastructure for a future economy powered by AI, and given everything we see on the horizon in our research program, this is the time to invest to be really scaling up our technology. Massive infrastructure projects take quite awhile to build, so we have to start now.
Based on the trends we are seeing of how people are using AI and how much of it they would like to use, we believe the risk to OpenAI of not having enough computing power is more significant and more likely than the risk of having too much. Even today, we and others have to rate limit our products and not offer new features and models because we face such a severe compute constraint.
In a world where AI can make important scientific breakthroughs but at the cost of tremendous amounts of computing power, we want to be ready to meet that moment. And we no longer think it’s in the distant future. Our mission requires us to do what we can to not wait many more years to apply AI to hard problems, like contributing to curing deadly diseases, and to bring the benefits of AGI to people as soon as possible.
Also, we want a world of abundant and cheap AI. We expect massive demand for this technology, and for it to improve people’s lives in many ways.
It is a great privilege to get to be in the arena, and to have the conviction to take a run at building infrastructure at such scale for something so important. This is the bet we are making, and given our vantage point, we feel good about it. But we of course could be wrong, and the market—not the government—will deal with it if we are.
Thanks @johnthornhillft for bringing this study by @DallasFed to top of mind again with this great quote in @FT: "It is rare for a central banking institution to model the economic impact of human extinction (spoiler alert: GDP goes to zero)."