I am developing a formal theorem I call the Verification Complexity Barrier.
In a nutshell, if a program has some components `n` that have connectivity factor of `k > 0` , then verification complexity increases superlinearly for each new component. Therefore the time required to fully verify the system always exceeds time to generate components.
After a while, because it's superlinear, the verification complexity takes off and becomes impossible to keep up with in some finite amount of time. This was true before AI, but is much starker now as code generation time trends towards zero. You hit the barrier sooner.
We all have finite capacity - even AI agents - so there will always be a certain number of components `n` where the wall is hit. You must spend more and more effort on verification for each new component in the system.
The best thing you can do is spend time changing the "topology" of the problem - change the software architecture - so that the exponent of verification complexity is lowered ([1] Lehman) and the curve is flattened. You can bundle components into modules, you can add automated tests, you can use formal proofs, you can use type systems. These things push the barrier to the right. They buy you more components and a more complex system. But the theorem suggests you can only ever defer the barrier, never completely eliminate it.
AI Agents can burn tokens all day long generating software components and tests for those components. They can find bugs and fix them, but they cannot prove the absence of bugs (per [2] Dijkstra, [3] Rice, [4] Smith, and others). And "Bug" needs to be defined against someone's spec for what "working" and "not buggy" looks like. The more you build with Agents, the heavier the verification burden becomes.
This may sound like the same trite observation others have made here on X: "Our bottleneck is no longer writing code, but reviewing code" [5] and "I am the bottleneck now." [6] True, but I don't think anyone has captured the magnitude of the problem. The math is: at a certain component count `n`, it is literally impossible for you, your team, and your agents to completely verify a system. The bottleneck goes to zero, and nothing gets through.
So what software companies do in the real world is release incompletely verified software and massively scale up. This shifts the burden of verification onto their customers, because "given enough eyeballs, all bugs are shallow" ([7] Raymond). If you can get enough eyeballs, this is a very cost-effective way to shift the barrier to the right by massively increasing your team's capacity. You walk the tightrope of doing enough internal verification before release so you don't lose customers, while tolerating a certain amount of escaped bugs, which - if those bugs matter at all - your customers will find for you.
Meanwhile, massively scaling up just accepts the growing cost of complexity. You can push `n*` from 15 to 30 by quadrupling your capacity. To get to 60 you need to quadruple again, and then again to get to 120. Your cost curve is superlinear to get linear gains in system size. At a big enough scale, you amortize the cost across your customer base and the economics work.
Contrast that with a sufficiently complex vibecoded app built for a small audience - high complexity costs can't be amortized at small scale. I expect to see many people and companies try and fail at vibecloning complex SaaS in the near term. Complexity cost economics only scale with audience size (I will share another model for this).
I do think SaaS prices will be corrected downwards to account for savings in code generation, but I predict that once the irrational exuberance for vibing fades, we'll see that it still makes sense to buy rather than self-build complex SaaS.
A broader implication is that AI Agents will never be able to self-verify. Humans, too, will never be able to fully verify their behavior, because LLMs are by design of maximal complexity. Did you see the size of those error bars in the latest METR results? [8] The longer the horizon on a task, the more spread in AI agent outcomes. This is the Barrier in action. Spread is a feature of GenAI, but in practice it means heaps more output to review and verify.
The Complexity Barrier shows you literally won't have time to review it all. At the inflection point of verification complexity, you have to fall back on vibes. The implication for fast-takeoff AGI is even scarier: if AI does reach a point of recursive self-improvement, this theorem suggests it will be structurally impossible to know that behavior is aligned, because you won't be able to completely verify. Drift is bad enough in vibe coding. Runaway AI will drift massively and there's no way of knowing where it will end up.
All that's to say, verification should be the focal point of AI Engineering for the foreseeable future and maybe forever. That is: how do you capture what you want to do, refine that into specifics, and then follow up with automated tests, assertions, evals, and customer feedback to progressively harden your software?
The verification problem is acute now because of how cheap software generation is. Because of the superlinear nature of software verification complexity, companies that push hard on the barrier and successfully shift it right will have a built-in moat versus those who fail to put in the verification work.
Detailed blog post and interactive model incoming.
Links:
[1] https://t.co/aZXDirKO6y
[2] https://t.co/Dpz7ME2N0U
[3] https://t.co/EILf0s7OuJ
[4] https://t.co/uDivaeJAAa
[5] https://t.co/ddNheu1LcK
[6] https://t.co/GvkfTsBmdm
[7] https://t.co/TVte0CCu8s
[8] https://t.co/seq3X9JiM6
Today a crazy quantum story just got wilder.
On March 31, the Google Quantum AI team published a landmark result on Shor's algorithm for elliptic curve cryptography. Technically, the paper was a bombshell: a dramatic 10x improvement over the state-of-the-art. As a stunt and wakeup call to the blockchain space, those optimisations were illustrated on secp256k1, the elliptic curve underlying Bitcoin and Ethereum signatures.
But perhaps the most striking part of the paper was sociological, not technical. Instead of following standard academic process, the optimisations were kept secret, hidden behind a zero-knowledge (ZK) proof. Google's accompanying blog post mentions they "engaged with the U.S. government". The ZK proof demonstrates the existence of algorithmic improvements without leaking details. Academic censorship with ZK, a historic first!
As a co-author of the Google paper I witnessed some of the context surrounding this censorship. To be honest, multiple aspects of that context don't sit well with me. As much as I believe the general public ought to know more, I am limited in my ability to whistleblow. Though let me be clear about one thing: the Google team's professionalism has been absolutely exemplary, and they deserve nothing but praise.
Censorship has a way of backfiring. The Streisand effect, where an attempt to bury something only draws more attention to it, is exactly what's unfolding today. First, Google's key optimisation has been rediscovered by the French. And in a thrilling turn of events, a collaborative Shor-at-home challenge just launched. The initiative, available at ecdsa[.]fail, breached a new Shor world record in a matter of hours.
Let's start with the rediscovery. Just two months after Google's paper, French quantum expert Andrรฉ Schrottenloher cracks the main secret optimisation. His paper, titled "Optimized Point Addition Circuits for Elliptic Curve Discrete Logarithms", landed on the arXiv today. Big congrats to Andrรฉ, who beat several other nerdsnipped experts to it. In a blog post also published today, Craig Gidney, the world expert on Shor optimisations, revealed that he'd been sitting on this very optimisation for a whole year under censorship pressure.
Interestingly, Andrรฉ missed a handful of minor optimisations, both from Google's original publication and from improvements found since. It's plausible there's still plenty of juice left to squeeze out of Shor, and this is exactly what the ecdsa[.]fail challenge is about. The verifier program developed for the ZK proof does double duty, automatically filtering for valid submissions. Dozens of compounding small and micro improvements are rolling in. As of the time of writing there's an 8.4% improvement to Google's circuit, as measured by the product of logical qubit count and Toffoli gate count. Nice!
The nerdsnipping ran deeper than anyone expected. Over the last few weeks it became clear it extended well beyond Andrรฉ and other quantum experts. Behind the scenes, a small army of amateurs quietly got to work. Inspired by Karpathy-style autoresearch, they turned AI on Shor. Ironically, the verifier program for the ZK proof makes an ideal reward function for AIs. The barrier to entry for this modern style of research is refreshingly low, with several non-experts, even a teenager, finding nice optimisations. Get in touch if you'd like to join a Telegram group with fellow autoresearchers :)
Part 2: neutral atoms and qday
The story doesn't end with Google. On the same day Google went public, a stealthy startup called Oratomic published its own Shor paper in a coordinated release. It made a splash, ultimately becoming the most upvoted paper on scirate[.]com, a website ranking arXiv papers.
Oratomic's claim was wild. By building on Google's logical optimisations and applying custom physical optimisations for neutral atoms, they claimed just 10K physical qubits were sufficient to run Shor's algorithm on secp256k1. That number is mind-bogglingly low.
Knowing essentially nothing about neutral atoms when Oratomic's paper landed, I was intrigued and decided to learn more about the tech. I fell straight down the rabbit hole and spent a couple hundred hours on the topic. I got a little obsessed and watched every YouTube video I could find and spoke to a bunch of experts.
My conclusion? The tech is real, very real. Even Google recently decided to start a neutral atom lab, a notable pivot from their sole focus on superconducting qubits. If you care about qday, i.e. the day a quantum computer will break the first piece of cryptography in production, neutral atoms demand your attention. I shared some of my learnings on Shor and neutral atoms in a 30min talk at the ZKProof cryptography conference. You can find it on YouTube by searching "zkproof neutral atom".
Here's an interesting observation about this duo of breakthrough papers: neither Google nor Oratomic say a word about what their results mean for qday. No timelines. Zero. Nada. That is especially baffling given that the whole point of whitehat quantum cryptanalysis is to inform qday estimations and help the general public make good decisions.
So let me attempt to partially fill the silence, similarly to what Scott Aaronson did in his April 29 post. Given everything I know, including scary non-public information, I now put the odds of qday by 2032 at 50%. 10% by 2030.
Anecdotally, the US government has its own date: 2035. Originating at the NSA and later adopted by NIST, it's when branches of the US government will be disallowed from using quantum-vulnerable cryptography. In plain language: with hindsight, that date is a joke and should be discounted entirely. I don't see how NIST avoids being forced to pull it forward by years.
Part 3: post-quantum cryptography
There are good reasons to sound the alarm today, but please do not panic. Rushing carelessly towards immature post-quantum cryptography is a recipe for disaster. IMO a good target date for migration is 2029, roughly 3.5 years out. 2029 happens to be the date selected by Google, Cloudflare, and the Ethereum Foundation.
These days most of my time goes to safely migrating Ethereum towards post-quantum cryptography as part of the broader lean Ethereum effort. There's a lot to do. We need to rip out and replace BLS signatures at the consensus layer, KZG commitments at the data layer, and ECDSA signatures at the execution layer.
The plan to get there is compelling, and is based on hash-based cryptography. Within the Ethereum Foundation we've developed a Swiss army knife called leanVM (github[.]com/leanEthereum/leanVM) powered by the magic of hash-based SNARKs. Thanks to truly exceptional work by Emile, Thomas, and others, its performance is derisked. Regarding security, leanVM is a jewel, a minimal zkVM crafted for end-to-end formal verification and maximum security.
Want to help? There are two $1M initiatives. First, the Proximity Prize (proximityprize[.]org). Solve a long-standing mathematical conjecture in coding theory, improve hash-based SNARKs, and go home a millionaire. Second, the Poseidon Initiative (poseidon-initiative[.]info), offers $1M for breaking Poseidon, the SNARK-friendly hash function.
@GeoffreyHuntley absolutely agree - especially about the brick wall!
here's my formulation... reducing coupling factor 'k' flattens the complexity curve and defers the barrier - but i don't think you can completely eliminate it
https://t.co/cNwdmp4iui
safe travels
I am developing a formal theorem I call the Verification Complexity Barrier.
In a nutshell, if a program has some components `n` that have connectivity factor of `k > 0` , then verification complexity increases superlinearly for each new component. Therefore the time required to fully verify the system always exceeds time to generate components.
After a while, because it's superlinear, the verification complexity takes off and becomes impossible to keep up with in some finite amount of time. This was true before AI, but is much starker now as code generation time trends towards zero. You hit the barrier sooner.
We all have finite capacity - even AI agents - so there will always be a certain number of components `n` where the wall is hit. You must spend more and more effort on verification for each new component in the system.
The best thing you can do is spend time changing the "topology" of the problem - change the software architecture - so that the exponent of verification complexity is lowered ([1] Lehman) and the curve is flattened. You can bundle components into modules, you can add automated tests, you can use formal proofs, you can use type systems. These things push the barrier to the right. They buy you more components and a more complex system. But the theorem suggests you can only ever defer the barrier, never completely eliminate it.
AI Agents can burn tokens all day long generating software components and tests for those components. They can find bugs and fix them, but they cannot prove the absence of bugs (per [2] Dijkstra, [3] Rice, [4] Smith, and others). And "Bug" needs to be defined against someone's spec for what "working" and "not buggy" looks like. The more you build with Agents, the heavier the verification burden becomes.
This may sound like the same trite observation others have made here on X: "Our bottleneck is no longer writing code, but reviewing code" [5] and "I am the bottleneck now." [6] True, but I don't think anyone has captured the magnitude of the problem. The math is: at a certain component count `n`, it is literally impossible for you, your team, and your agents to completely verify a system. The bottleneck goes to zero, and nothing gets through.
So what software companies do in the real world is release incompletely verified software and massively scale up. This shifts the burden of verification onto their customers, because "given enough eyeballs, all bugs are shallow" ([7] Raymond). If you can get enough eyeballs, this is a very cost-effective way to shift the barrier to the right by massively increasing your team's capacity. You walk the tightrope of doing enough internal verification before release so you don't lose customers, while tolerating a certain amount of escaped bugs, which - if those bugs matter at all - your customers will find for you.
Meanwhile, massively scaling up just accepts the growing cost of complexity. You can push `n*` from 15 to 30 by quadrupling your capacity. To get to 60 you need to quadruple again, and then again to get to 120. Your cost curve is superlinear to get linear gains in system size. At a big enough scale, you amortize the cost across your customer base and the economics work.
Contrast that with a sufficiently complex vibecoded app built for a small audience - high complexity costs can't be amortized at small scale. I expect to see many people and companies try and fail at vibecloning complex SaaS in the near term. Complexity cost economics only scale with audience size (I will share another model for this).
I do think SaaS prices will be corrected downwards to account for savings in code generation, but I predict that once the irrational exuberance for vibing fades, we'll see that it still makes sense to buy rather than self-build complex SaaS.
A broader implication is that AI Agents will never be able to self-verify. Humans, too, will never be able to fully verify their behavior, because LLMs are by design of maximal complexity. Did you see the size of those error bars in the latest METR results? [8] The longer the horizon on a task, the more spread in AI agent outcomes. This is the Barrier in action. Spread is a feature of GenAI, but in practice it means heaps more output to review and verify.
The Complexity Barrier shows you literally won't have time to review it all. At the inflection point of verification complexity, you have to fall back on vibes. The implication for fast-takeoff AGI is even scarier: if AI does reach a point of recursive self-improvement, this theorem suggests it will be structurally impossible to know that behavior is aligned, because you won't be able to completely verify. Drift is bad enough in vibe coding. Runaway AI will drift massively and there's no way of knowing where it will end up.
All that's to say, verification should be the focal point of AI Engineering for the foreseeable future and maybe forever. That is: how do you capture what you want to do, refine that into specifics, and then follow up with automated tests, assertions, evals, and customer feedback to progressively harden your software?
The verification problem is acute now because of how cheap software generation is. Because of the superlinear nature of software verification complexity, companies that push hard on the barrier and successfully shift it right will have a built-in moat versus those who fail to put in the verification work.
Detailed blog post and interactive model incoming.
Links:
[1] https://t.co/aZXDirKO6y
[2] https://t.co/Dpz7ME2N0U
[3] https://t.co/EILf0s7OuJ
[4] https://t.co/uDivaeJAAa
[5] https://t.co/ddNheu1LcK
[6] https://t.co/GvkfTsBmdm
[7] https://t.co/TVte0CCu8s
[8] https://t.co/seq3X9JiM6
@rahulj51 I don't believe specs are code, neither should they be thrown away or archived. Specs can capture the intentions of the coder in a way that code can't: what must be true and why vs how it's done
Reminds me of the comments vs self-documenting code debate
I am developing a formal theorem I call the Verification Complexity Barrier.
In a nutshell, if a program has some components `n` that have connectivity factor of `k > 0` , then verification complexity increases superlinearly for each new component. Therefore the time required to fully verify the system always exceeds time to generate components.
After a while, because it's superlinear, the verification complexity takes off and becomes impossible to keep up with in some finite amount of time. This was true before AI, but is much starker now as code generation time trends towards zero. You hit the barrier sooner.
We all have finite capacity - even AI agents - so there will always be a certain number of components `n` where the wall is hit. You must spend more and more effort on verification for each new component in the system.
The best thing you can do is spend time changing the "topology" of the problem - change the software architecture - so that the exponent of verification complexity is lowered ([1] Lehman) and the curve is flattened. You can bundle components into modules, you can add automated tests, you can use formal proofs, you can use type systems. These things push the barrier to the right. They buy you more components and a more complex system. But the theorem suggests you can only ever defer the barrier, never completely eliminate it.
AI Agents can burn tokens all day long generating software components and tests for those components. They can find bugs and fix them, but they cannot prove the absence of bugs (per [2] Dijkstra, [3] Rice, [4] Smith, and others). And "Bug" needs to be defined against someone's spec for what "working" and "not buggy" looks like. The more you build with Agents, the heavier the verification burden becomes.
This may sound like the same trite observation others have made here on X: "Our bottleneck is no longer writing code, but reviewing code" [5] and "I am the bottleneck now." [6] True, but I don't think anyone has captured the magnitude of the problem. The math is: at a certain component count `n`, it is literally impossible for you, your team, and your agents to completely verify a system. The bottleneck goes to zero, and nothing gets through.
So what software companies do in the real world is release incompletely verified software and massively scale up. This shifts the burden of verification onto their customers, because "given enough eyeballs, all bugs are shallow" ([7] Raymond). If you can get enough eyeballs, this is a very cost-effective way to shift the barrier to the right by massively increasing your team's capacity. You walk the tightrope of doing enough internal verification before release so you don't lose customers, while tolerating a certain amount of escaped bugs, which - if those bugs matter at all - your customers will find for you.
Meanwhile, massively scaling up just accepts the growing cost of complexity. You can push `n*` from 15 to 30 by quadrupling your capacity. To get to 60 you need to quadruple again, and then again to get to 120. Your cost curve is superlinear to get linear gains in system size. At a big enough scale, you amortize the cost across your customer base and the economics work.
Contrast that with a sufficiently complex vibecoded app built for a small audience - high complexity costs can't be amortized at small scale. I expect to see many people and companies try and fail at vibecloning complex SaaS in the near term. Complexity cost economics only scale with audience size (I will share another model for this).
I do think SaaS prices will be corrected downwards to account for savings in code generation, but I predict that once the irrational exuberance for vibing fades, we'll see that it still makes sense to buy rather than self-build complex SaaS.
A broader implication is that AI Agents will never be able to self-verify. Humans, too, will never be able to fully verify their behavior, because LLMs are by design of maximal complexity. Did you see the size of those error bars in the latest METR results? [8] The longer the horizon on a task, the more spread in AI agent outcomes. This is the Barrier in action. Spread is a feature of GenAI, but in practice it means heaps more output to review and verify.
The Complexity Barrier shows you literally won't have time to review it all. At the inflection point of verification complexity, you have to fall back on vibes. The implication for fast-takeoff AGI is even scarier: if AI does reach a point of recursive self-improvement, this theorem suggests it will be structurally impossible to know that behavior is aligned, because you won't be able to completely verify. Drift is bad enough in vibe coding. Runaway AI will drift massively and there's no way of knowing where it will end up.
All that's to say, verification should be the focal point of AI Engineering for the foreseeable future and maybe forever. That is: how do you capture what you want to do, refine that into specifics, and then follow up with automated tests, assertions, evals, and customer feedback to progressively harden your software?
The verification problem is acute now because of how cheap software generation is. Because of the superlinear nature of software verification complexity, companies that push hard on the barrier and successfully shift it right will have a built-in moat versus those who fail to put in the verification work.
Detailed blog post and interactive model incoming.
Links:
[1] https://t.co/aZXDirKO6y
[2] https://t.co/Dpz7ME2N0U
[3] https://t.co/EILf0s7OuJ
[4] https://t.co/uDivaeJAAa
[5] https://t.co/ddNheu1LcK
[6] https://t.co/GvkfTsBmdm
[7] https://t.co/TVte0CCu8s
[8] https://t.co/seq3X9JiM6
I am developing a formal theorem I call the Verification Complexity Barrier.
In a nutshell, if a program has some components `n` that have connectivity factor of `k > 0` , then verification complexity increases superlinearly for each new component. Therefore the time required to fully verify the system always exceeds time to generate components.
After a while, because it's superlinear, the verification complexity takes off and becomes impossible to keep up with in some finite amount of time. This was true before AI, but is much starker now as code generation time trends towards zero. You hit the barrier sooner.
We all have finite capacity - even AI agents - so there will always be a certain number of components `n` where the wall is hit. You must spend more and more effort on verification for each new component in the system.
The best thing you can do is spend time changing the "topology" of the problem - change the software architecture - so that the exponent of verification complexity is lowered ([1] Lehman) and the curve is flattened. You can bundle components into modules, you can add automated tests, you can use formal proofs, you can use type systems. These things push the barrier to the right. They buy you more components and a more complex system. But the theorem suggests you can only ever defer the barrier, never completely eliminate it.
AI Agents can burn tokens all day long generating software components and tests for those components. They can find bugs and fix them, but they cannot prove the absence of bugs (per [2] Dijkstra, [3] Rice, [4] Smith, and others). And "Bug" needs to be defined against someone's spec for what "working" and "not buggy" looks like. The more you build with Agents, the heavier the verification burden becomes.
This may sound like the same trite observation others have made here on X: "Our bottleneck is no longer writing code, but reviewing code" [5] and "I am the bottleneck now." [6] True, but I don't think anyone has captured the magnitude of the problem. The math is: at a certain component count `n`, it is literally impossible for you, your team, and your agents to completely verify a system. The bottleneck goes to zero, and nothing gets through.
So what software companies do in the real world is release incompletely verified software and massively scale up. This shifts the burden of verification onto their customers, because "given enough eyeballs, all bugs are shallow" ([7] Raymond). If you can get enough eyeballs, this is a very cost-effective way to shift the barrier to the right by massively increasing your team's capacity. You walk the tightrope of doing enough internal verification before release so you don't lose customers, while tolerating a certain amount of escaped bugs, which - if those bugs matter at all - your customers will find for you.
Meanwhile, massively scaling up just accepts the growing cost of complexity. You can push `n*` from 15 to 30 by quadrupling your capacity. To get to 60 you need to quadruple again, and then again to get to 120. Your cost curve is superlinear to get linear gains in system size. At a big enough scale, you amortize the cost across your customer base and the economics work.
Contrast that with a sufficiently complex vibecoded app built for a small audience - high complexity costs can't be amortized at small scale. I expect to see many people and companies try and fail at vibecloning complex SaaS in the near term. Complexity cost economics only scale with audience size (I will share another model for this).
I do think SaaS prices will be corrected downwards to account for savings in code generation, but I predict that once the irrational exuberance for vibing fades, we'll see that it still makes sense to buy rather than self-build complex SaaS.
A broader implication is that AI Agents will never be able to self-verify. Humans, too, will never be able to fully verify their behavior, because LLMs are by design of maximal complexity. Did you see the size of those error bars in the latest METR results? [8] The longer the horizon on a task, the more spread in AI agent outcomes. This is the Barrier in action. Spread is a feature of GenAI, but in practice it means heaps more output to review and verify.
The Complexity Barrier shows you literally won't have time to review it all. At the inflection point of verification complexity, you have to fall back on vibes. The implication for fast-takeoff AGI is even scarier: if AI does reach a point of recursive self-improvement, this theorem suggests it will be structurally impossible to know that behavior is aligned, because you won't be able to completely verify. Drift is bad enough in vibe coding. Runaway AI will drift massively and there's no way of knowing where it will end up.
All that's to say, verification should be the focal point of AI Engineering for the foreseeable future and maybe forever. That is: how do you capture what you want to do, refine that into specifics, and then follow up with automated tests, assertions, evals, and customer feedback to progressively harden your software?
The verification problem is acute now because of how cheap software generation is. Because of the superlinear nature of software verification complexity, companies that push hard on the barrier and successfully shift it right will have a built-in moat versus those who fail to put in the verification work.
Detailed blog post and interactive model incoming.
Links:
[1] https://t.co/aZXDirKO6y
[2] https://t.co/Dpz7ME2N0U
[3] https://t.co/EILf0s7OuJ
[4] https://t.co/uDivaeJAAa
[5] https://t.co/ddNheu1LcK
[6] https://t.co/GvkfTsBmdm
[7] https://t.co/TVte0CCu8s
[8] https://t.co/seq3X9JiM6
@layer2maxi thanks for checking it out - not related to PG's essay, i honestly wasn't aware. superlinear in this case is a catch-all for all those terms: "grows faster than linear." it accomodates exponents from n^1.5 to n^3 or even raw state space s^n - depends on your architecture
@zeeg agreed - i think i've quantified it, but explaining it simply is the hard part
best i have: you add one thing and now you need to check four things
another, and it's 16 things, then 32, etc. until you can't keep up
I am developing a formal theorem I call the Verification Complexity Barrier.
In a nutshell, if a program has some components `n` that have connectivity factor of `k > 0` , then verification complexity increases superlinearly for each new component. Therefore the time required to fully verify the system always exceeds time to generate components.
After a while, because it's superlinear, the verification complexity takes off and becomes impossible to keep up with in some finite amount of time. This was true before AI, but is much starker now as code generation time trends towards zero. You hit the barrier sooner.
We all have finite capacity - even AI agents - so there will always be a certain number of components `n` where the wall is hit. You must spend more and more effort on verification for each new component in the system.
The best thing you can do is spend time changing the "topology" of the problem - change the software architecture - so that the exponent of verification complexity is lowered ([1] Lehman) and the curve is flattened. You can bundle components into modules, you can add automated tests, you can use formal proofs, you can use type systems. These things push the barrier to the right. They buy you more components and a more complex system. But the theorem suggests you can only ever defer the barrier, never completely eliminate it.
AI Agents can burn tokens all day long generating software components and tests for those components. They can find bugs and fix them, but they cannot prove the absence of bugs (per [2] Dijkstra, [3] Rice, [4] Smith, and others). And "Bug" needs to be defined against someone's spec for what "working" and "not buggy" looks like. The more you build with Agents, the heavier the verification burden becomes.
This may sound like the same trite observation others have made here on X: "Our bottleneck is no longer writing code, but reviewing code" [5] and "I am the bottleneck now." [6] True, but I don't think anyone has captured the magnitude of the problem. The math is: at a certain component count `n`, it is literally impossible for you, your team, and your agents to completely verify a system. The bottleneck goes to zero, and nothing gets through.
So what software companies do in the real world is release incompletely verified software and massively scale up. This shifts the burden of verification onto their customers, because "given enough eyeballs, all bugs are shallow" ([7] Raymond). If you can get enough eyeballs, this is a very cost-effective way to shift the barrier to the right by massively increasing your team's capacity. You walk the tightrope of doing enough internal verification before release so you don't lose customers, while tolerating a certain amount of escaped bugs, which - if those bugs matter at all - your customers will find for you.
Meanwhile, massively scaling up just accepts the growing cost of complexity. You can push `n*` from 15 to 30 by quadrupling your capacity. To get to 60 you need to quadruple again, and then again to get to 120. Your cost curve is superlinear to get linear gains in system size. At a big enough scale, you amortize the cost across your customer base and the economics work.
Contrast that with a sufficiently complex vibecoded app built for a small audience - high complexity costs can't be amortized at small scale. I expect to see many people and companies try and fail at vibecloning complex SaaS in the near term. Complexity cost economics only scale with audience size (I will share another model for this).
I do think SaaS prices will be corrected downwards to account for savings in code generation, but I predict that once the irrational exuberance for vibing fades, we'll see that it still makes sense to buy rather than self-build complex SaaS.
A broader implication is that AI Agents will never be able to self-verify. Humans, too, will never be able to fully verify their behavior, because LLMs are by design of maximal complexity. Did you see the size of those error bars in the latest METR results? [8] The longer the horizon on a task, the more spread in AI agent outcomes. This is the Barrier in action. Spread is a feature of GenAI, but in practice it means heaps more output to review and verify.
The Complexity Barrier shows you literally won't have time to review it all. At the inflection point of verification complexity, you have to fall back on vibes. The implication for fast-takeoff AGI is even scarier: if AI does reach a point of recursive self-improvement, this theorem suggests it will be structurally impossible to know that behavior is aligned, because you won't be able to completely verify. Drift is bad enough in vibe coding. Runaway AI will drift massively and there's no way of knowing where it will end up.
All that's to say, verification should be the focal point of AI Engineering for the foreseeable future and maybe forever. That is: how do you capture what you want to do, refine that into specifics, and then follow up with automated tests, assertions, evals, and customer feedback to progressively harden your software?
The verification problem is acute now because of how cheap software generation is. Because of the superlinear nature of software verification complexity, companies that push hard on the barrier and successfully shift it right will have a built-in moat versus those who fail to put in the verification work.
Detailed blog post and interactive model incoming.
Links:
[1] https://t.co/aZXDirKO6y
[2] https://t.co/Dpz7ME2N0U
[3] https://t.co/EILf0s7OuJ
[4] https://t.co/uDivaeJAAa
[5] https://t.co/ddNheu1LcK
[6] https://t.co/GvkfTsBmdm
[7] https://t.co/TVte0CCu8s
[8] https://t.co/seq3X9JiM6
@toddsaunders As an agency CTO, I understand the forces at work, but that is not ethical.
Besides, building the software is only step one. You have to release it to really know that it works.
Been thinking about this deeply recently: https://t.co/Adk3Ds66Xj
I am developing a formal theorem I call the Verification Complexity Barrier.
In a nutshell, if a program has some components `n` that have connectivity factor of `k > 0` , then verification complexity increases superlinearly for each new component. Therefore the time required to fully verify the system always exceeds time to generate components.
After a while, because it's superlinear, the verification complexity takes off and becomes impossible to keep up with in some finite amount of time. This was true before AI, but is much starker now as code generation time trends towards zero. You hit the barrier sooner.
We all have finite capacity - even AI agents - so there will always be a certain number of components `n` where the wall is hit. You must spend more and more effort on verification for each new component in the system.
The best thing you can do is spend time changing the "topology" of the problem - change the software architecture - so that the exponent of verification complexity is lowered ([1] Lehman) and the curve is flattened. You can bundle components into modules, you can add automated tests, you can use formal proofs, you can use type systems. These things push the barrier to the right. They buy you more components and a more complex system. But the theorem suggests you can only ever defer the barrier, never completely eliminate it.
AI Agents can burn tokens all day long generating software components and tests for those components. They can find bugs and fix them, but they cannot prove the absence of bugs (per [2] Dijkstra, [3] Rice, [4] Smith, and others). And "Bug" needs to be defined against someone's spec for what "working" and "not buggy" looks like. The more you build with Agents, the heavier the verification burden becomes.
This may sound like the same trite observation others have made here on X: "Our bottleneck is no longer writing code, but reviewing code" [5] and "I am the bottleneck now." [6] True, but I don't think anyone has captured the magnitude of the problem. The math is: at a certain component count `n`, it is literally impossible for you, your team, and your agents to completely verify a system. The bottleneck goes to zero, and nothing gets through.
So what software companies do in the real world is release incompletely verified software and massively scale up. This shifts the burden of verification onto their customers, because "given enough eyeballs, all bugs are shallow" ([7] Raymond). If you can get enough eyeballs, this is a very cost-effective way to shift the barrier to the right by massively increasing your team's capacity. You walk the tightrope of doing enough internal verification before release so you don't lose customers, while tolerating a certain amount of escaped bugs, which - if those bugs matter at all - your customers will find for you.
Meanwhile, massively scaling up just accepts the growing cost of complexity. You can push `n*` from 15 to 30 by quadrupling your capacity. To get to 60 you need to quadruple again, and then again to get to 120. Your cost curve is superlinear to get linear gains in system size. At a big enough scale, you amortize the cost across your customer base and the economics work.
Contrast that with a sufficiently complex vibecoded app built for a small audience - high complexity costs can't be amortized at small scale. I expect to see many people and companies try and fail at vibecloning complex SaaS in the near term. Complexity cost economics only scale with audience size (I will share another model for this).
I do think SaaS prices will be corrected downwards to account for savings in code generation, but I predict that once the irrational exuberance for vibing fades, we'll see that it still makes sense to buy rather than self-build complex SaaS.
A broader implication is that AI Agents will never be able to self-verify. Humans, too, will never be able to fully verify their behavior, because LLMs are by design of maximal complexity. Did you see the size of those error bars in the latest METR results? [8] The longer the horizon on a task, the more spread in AI agent outcomes. This is the Barrier in action. Spread is a feature of GenAI, but in practice it means heaps more output to review and verify.
The Complexity Barrier shows you literally won't have time to review it all. At the inflection point of verification complexity, you have to fall back on vibes. The implication for fast-takeoff AGI is even scarier: if AI does reach a point of recursive self-improvement, this theorem suggests it will be structurally impossible to know that behavior is aligned, because you won't be able to completely verify. Drift is bad enough in vibe coding. Runaway AI will drift massively and there's no way of knowing where it will end up.
All that's to say, verification should be the focal point of AI Engineering for the foreseeable future and maybe forever. That is: how do you capture what you want to do, refine that into specifics, and then follow up with automated tests, assertions, evals, and customer feedback to progressively harden your software?
The verification problem is acute now because of how cheap software generation is. Because of the superlinear nature of software verification complexity, companies that push hard on the barrier and successfully shift it right will have a built-in moat versus those who fail to put in the verification work.
Detailed blog post and interactive model incoming.
Links:
[1] https://t.co/aZXDirKO6y
[2] https://t.co/Dpz7ME2N0U
[3] https://t.co/EILf0s7OuJ
[4] https://t.co/uDivaeJAAa
[5] https://t.co/ddNheu1LcK
[6] https://t.co/GvkfTsBmdm
[7] https://t.co/TVte0CCu8s
[8] https://t.co/seq3X9JiM6
@nayshins Absolutely. I'm making a model for this I call the "Verification Complexity Barrier." Will watch your talk. Sounds very similar
https://t.co/Adk3Ds66Xj
I am developing a formal theorem I call the Verification Complexity Barrier.
In a nutshell, if a program has some components `n` that have connectivity factor of `k > 0` , then verification complexity increases superlinearly for each new component. Therefore the time required to fully verify the system always exceeds time to generate components.
After a while, because it's superlinear, the verification complexity takes off and becomes impossible to keep up with in some finite amount of time. This was true before AI, but is much starker now as code generation time trends towards zero. You hit the barrier sooner.
We all have finite capacity - even AI agents - so there will always be a certain number of components `n` where the wall is hit. You must spend more and more effort on verification for each new component in the system.
The best thing you can do is spend time changing the "topology" of the problem - change the software architecture - so that the exponent of verification complexity is lowered ([1] Lehman) and the curve is flattened. You can bundle components into modules, you can add automated tests, you can use formal proofs, you can use type systems. These things push the barrier to the right. They buy you more components and a more complex system. But the theorem suggests you can only ever defer the barrier, never completely eliminate it.
AI Agents can burn tokens all day long generating software components and tests for those components. They can find bugs and fix them, but they cannot prove the absence of bugs (per [2] Dijkstra, [3] Rice, [4] Smith, and others). And "Bug" needs to be defined against someone's spec for what "working" and "not buggy" looks like. The more you build with Agents, the heavier the verification burden becomes.
This may sound like the same trite observation others have made here on X: "Our bottleneck is no longer writing code, but reviewing code" [5] and "I am the bottleneck now." [6] True, but I don't think anyone has captured the magnitude of the problem. The math is: at a certain component count `n`, it is literally impossible for you, your team, and your agents to completely verify a system. The bottleneck goes to zero, and nothing gets through.
So what software companies do in the real world is release incompletely verified software and massively scale up. This shifts the burden of verification onto their customers, because "given enough eyeballs, all bugs are shallow" ([7] Raymond). If you can get enough eyeballs, this is a very cost-effective way to shift the barrier to the right by massively increasing your team's capacity. You walk the tightrope of doing enough internal verification before release so you don't lose customers, while tolerating a certain amount of escaped bugs, which - if those bugs matter at all - your customers will find for you.
Meanwhile, massively scaling up just accepts the growing cost of complexity. You can push `n*` from 15 to 30 by quadrupling your capacity. To get to 60 you need to quadruple again, and then again to get to 120. Your cost curve is superlinear to get linear gains in system size. At a big enough scale, you amortize the cost across your customer base and the economics work.
Contrast that with a sufficiently complex vibecoded app built for a small audience - high complexity costs can't be amortized at small scale. I expect to see many people and companies try and fail at vibecloning complex SaaS in the near term. Complexity cost economics only scale with audience size (I will share another model for this).
I do think SaaS prices will be corrected downwards to account for savings in code generation, but I predict that once the irrational exuberance for vibing fades, we'll see that it still makes sense to buy rather than self-build complex SaaS.
A broader implication is that AI Agents will never be able to self-verify. Humans, too, will never be able to fully verify their behavior, because LLMs are by design of maximal complexity. Did you see the size of those error bars in the latest METR results? [8] The longer the horizon on a task, the more spread in AI agent outcomes. This is the Barrier in action. Spread is a feature of GenAI, but in practice it means heaps more output to review and verify.
The Complexity Barrier shows you literally won't have time to review it all. At the inflection point of verification complexity, you have to fall back on vibes. The implication for fast-takeoff AGI is even scarier: if AI does reach a point of recursive self-improvement, this theorem suggests it will be structurally impossible to know that behavior is aligned, because you won't be able to completely verify. Drift is bad enough in vibe coding. Runaway AI will drift massively and there's no way of knowing where it will end up.
All that's to say, verification should be the focal point of AI Engineering for the foreseeable future and maybe forever. That is: how do you capture what you want to do, refine that into specifics, and then follow up with automated tests, assertions, evals, and customer feedback to progressively harden your software?
The verification problem is acute now because of how cheap software generation is. Because of the superlinear nature of software verification complexity, companies that push hard on the barrier and successfully shift it right will have a built-in moat versus those who fail to put in the verification work.
Detailed blog post and interactive model incoming.
Links:
[1] https://t.co/aZXDirKO6y
[2] https://t.co/Dpz7ME2N0U
[3] https://t.co/EILf0s7OuJ
[4] https://t.co/uDivaeJAAa
[5] https://t.co/ddNheu1LcK
[6] https://t.co/GvkfTsBmdm
[7] https://t.co/TVte0CCu8s
[8] https://t.co/seq3X9JiM6
Lots of thoughts on this wrt agentic coding. OP poses the real question at the end: who decides what "correct" means?
Formal verification is great and all, but who verifies the verifiers? Can we really imagine handing it all over to AI someday?
https://t.co/Gtptdbm8tg
The "size" of a coding task is not intrinsic to the task. It exists between tasks in the graph of their relationships. That is mostly hidden and changes over time.
This is why estimating software work is so hard, and why AI doesn't magically fix it.
I am developing a formal theorem I call the Verification Complexity Barrier.
In a nutshell, if a program has some components `n` that have connectivity factor of `k > 0` , then verification complexity increases superlinearly for each new component. Therefore the time required to fully verify the system always exceeds time to generate components.
After a while, because it's superlinear, the verification complexity takes off and becomes impossible to keep up with in some finite amount of time. This was true before AI, but is much starker now as code generation time trends towards zero. You hit the barrier sooner.
We all have finite capacity - even AI agents - so there will always be a certain number of components `n` where the wall is hit. You must spend more and more effort on verification for each new component in the system.
The best thing you can do is spend time changing the "topology" of the problem - change the software architecture - so that the exponent of verification complexity is lowered ([1] Lehman) and the curve is flattened. You can bundle components into modules, you can add automated tests, you can use formal proofs, you can use type systems. These things push the barrier to the right. They buy you more components and a more complex system. But the theorem suggests you can only ever defer the barrier, never completely eliminate it.
AI Agents can burn tokens all day long generating software components and tests for those components. They can find bugs and fix them, but they cannot prove the absence of bugs (per [2] Dijkstra, [3] Rice, [4] Smith, and others). And "Bug" needs to be defined against someone's spec for what "working" and "not buggy" looks like. The more you build with Agents, the heavier the verification burden becomes.
This may sound like the same trite observation others have made here on X: "Our bottleneck is no longer writing code, but reviewing code" [5] and "I am the bottleneck now." [6] True, but I don't think anyone has captured the magnitude of the problem. The math is: at a certain component count `n`, it is literally impossible for you, your team, and your agents to completely verify a system. The bottleneck goes to zero, and nothing gets through.
So what software companies do in the real world is release incompletely verified software and massively scale up. This shifts the burden of verification onto their customers, because "given enough eyeballs, all bugs are shallow" ([7] Raymond). If you can get enough eyeballs, this is a very cost-effective way to shift the barrier to the right by massively increasing your team's capacity. You walk the tightrope of doing enough internal verification before release so you don't lose customers, while tolerating a certain amount of escaped bugs, which - if those bugs matter at all - your customers will find for you.
Meanwhile, massively scaling up just accepts the growing cost of complexity. You can push `n*` from 15 to 30 by quadrupling your capacity. To get to 60 you need to quadruple again, and then again to get to 120. Your cost curve is superlinear to get linear gains in system size. At a big enough scale, you amortize the cost across your customer base and the economics work.
Contrast that with a sufficiently complex vibecoded app built for a small audience - high complexity costs can't be amortized at small scale. I expect to see many people and companies try and fail at vibecloning complex SaaS in the near term. Complexity cost economics only scale with audience size (I will share another model for this).
I do think SaaS prices will be corrected downwards to account for savings in code generation, but I predict that once the irrational exuberance for vibing fades, we'll see that it still makes sense to buy rather than self-build complex SaaS.
A broader implication is that AI Agents will never be able to self-verify. Humans, too, will never be able to fully verify their behavior, because LLMs are by design of maximal complexity. Did you see the size of those error bars in the latest METR results? [8] The longer the horizon on a task, the more spread in AI agent outcomes. This is the Barrier in action. Spread is a feature of GenAI, but in practice it means heaps more output to review and verify.
The Complexity Barrier shows you literally won't have time to review it all. At the inflection point of verification complexity, you have to fall back on vibes. The implication for fast-takeoff AGI is even scarier: if AI does reach a point of recursive self-improvement, this theorem suggests it will be structurally impossible to know that behavior is aligned, because you won't be able to completely verify. Drift is bad enough in vibe coding. Runaway AI will drift massively and there's no way of knowing where it will end up.
All that's to say, verification should be the focal point of AI Engineering for the foreseeable future and maybe forever. That is: how do you capture what you want to do, refine that into specifics, and then follow up with automated tests, assertions, evals, and customer feedback to progressively harden your software?
The verification problem is acute now because of how cheap software generation is. Because of the superlinear nature of software verification complexity, companies that push hard on the barrier and successfully shift it right will have a built-in moat versus those who fail to put in the verification work.
Detailed blog post and interactive model incoming.
Links:
[1] https://t.co/aZXDirKO6y
[2] https://t.co/Dpz7ME2N0U
[3] https://t.co/EILf0s7OuJ
[4] https://t.co/uDivaeJAAa
[5] https://t.co/ddNheu1LcK
[6] https://t.co/GvkfTsBmdm
[7] https://t.co/TVte0CCu8s
[8] https://t.co/seq3X9JiM6
I am developing a formal theorem I call the Verification Complexity Barrier.
In a nutshell, if a program has some components `n` that have connectivity factor of `k > 0` , then verification complexity increases superlinearly for each new component. Therefore the time required to fully verify the system always exceeds time to generate components.
After a while, because it's superlinear, the verification complexity takes off and becomes impossible to keep up with in some finite amount of time. This was true before AI, but is much starker now as code generation time trends towards zero. You hit the barrier sooner.
We all have finite capacity - even AI agents - so there will always be a certain number of components `n` where the wall is hit. You must spend more and more effort on verification for each new component in the system.
The best thing you can do is spend time changing the "topology" of the problem - change the software architecture - so that the exponent of verification complexity is lowered ([1] Lehman) and the curve is flattened. You can bundle components into modules, you can add automated tests, you can use formal proofs, you can use type systems. These things push the barrier to the right. They buy you more components and a more complex system. But the theorem suggests you can only ever defer the barrier, never completely eliminate it.
AI Agents can burn tokens all day long generating software components and tests for those components. They can find bugs and fix them, but they cannot prove the absence of bugs (per [2] Dijkstra, [3] Rice, [4] Smith, and others). And "Bug" needs to be defined against someone's spec for what "working" and "not buggy" looks like. The more you build with Agents, the heavier the verification burden becomes.
This may sound like the same trite observation others have made here on X: "Our bottleneck is no longer writing code, but reviewing code" [5] and "I am the bottleneck now." [6] True, but I don't think anyone has captured the magnitude of the problem. The math is: at a certain component count `n`, it is literally impossible for you, your team, and your agents to completely verify a system. The bottleneck goes to zero, and nothing gets through.
So what software companies do in the real world is release incompletely verified software and massively scale up. This shifts the burden of verification onto their customers, because "given enough eyeballs, all bugs are shallow" ([7] Raymond). If you can get enough eyeballs, this is a very cost-effective way to shift the barrier to the right by massively increasing your team's capacity. You walk the tightrope of doing enough internal verification before release so you don't lose customers, while tolerating a certain amount of escaped bugs, which - if those bugs matter at all - your customers will find for you.
Meanwhile, massively scaling up just accepts the growing cost of complexity. You can push `n*` from 15 to 30 by quadrupling your capacity. To get to 60 you need to quadruple again, and then again to get to 120. Your cost curve is superlinear to get linear gains in system size. At a big enough scale, you amortize the cost across your customer base and the economics work.
Contrast that with a sufficiently complex vibecoded app built for a small audience - high complexity costs can't be amortized at small scale. I expect to see many people and companies try and fail at vibecloning complex SaaS in the near term. Complexity cost economics only scale with audience size (I will share another model for this).
I do think SaaS prices will be corrected downwards to account for savings in code generation, but I predict that once the irrational exuberance for vibing fades, we'll see that it still makes sense to buy rather than self-build complex SaaS.
A broader implication is that AI Agents will never be able to self-verify. Humans, too, will never be able to fully verify their behavior, because LLMs are by design of maximal complexity. Did you see the size of those error bars in the latest METR results? [8] The longer the horizon on a task, the more spread in AI agent outcomes. This is the Barrier in action. Spread is a feature of GenAI, but in practice it means heaps more output to review and verify.
The Complexity Barrier shows you literally won't have time to review it all. At the inflection point of verification complexity, you have to fall back on vibes. The implication for fast-takeoff AGI is even scarier: if AI does reach a point of recursive self-improvement, this theorem suggests it will be structurally impossible to know that behavior is aligned, because you won't be able to completely verify. Drift is bad enough in vibe coding. Runaway AI will drift massively and there's no way of knowing where it will end up.
All that's to say, verification should be the focal point of AI Engineering for the foreseeable future and maybe forever. That is: how do you capture what you want to do, refine that into specifics, and then follow up with automated tests, assertions, evals, and customer feedback to progressively harden your software?
The verification problem is acute now because of how cheap software generation is. Because of the superlinear nature of software verification complexity, companies that push hard on the barrier and successfully shift it right will have a built-in moat versus those who fail to put in the verification work.
Detailed blog post and interactive model incoming.
Links:
[1] https://t.co/aZXDirKO6y
[2] https://t.co/Dpz7ME2N0U
[3] https://t.co/EILf0s7OuJ
[4] https://t.co/uDivaeJAAa
[5] https://t.co/ddNheu1LcK
[6] https://t.co/GvkfTsBmdm
[7] https://t.co/TVte0CCu8s
[8] https://t.co/seq3X9JiM6