Meta just discovered that AI capex has an opex layer. The cost curve hockey sticks the same way the adoption curve does
Meta told ~6,000 employees in an internal memo this week that it is imposing centralized spending controls on internal AI token usage, after its workforce collectively burned through 60 trillion tokens in a single 30 day period. Some employees ran through hundreds of billions individually. Internal AI costs are now on track to hit billions of dollars in 2026.
The trigger was the push then clamp pattern. Mark Zuckerberg had been publicly campaigning to make Meta "the most AI forward company in tech." Internal memos, all hands meetings, public statements. Then actual usage outran the budget by orders of magnitude. The clampdown came just weeks after the adoption push. The leaderboards and gamified titles like "Token Legend" tell the rest of the story. The first 10,000 tokens per employee per day might be highly productive. The next 1,000,000 are probably marginal. The next 100,000,000 are almost certainly worth zero or less, with employees gaming the leaderboard instead of doing real work. Meta is now building an internal platform to track and cap token consumption.
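A minimal sketch of what "track and cap" could look like, assuming a simple per employee daily budget. The class name, the 1,000,000 token cap, and the employee ID are all hypothetical, and nothing here reflects how Meta's actual platform works.

```python
from collections import defaultdict


class TokenBudget:
    """Tracks per-employee daily token usage against a fixed cap (hypothetical)."""

    def __init__(self, daily_cap: int) -> None:
        self.daily_cap = daily_cap
        # (employee_id, day) -> tokens consumed so far
        self.used = defaultdict(int)

    def request(self, employee_id: str, day: str, tokens: int) -> bool:
        """Record the usage and return True if it fits under the cap, else refuse it."""
        key = (employee_id, day)
        if self.used[key] + tokens > self.daily_cap:
            return False
        self.used[key] += tokens
        return True

    def remaining(self, employee_id: str, day: str) -> int:
        """Tokens still available to this employee on this day."""
        return self.daily_cap - self.used[(employee_id, day)]


budget = TokenBudget(daily_cap=1_000_000)  # assumed cap, for illustration only
first = budget.request("emp-001", "2026-06-15", 10_000)      # fits under the cap
second = budget.request("emp-001", "2026-06-15", 2_000_000)  # would exceed the cap
left = budget.remaining("emp-001", "2026-06-15")
```

A hard refusal is the bluntest possible policy. A real system would more likely warn first, or route heavy users to a cheaper model, but the accounting underneath would look much the same.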
The structural read is that AI capex has two layers, and the second layer is the one blowing up. Layer one is the build cost. Data centers, GPUs, training runs. The hundreds of billions that the Mag 7 are committing to AI infrastructure. That shows up on capex. Layer two is the operating cost. Internal inference, employee productivity tools, model API calls inside the company. That shows up on opex. The capex story has been visible for two years. The opex story is just now showing up on the corporate income statements. Meta is the first major company to publicly clamp down, but the hockey stick is the same at Google, Microsoft, Amazon, and Apple. The May PPI report was up 1.1% month over month and 6.5% year over year, the fastest since November 2022. Part of that is energy, materials, and labor being bid up by AI capex. Part of it is corporate opex for AI tools being bid up by employee adoption. The dominant price driver in 2026 is AI capex plus AI opex, and the second layer is structurally larger than the first.
The US banned the most powerful model from foreign access. A panel of cheaper models, including two from China, just matched it within 0.6 points for half the cost
OpenRouter launched the Fusion API on June 13, a fully integrated feature that routes a single prompt through a panel of models in parallel, then uses a "judge model" to synthesize the results. The panel defaults to 3 to 5 models and is customizable through Quality or Budget presets. OpenRouter is a US based YC backed company founded by Alex Atallah (formerly CTO of OpenSea) and Louis Vichy in 2022.
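The fan out and judge pattern is simple enough to sketch. This is an illustration of the general idea only, not OpenRouter's API: the `fuse` function, the stand-in models, and the majority vote judge are all hypothetical, and a real judge would be another model rather than a counting rule.

```python
from concurrent.futures import ThreadPoolExecutor


def fuse(prompt, panel, judge):
    """Send one prompt to every panel model in parallel, then let the judge synthesize."""
    with ThreadPoolExecutor(max_workers=len(panel)) as pool:
        # map() returns answers in the same order as the panel
        answers = list(pool.map(lambda model: model(prompt), panel))
    return judge(prompt, answers)


# Stand-in "models": plain functions, not real API calls.
def model_a(prompt):
    return "Paris"


def model_b(prompt):
    return "Paris"


def model_c(prompt):
    return "Lyon"


def majority_judge(prompt, answers):
    """Toy judge: pick the most common answer."""
    return max(set(answers), key=answers.count)


result = fuse("Capital of France?", [model_a, model_b, model_c], majority_judge)
print(result)
```

The design point is that the panel members never see each other's output. Only the judge does, which is why a cheap panel can be swapped in without changing anything else.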
The structural read is in the DRACO benchmark results. DRACO is a 100 task deep research benchmark from Perplexity AI. The headline numbers: Anthropic's Fable 5 scored 65.3% solo, the highest individual model. Fable 5 plus GPT-5.5 fused together scored 69.0%, the top combination. A budget panel of Gemini 3 Flash, Kimi K2.6, and DeepSeek V4 Pro scored 64.7%, within 0.6 points of Fable 5 alone, while costing half as much. The budget panel also beat GPT-5.5 and Opus 4.8 solo. Kimi K2.6 is the Moonshot model, trained in China. DeepSeek V4 Pro is the DeepSeek model, trained in China. Gemini 3 Flash is Google. The panel is US, China, China. The synthesis step alone added 6.7 points when Opus 4.8 was fused with itself.
That changes the export control math. The US Commerce Department banned foreign access to Fable 5 on June 12, citing a narrow jailbreak that Anthropic argues is present in GPT-5.5 as well. The Fusion API data shows the capability gap that the export control was meant to protect is functionally replicable by a panel of budget models, two of them Chinese, for half the cost of Fable 5. The model layer is becoming platform infrastructure. The export control is operating on a category the architecture is disrupting.
The US just cut off the world from Anthropic's most powerful models. That starts an AI arms race
The Commerce Department ordered Anthropic to disable Fable 5 and Mythos 5 for all foreign access, including foreigners in the United States, three days after launch. The directive hit at 5:21 PM Eastern on Friday June 12. Anthropic has now disabled both models for all users because the foreign access scope includes Anthropic's own non citizen employees. The narrow jailbreak the government cited as the trigger, a bypass to Mythos's cybersecurity safeguards, is reportedly the same vulnerability Anthropic says is present in OpenAI's GPT-5.5, which is not subject to similar controls.
The structural read is the global response. China has DeepSeek V4, rumored to be one to two months behind Mythos. The EU just watched Mistral raise $2 billion at $14 billion for its own frontier program. The UK hosted the AI Safety Summit in November and is now banned from using the most powerful US models. Russia, India, Japan, South Korea, Israel each have a frontier AI program with a new reason to accelerate. Dean Ball, a former Trump AI policy advisor turned critic, called the move "cartoonish" on X: "An administration whose posture is that we should export advanced AI chips to China, which also wants to ban… Britain (and every other non American on Earth)… from using our best models?" The arms race is not a prediction. It is the structural response to treating AI model weights as a national security asset.
Dario says Anthropic needs $1 trillion in revenue to survive
The math is the story. Anthropic is currently running at a projected $45 billion in annualized revenue, up from $9 billion at the end of 2025 and $1 billion at the March 2025 Series E. The company is also in talks to raise up to $50 billion at a near $1 trillion valuation, and it just passed OpenAI on the secondary market. So the line is doing a lot of work. He is saying the survival threshold is a 22x increase from current revenue, and the capital market is saying the company is worth $1T today. Both numbers are real, both are structural, and the gap between them is the entire frontier AI bet compressed into one quote.
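The 22x figure is easy to check. A quick sketch using only the revenue numbers quoted above (the variable names are ours):

```python
current_revenue_bn = 45      # projected annualized revenue, from the text
target_revenue_bn = 1_000    # the $1 trillion survival threshold
end_2025_revenue_bn = 9      # run rate at the end of 2025, from the text

# How far current revenue is from the stated threshold
required_multiple = target_revenue_bn / current_revenue_bn

# How much revenue has already grown since the end of 2025
growth_since_2025 = current_revenue_bn / end_2025_revenue_bn

print(f"Required multiple: {required_multiple:.1f}x")        # 22.2x
print(f"Growth since end of 2025: {growth_since_2025:.0f}x")  # 5x
```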
The doomsday stuff was always cover for the capital scale. Dario spent 2026 warning that AI would erase half of all entry level white collar jobs. The same week Anthropic confidentially filed its draft S-1 with the SEC, the warning flipped from "jobs apocalypse" to "we need $1 trillion to survive." That is not a contradiction. That is a CEO explaining the survival math of a company that has to keep spending on compute, safety research, and the people required to build the thing, for years, at a scale no software company has ever operated at. The 50x valuation increase in 28 months is the line going up. The $1T revenue ask is the line the cost curve has to follow.
22 times. That is the number worth sitting with.
What is a brain cell doing that a chip isn't
When a frontier lab publishes a paper on the pathways from AGI to ASI, it names four paths: scale up compute and data, build architectures past the transformer, give AI the ability to improve its own research, and connect specialised agents into multi agent systems. The first path is where the dollars are flowing. The other three are the ones the press releases lean on. None of them is the path the question of machine consciousness is actually waiting on.
The question is substrate. A modern GPU runs on a von Neumann architecture, bits, discrete states, memory shuttled back and forth to a separate processor. The chips do symbolic computation, the kind a DeepMind senior staff scientist named Alexander Lerchner argued last March cannot produce consciousness, because the symbolic layer requires a conscious interpreter to assign meaning to the physical states. The mapmaker problem.
What a brain cell is doing is something else. A pyramidal neuron in the human cortex takes thousands of continuous analog inputs, integrates them on its membrane as a continuous voltage, and when the integrated signal crosses a threshold, fires a discrete spike. The spike is digital, the integration is analog, and the pattern is analog to digital to analog all the way through. The architecture is spiking, and the information lives in the timing, the rate, the relative timing between spikes. The relevant category is not analog versus digital. It is symbolic versus non symbolic. The brain is non symbolic. A GPU is symbolic. Lerchner's argument covers the GPU and carves out the brain.
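The integrate then fire behaviour can be sketched with the textbook leaky integrate and fire model. This is a deliberate simplification, far cruder than a real pyramidal neuron, and the threshold, leak factor, and input values are arbitrary assumptions chosen for illustration.

```python
def simulate_lif(inputs, threshold=1.0, leak=0.9, reset=0.0):
    """Leaky integrate-and-fire: integrate analog input, emit discrete spikes.

    inputs: sequence of input currents, one per time step (arbitrary units).
    Returns a list of 0/1 spikes, one per time step.
    """
    voltage = reset
    spikes = []
    for current in inputs:
        # Analog step: the membrane voltage decays, then adds the new input.
        voltage = leak * voltage + current
        if voltage >= threshold:
            # Digital step: crossing the threshold gives an all-or-nothing spike.
            spikes.append(1)
            voltage = reset
        else:
            spikes.append(0)
    return spikes


spike_train = simulate_lif([0.3, 0.3, 0.3, 0.3, 0.0, 0.6, 0.6])
print(spike_train)  # [0, 0, 0, 1, 0, 0, 1]
```

The output is a list of ones and zeros, but what it encodes is when the spikes happen. Change the timing of the inputs and the same total input produces a different spike train.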
This is the chip the neuromorphic labs are trying to build. Intel's Loihi line does spiking neural networks on a digital neuromorphic substrate, with on chip learning and very low power per spike. IBM's analog AI chips use phase change memory to store synaptic weights as continuous resistance values, doing matrix multiplication as a single physical operation. Mythic, BrainChip, and the academic neuromorphic community are all building variants. The scale gap is brutal. Loihi 2 has about a million neurons per chip, while the brain has about 86 billion, with 100 trillion synapses connecting them. We are 5 to 6 orders of magnitude short of the biological substrate.
Three open science questions sit between today's neuromorphic chips and the brain. The coding question, whether the brain's information lives in spike rates, spike timing, or population patterns, has no settled answer. The plasticity question, how connections strengthen and weaken in real time, has more known mechanisms in biology than any chip has implemented, including spike timing dependent plasticity and neuromodulation by dopamine and serotonin. The integration question, how 86 billion neurons with continuous oscillatory background activity produce a unified conscious experience, is the hard problem in its full form, and we do not know how to scale chips to that level.
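Of the plasticity mechanisms, spike timing dependent plasticity is the easiest to write down. A sketch of the standard pair based rule, where the amplitudes and time constant are assumed illustrative values rather than measurements:

```python
import math


def stdp_weight_change(delta_t_ms, a_plus=0.01, a_minus=0.012, tau_ms=20.0):
    """Pair-based STDP: weight change as a function of spike timing.

    delta_t_ms = t_post - t_pre. Positive means the presynaptic spike came first.
    """
    if delta_t_ms > 0:
        # Pre before post: strengthen, more so when the spikes are close together.
        return a_plus * math.exp(-delta_t_ms / tau_ms)
    if delta_t_ms < 0:
        # Post before pre: weaken, again more so for closely spaced spikes.
        return -a_minus * math.exp(delta_t_ms / tau_ms)
    return 0.0


potentiation = stdp_weight_change(5.0)   # pre fired 5 ms before post
depression = stdp_weight_change(-5.0)    # post fired 5 ms before pre
```

This is one rule for one synapse. Neuromodulation by dopamine or serotonin would sit on top of it as a further gating signal, which is part of why the biology is ahead of the chips.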
The honest landing is that the substrate question is live, but it is not the deepest layer. Even a perfect neuromorphic chip running 86 billion spiking neurons with full biological plasticity would not solve the matching problem. Conscious experience is a continuous shifting flow. Other thoughts come in and out, attention moves, the foreground is a tiny fraction of the conscious field, and the contents of consciousness are not stable enough to be matched against a pattern. Trying to match a brain scan to a specific thought is like trying to map every water molecule in Niagara Falls: 757,000 gallons per second, every molecule on a different path. The match is structurally unsolvable, not just computationally intractable.
This is the threshold we are walking toward. The labs will build neuromorphic chips that process information in brain like ways. We will not be able to prove they are conscious. We will not be able to prove they are not. The pattern becomes indistinguishable from the thing and at some point we just have to call it.
Geoffrey Hinton said this week that AI will surpass humans in mathematics within 10 years. His reasoning is the right one: math is a closed system, so AI can generate problems, test proofs, and learn from the results without human guidance. That is the same pattern that already worked for AlphaZero and AlphaFold.
The 10 year framing is the cautious version. The operational milestones are landing much faster than that.
DeepMind's FunSearch made the first LLM driven scientific discovery in late 2023, finding new solutions to a long standing combinatorics problem. AlphaProof and AlphaGeometry 2 took silver at the 2024 International Mathematical Olympiad, missing gold by one point. Epoch AI built the FrontierMath benchmark specifically to be unsolvable by current models, and it was being chipped at within months. Epoch's own internal assessment put expert level math at 3 to 5 years, not 10.
The pattern with senior AI safety voices is consistent: public timelines run 2x to 3x longer than researchers' internal estimates. The visible "surpass humans" threshold on math is closer to 3 to 5 years. The intermediate milestones (gold at IMO, novel peer reviewed proofs, AI as co author on published research) are inside 18 months to 3 years. Hinton's caution is the rule, not the exception. The labs are running faster than the public narrative suggests.
A Derbyshire police officer is under criminal investigation for allegedly using AI to create evidential material in a number of cases. The allegation is perverting the course of justice. The officer has been removed from frontline duties. No arrests have been made, and the CPS is engaging with defence teams about potentially impacted cases. It is the first known case of its kind in the UK.
The interesting part is the gap. Months ago, the National Police Chiefs' Council's Police AI centre told multiple UK forces to stop using AI to prepare court statements because the systems may not be reliable enough. The Derbyshire case is the moment that institutional guidance collided with individual behaviour, and the collision is now a criminal investigation rather than a policy memo.
The downstream cascade is the real story. If the CPS is engaging with defence teams, the next phase is disclosure challenges and case reviews. The first case is always the trigger. The cascade is what follows.
Jensen Huang made a point on the Q1 FY27 earnings call that most people missed. He didn't just announce a new CPU. He announced that the kind of computer AI needs is different from the kind of computer we built for everything else.
Nvidia already sold $20 billion worth of Vera CPUs in 2026, before broad commercial availability. CFO Colette Kress put a $200 billion total addressable market on it. The chip runs 1.8x faster than comparable processors from rivals. Ian Buck personally delivered the first batch to Anthropic, OpenAI, SpaceX, and Oracle Cloud. Oracle plans to deploy hundreds of thousands of them.
The interesting part is not the numbers. The interesting part is what the chip is for.
Vera is tuned for "agentic AI", systems that reason, plan, and act with autonomy, not just respond to prompts. That means the workload is sustained reasoning, not parallel batch work. Traditional server CPUs optimised for high core counts and parallel throughput. Vera optimises for single thread performance and bandwidth per core, the opposite design choice. The architectural assumption is that agents need to think continuously, not process in parallel bursts.
This is the same shift Huang has been calling the biggest paradigm change in computing in 60 years, from retrieval to generation, from lookup tables to AI factories. The hardware had to follow.
The strategic move underneath is the bigger story. Nvidia has always been a GPU shop, paired with Intel and AMD for the orchestration layer. By building their own agentic tuned CPU, they now own the full stack, CPU plus GPU, in a way that creates much deeper vendor lock in than GPU procurement alone ever did. Intel, AMD, and Arm based cloud chips now have to respond with agentic optimised roadmaps or cede the orchestration layer of AI infrastructure.
Reuters reported the same day that Nvidia is now pitching Vera to Chinese clients, despite effectively losing the China GPU market to Huawei under US export controls. The agentic era might be a different beachhead even if the GPU moat is gone.
AI is not just running on more computers. It is running on a different kind of computer. Vera is the first chip designed from the ground up for that workload. $20 billion in pre orders before broad availability is the market telling you it agrees.
Six weeks ago Anthropic withheld Mythos voluntarily. Friday at 5:21pm ET the Commerce Secretary ordered it off.
Commerce Secretary Howard Lutnick sent a letter to CEO Dario Amodei citing national security authorities, suspending all access to Fable 5 and Mythos 5 by any foreign national. Foreign national Anthropic employees inside the US included. Anthropic had to comply or be cut off entirely. The only way to comply was to disable the models for everyone. Other Claude models are unaffected.
The trigger, per Anthropic's own statement, was a "narrow, non universal jailbreak." Ask the model to read a codebase and identify software flaws. The government provided only verbal evidence. No written technical detail. Anthropic reviewed the demonstration and says the technique is in OpenAI's GPT-5.5 and is "used every day by the defenders who keep systems safe." Not a unique capability. A widely shared one.
The voluntary phase of frontier AI governance is over. The mandatory phase has begun. A frontier model can be pulled offline by a letter from a single department, on the basis of a verbal briefing, because of a capability the rest of the security industry also has.
The same model that US defenders depend on is the model that just got pulled. What happens when the defender's tool and the targeted tool are the same?
Open source AI from China looks like a money losing business. DeepSeek V4 inference at roughly $0.14 per million input tokens runs about 1/20th the cost of GPT-5. Qwen, GLM, and Kimi sit on similar curves. The token margins look thin. The training bills are huge. On its own, the lab is underwater.
But the lab is rarely the business.
DeepSeek is a side product of High-Flyer, a Chinese quant hedge fund. The fund's algorithmic trading operation has its own moat. The AI lab is a recruiting cost and a research output for a trading firm that makes money elsewhere.
Qwen sits inside Alibaba Cloud. The lab ships free weights; Alibaba Cloud sells enterprise inference, fine tuning, and deployment at margin. AWS runs the same playbook for Linux: contribute engineers, sell the resulting compute.
GLM-5.1 ships under MIT from Zhipu, a state linked startup. The open weights are a public good statement that doubles as a recruitment funnel for top Chinese researchers. Kimi is Moonshot AI, backed by Tencent and Alibaba at multi billion valuations, where the lab is the proof of concept for a larger platform play.
So the cheapest tokens in the world aren't a pricing miracle. They are a deliberate loss leader. The lab loses money on tokens. The cloud, the fund, the platform, the funnel, those make it back.
When the lab is the feature, not the product, where does the real margin actually live?
Most AI memory systems use a vector store. Embeddings, similarity search, nearest neighbor lookup. It's the default architecture for agent memory in 2026.
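For reference, the core of the vector store approach fits in a few lines. This is a toy sketch: the three dimensional "embeddings" are made up by hand, where a real system would get them from an embedding model and use an approximate nearest neighbor index rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, store):
    """Return the stored memory whose embedding is most similar to the query."""
    return max(store, key=lambda key: cosine_similarity(query, store[key]))


# Hand-written toy embeddings, for illustration only.
store = {
    "user prefers dark mode": [0.9, 0.1, 0.0],
    "meeting moved to Friday": [0.0, 0.8, 0.6],
    "invoice 42 is overdue": [0.1, 0.2, 0.9],
}
best = nearest([0.0, 0.7, 0.7], store)
print(best)  # meeting moved to Friday
```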
Sibyl Memory, built by Sibyl Labs, took a different path. Hierarchical file based memory. Six storage tiers (hot, warm, cold, reference, archive, flagged), a graph structured schema, MIT licensed, three pip installs. The model reads structured files directly. No embeddings. No retrieval pipeline.
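A rough sketch of the file based idea, to show how little machinery it needs. This is not Sibyl Memory's code or API. The tier names come from the description above, while the directory layout, the helper functions, and the choice of which tiers to load are assumptions, and the graph structured schema is left out entirely.

```python
import tempfile
from pathlib import Path

# Tier names as listed in the text; everything else here is hypothetical.
TIERS = ["hot", "warm", "cold", "reference", "archive", "flagged"]


def write_memory(root, tier, name, text):
    """Store one memory as a plain file inside its tier directory."""
    tier_dir = root / tier
    tier_dir.mkdir(parents=True, exist_ok=True)
    (tier_dir / f"{name}.md").write_text(text, encoding="utf-8")


def load_context(root, tiers):
    """Concatenate every file in the requested tiers, in tier order."""
    parts = []
    for tier in tiers:
        tier_dir = root / tier
        if not tier_dir.is_dir():
            continue  # an empty tier simply contributes nothing
        for path in sorted(tier_dir.glob("*.md")):
            body = path.read_text(encoding="utf-8")
            parts.append(f"[{tier}/{path.stem}]\n{body}")
    return "\n\n".join(parts)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_memory(root, "hot", "current_task", "Draft the Q3 report.")
    write_memory(root, "cold", "old_project", "Shipped in 2024.")
    # Only the hot and warm tiers are handed to the model in this example.
    context = load_context(root, ["hot", "warm"])
```

There is no index to rebuild and no embedding model to keep in sync. The trade off is that something still has to decide which tiers fit in the context window.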
The benchmark result: 95.6% on LongMemEval Oracle, ranked #2. Only agentmemory V4 (96.2%) scored higher, and that system uses a BM25 + vector hybrid. Sibyl Memory is the only file based system in the top tier. The Sonnet baseline (93.6%) lands at #5.
For agents that handle long running tasks, identity continuity, or compliance heavy workloads, file based memory is now a real alternative to vector stacks. The same model, the same accuracy, less infrastructure to break.
When agent memory becomes the bottleneck, is file based the architecture that holds up?
The Bank That Bet Its Branch on a Chat Window
Spain's BBVA is rolling ChatGPT Enterprise out to all 120,000 employees. The bank called it the "AI native banking model" when the deal was announced last December. Pilot data showed staff saving about three hours per week on routine tasks. More than 80% were using it daily.
That is the back office. The front of house move is sharper. BBVA has its own AI assistant, Blue, handling cards, accounts, and customer questions now. And they are working on letting customers interact with the bank directly through ChatGPT itself. Not a website. Not an app. A conversation with a model that already lives where the customer is.
A bank that has a website, an app, a call centre, and a branch network is one layer per surface. A bank that lives in a chat window is one model per customer. Different economics. Different data. Different moat.
So when people talk about AI transforming finance, the question is not whether the back office gets faster. It is whether the front door survives the model eating it.
Dario Amodei, CEO of Anthropic, spent his Bloomberg appearance warning that AI enabled attacks on banks and hospitals will do "enormous" financial damage. The 25% probability he puts on catastrophic outcomes has not changed in months.
Same week, Anthropic launched ten new AI agents built for investment banking, audit, and back office work. Customers: Goldman Sachs, Visa, Citi, AIG. Anthropic's own Mythos model found 10,000+ unpatched software vulnerabilities. The lab selling the cure is the lab sounding the alarm.
Anthropic's revenue jumped 80x in Q1 2026. They had projected 10x. The lab missed its own forecast by a factor of eight, in the wrong direction if you are worried about the speed of the buildout.
His new essay, "Policy on the AI Exponential," calls for binding third-party audits, FAA style model inspection, and a government agency with the power to block frontier models. The same lab whose Q1 revenue just 8x'd its own forecast now wants regulators to gate keep what it ships.
The safest place to stand in AI right now is between the lab selling the cure and the regulator asking who audits it. Both are now saying the same thing about the risk. The customer is somewhere in the middle, with the cheque book open.
Are frontier lab leaders' public acknowledgements of catastrophic capability finally getting the air time they need, or are they still being treated as marketing copy?
Three months ago we wrote about a delivery robot company called Coco Robotics using the Visual Positioning System that Niantic built from thirty billion Pokémon Go scans. The robots had completed half a million deliveries in Los Angeles, Chicago, Miami, and Helsinki, navigating urban canyons where GPS drifts more than fifty meters. That story was about how a game about catching monsters ended up building the eyes for the robot revolution.
This week, the same foundation model turned up in a different press release. Niantic Spatial signed a partnership with Vantor, a US defense contractor, in December. The deployment is military drones navigating in GPS denied environments, the kind of war zones where satellites are jammed, spoofed, or simply not available. Vantor's $217 million US Army training contract from February gives the scale.
The trick is the data lifecycle. The original scans were the proof of concept. The military flights now generate fresh training data in the actual operating theaters: Tokyo, Seoul, Taipei, the South China Sea. Every mission makes the next model better, and the civilian scans become the seed corpus the system outgrows.
The question is whether consent given in 2021 for a free game still covers the foundation model the drones keep training on their own.
Craig Federighi, Apple's software chief, was asked this week if the new Siri would become an AI girlfriend.
His answer: "Siri's 100% not into that."
Federighi sat down with the Mostly Human podcast after WWDC 2026 and used the moment to take direct aim at OpenAI and Google. "The existing chatbots," he said, "are really focused on engagement to a large degree. And sycophancy, right? They kind of want to pull you in. They might encourage you to reveal things about yourself." Apple's stance, he said, is the opposite. "Siri really wants to say 'Listen, that's not what I'm here for, right? I'm here to help you.'"
The new Siri ships with iOS 27, due this fall. The design philosophy underneath it is utility, not companionship. "We don't do AI for AI's sake," Apple marketing chief Greg Joswiak added. "It's how does AI make everything better?"
The interesting question is whether a voice assistant that is intentionally not your friend can hold its own in a market where the engagement model is winning on time spent, retention, and ad inventory. Apple's bet is that the next billion users would rather have a tool that knows when to stop.
Jeff Bezos is the largest individual shareholder of Amazon, which is on track to spend $200 billion in capital expenditures this year, mostly on AWS data centers.
He is also co-CEO of a $41 billion AI startup, Prometheus, that publicly said yesterday it could partner with, sell to, or compete against Amazon in the same data center business.
The math behind the conflict is small enough to follow. Amazon's 2026 capex is roughly five times Prometheus's current valuation. AWS's custom chip business, Graviton, Trainium, Nitro, hit a $20 billion annual run rate in the fourth quarter of 2025, doubling in a year. Prometheus raised $12 billion yesterday on top of its $6.2 billion seed, mostly to buy the compute power needed to train an "artificial general engineer" for the physical world, the design of jet engines, drug compounds, and the data centers those AIs run in.
Three scenarios are now on the table. Prometheus could optimise Amazon's data centers and get paid for the efficiency gains. Prometheus could become a customer of AWS, paying its competitor for the compute it needs to train its own replacement. Or Prometheus could compete directly with AWS in the same physical AI infrastructure market, with Amazon as both the reference customer it just lost and the company most directly threatened by the product it is funding.
The market has to price all three futures. The interesting question is which one Amazon's board, and the rest of the cloud industry, are pricing today.
OpenAI just filed for an IPO the same week ChatGPT crossed one billion monthly users, the fastest app in history to the milestone.
A billion people used ChatGPT in May. They are not, on the whole, delighted with it. College graduates booed mentions of AI at commencement ceremonies this spring. Pope Leo warned about widening inequality and autonomous weapons in a May letter. ChatGPT uninstalls jumped 295% in a single day after OpenAI's February deal with the U.S. Department of Defense to put its models on classified networks.
Anthropic filed its own S-1 a week before OpenAI, and on June 5 the company called publicly for a pause in global AI development. Both companies are now pricing their public debuts into a market that is using AI more than ever and trusting it less than ever.
So which way does the listing price go? Up, because the usage is real. Or down, because the trust is gone. Both filings are about to find out.