Someone in the replies put the next question best: can we also measure the corresponding productivity? Routing to cheaper models and caching are the obvious first moves, but without visibility into what each call actually did and how it changed the way people work, you can't see whether the output held up. You end up paying less without knowing any more.
How to keep AI spend flat while token usage grows exponentially: Not with friction and spend alerts. With better defaults, routing, and caching.
Better Defaults (not Usage Caps) – Engineers can choose any model they want, but defaults matter. We’re experimenting with defaulting to open weight models like GLM 5.2 and Kimi 2.7 through our LLM gateway, while still encouraging engineers to choose the right model for the task. 91% of our employees were never hitting their usage caps, so instead of lowering caps and driving up alerts, we're moving to cheaper defaults. Note that code reviews use a diversity of models, so they can check each other's work.
Better Routing – In our custom harnesses, we preprocess prompts and route to the best model for the job, considering cache hits and model pricing. For instance, you may want a frontier model for planning, but not for execution where they can be overkill. Ultimately, humans shouldn't be choosing models - AI can automate this task.
Better Caching – Cache misses are the easiest way to drive your cost up. All of our requests are cache aware, so we’re reusing a warm cache wherever possible. For example, our cache hit rate went from 5% → 60% in LibreChat once properly implemented.
Keep Context Lean – Start fresh sessions when switching tasks. Scope file context narrowly. Disconnect unused tools. Don't just compact. The goal isn't fewer tokens used, it's fewer tokens wasted.
Better Visibility – Our engineers can use as many tokens as they want, from whatever model they want, but we’ve made usage visible – and the more you spend on AI, the more impact we expect.
The goal isn't to suppress usage. It's to build the infrastructure that makes exponential growth sustainable.
Putting this into practice has cut our AI spend nearly in half, while our token usage continues to grow.
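A minimal sketch of the routing and caching points above, assuming a two-model catalogue. The model names, prices, and the `route` and `input_cost` helpers are hypothetical placeholders, not the gateway described in the post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    input_price: float         # USD per million uncached input tokens (assumed)
    cached_input_price: float  # USD per million cached input tokens (assumed)
    tier: str                  # "frontier" or "open"


# Hypothetical catalogue: names and prices are placeholders, not real quotes.
MODELS = [
    Model("frontier-large", 10.0, 1.0, "frontier"),
    Model("open-weight-default", 1.0, 0.1, "open"),
]


def input_cost(model, tokens, cache_hit_rate):
    """Blended input cost in USD when a share of the tokens is served from cache."""
    cached = tokens * cache_hit_rate
    fresh = tokens - cached
    return (fresh * model.input_price + cached * model.cached_input_price) / 1_000_000


def route(task_kind, tokens, hit_rates):
    """Planning stays on a frontier model; other work goes to the cheapest option.

    hit_rates maps model name -> expected cache hit rate for this request,
    so a warm cache on one model can change which model is cheapest.
    """
    if task_kind == "planning":
        candidates = [m for m in MODELS if m.tier == "frontier"]
    else:
        candidates = MODELS
    return min(
        candidates,
        key=lambda m: input_cost(m, tokens, hit_rates.get(m.name, 0.0)),
    )


if __name__ == "__main__":
    print(route("planning", 1_000_000, {}).name)
    print(route("execution", 1_000_000, {"frontier-large": 0.9}).name)
    for rate in (0.05, 0.60):
        print(rate, round(input_cost(MODELS[0], 1_000_000, rate), 2))
```

With these placeholder prices, moving the hit rate from 5% to 60% takes a million input tokens on the pricier model from about $9.55 to about $4.60. The real numbers depend entirely on actual provider pricing.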
The token-maxxing era is coming to an end, with everyone realising they need to spend better. But spending better needs visibility, and a $4k total in Claude Code after 3 days is just a bill arriving late.
Most of what looks like heavy use is bloat: unnecessary context, retries, the wrong model for the task. You can't cut any of that if the only thing you can see is the total.
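To make "more than the total" concrete, here is a small sketch that breaks a call log down by why the money was spent. The log records, the `reason` labels, and both helper functions are hypothetical. How a call gets labelled as a retry or as the wrong model is the hard part, and it is not shown here.

```python
from collections import defaultdict

# Hypothetical call log. Costs are in cents to keep the arithmetic exact.
calls = [
    {"user": "a", "cost_cents": 120, "reason": "useful"},
    {"user": "a", "cost_cents": 80, "reason": "retry"},
    {"user": "b", "cost_cents": 150, "reason": "unneeded_context"},
    {"user": "b", "cost_cents": 50, "reason": "wrong_model"},
    {"user": "b", "cost_cents": 100, "reason": "useful"},
]


def spend_by_reason(records):
    """Sum spend per reason label instead of reporting a single total."""
    totals = defaultdict(int)
    for record in records:
        totals[record["reason"]] += record["cost_cents"]
    return dict(totals)


def waste_share(records):
    """Fraction of spend that was not labelled 'useful'."""
    totals = spend_by_reason(records)
    total = sum(totals.values())
    return (total - totals.get("useful", 0)) / total if total else 0.0


if __name__ == "__main__":
    print(spend_by_reason(calls))
    print(waste_share(calls))
```

The same grouping works per user or per team by swapping the key. A bill that only reports the sum cannot support any of these views.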
Our Anthropic bill is about to jump from $400K → $1.4M/yr.
Not because usage exploded, but because we're about to cross 150 seats.
Past 150 seats you're forced into Enterprise tier. Seats stop including any usage; every token bills at standard API rates. At our current run rate that's 3.5x overnight.
Unfiltered thoughts on AI spend:
1. We should spend tokens to grow as aggressively as possible. But most people (me included) aren't conscious of what they're spending.
2. Visibility comes first. People see their personal number and they're shocked. I accidentally spent $4,000 in 3 days in Claude Code.
3. For engineering the spend is clearly worth it. Pay for the best model, it saves more than it costs.
4. For a lot of other roles it's questionable. Apps nobody uses, skills someone already built. No ROI.
5. Spend limits are coming. We already require approval for more tokens on our support team.
The era of token-maxxing is coming to an end.
Companies are trying to control AI spend the same way they controlled cloud spend. It doesn't work.
Cloud bills showed you what you paid for: EC2, API Gateway, S3. AI bills just show tokens and a total. No breakdown of what the spend did.
https://t.co/l4NusrZx3Q via @WSJ
"It's addictive." Satya on token maxxing, and he's right. You love it until the novelty wears off. The catch: the invoice tells you what you spent and nothing about what it was worth.
https://t.co/IcYppz0emJ via @businessinsider
Strong thread, but what wins enterprise in the long run isn't cheaper tokens, it's knowing which tokens were worth buying. Cut the price and enterprises just run more experiments they still can't grade.
The AI business model trap: LLM labs want cash flow to fund the race to AGI or the next model. Enter free consumer AI: they are losing a lot of money on the breadth of models they serve to consumers for free! They are caught in the post-training data trap. Free consumer usage feeds post-training needs, so how can it be right to stop serving consumers for free?
But they need money for the compute:
The monetization challenge is being pointed at enterprises.
Phase 1 - seemed easy: value capture in coding, the most bottom-up motion in enterprise, with low customization per customer. Developers continue to train coding and tasks, and will eventually train flawless skills.
Phase 2 is where the challenge lies: showing true enterprise value. The promise of efficiency, accuracy, elimination of resources - that requires a different approach: build depth with harnesses, context, memory, solving for edge cases with deterministic guardrails! Build skill libraries - enter FDEs. Yes, FDEs will train the enterprise Waymos of the world.
The risk - high token pricing for enterprises while consumers get it for free! Yes, for consumer distribution businesses (aka Google, Meta, Apple, etc) it makes sense to hold on to distribution with free AI.
If you want to win enterprise, you should be forward pricing tokens. Cheaper tokens for enterprises would allow for experimentation and workflow reimagination. Instead, CIOs are busy restricting AI use and working on making it more efficient!
Paradox: They still haven't fully understood and embraced the value of AI in the enterprise.
If I were them:
1. Cut token pricing now, or else send enterprises to secure open source and end up with friction-filled routing layers.
2. Show me how enterprises can use their context, training and data as their competitive advantage.
3. Build tools for rapid edge case learning and reducing false positives.
@HarryStebbings @sama @DarioAmodei @demishassabis
The question isn't whether AI spend went up. It's whether AI spend created proportionally more value.
Most companies can answer the first question. Very few can answer the second.
The framing here is exactly backwards. This is the strongest AI bull signal anyone has published this year.
Uber deployed Claude Code to 5,000 engineers in December. By March, 84% were classified as agentic coding users. By April, 95% used AI tools monthly. 70% of all committed code came from AI systems. The tool worked so well that four months of usage consumed the entire annual budget.
Read that again. The "crisis" is that engineers loved the tool so much they used it 3x more than finance predicted. Uber's finance team built their models around fixed seats and low-frequency calls. What they got was 5,000 engineers running parallel agent workflows eight hours a day. The budget model broke because the adoption model worked.
Run the math on the alternative. A senior engineer at Uber costs $350-400K fully loaded. 5,000 of them run about $1.75 billion in annual compensation. AI tools producing 70% of their code output for somewhere between $60-100M a year is a 15-20x return on the AI spend. The "blown budget" is a rounding error on the engineering payroll it's augmenting.
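The return multiple above depends on how value gets attributed to the AI spend, which the post does not spell out. As a rough sketch, one reading credits the tools with 70% of the payroll they augment. That attribution rule and the `return_multiple` helper are assumptions made for illustration. The headcount, cost range, and spend range come from the post.

```python
ENGINEERS = 5_000
LOADED_COST = (350_000, 400_000)      # USD per engineer per year, from the post
AI_SPEND = (60_000_000, 100_000_000)  # USD per year, from the post
AI_CODE_SHARE = 0.70                  # share of committed code, from the post


def payroll(cost_per_engineer):
    """Total annual engineering compensation in USD."""
    return ENGINEERS * cost_per_engineer


def return_multiple(cost_per_engineer, ai_spend, share=AI_CODE_SHARE):
    # Assumption: value attributed to AI = share * payroll. The post does not
    # state how its figure was derived, so this is only one possible reading.
    return payroll(cost_per_engineer) * share / ai_spend


# Pessimistic corner: cheapest engineers, highest AI bill.
low = return_multiple(LOADED_COST[0], AI_SPEND[1])
# Optimistic corner: priciest engineers, lowest AI bill.
high = return_multiple(LOADED_COST[1], AI_SPEND[0])
# Midpoints of both ranges.
mid = return_multiple(375_000, 80_000_000)

if __name__ == "__main__":
    print(f"low {low:.2f}x, mid {mid:.2f}x, high {high:.2f}x")
```

Under that reading the multiple runs from roughly 12x to 23x, with the midpoint near 16x, which brackets the 15-20x quoted. Crediting the full payroll instead of 70% would push every figure higher.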
Microsoft's move is even more straightforward. They invested $13 billion in OpenAI. They own GitHub Copilot. The Experiences & Devices division canceling Claude Code licenses by June 30, the last day of Microsoft's fiscal year, and migrating to Copilot CLI is a vendor consolidation play dressed up as cost management. Claude models still run inside Copilot. The interface changed. The capability didn't.
The companies that set up internal leaderboards ranking teams by AI usage, that coined "tokenmaxx" as a strategy, that rewarded maximum consumption, and then panicked when the bill arrived aren't experiencing an AI cost crisis. They're experiencing a forecasting crisis. The CFO built the budget for a chatbot. The engineers got an agent.
Goldman Sachs projects token consumption will grow 24x by 2030. The companies scrambling to cap budgets today are going to look like the enterprises that limited employee internet access in 2003 because bandwidth was expensive.
We went from "use as many tokens as possible" to "use as few as possible" in one quarter, but neither one is a strategy.
Q3 winners won't be whoever spent less, they'll be whoever can prove what the spend did.