🚨 New research alert! For the past few months, I've been a part-time visiting economics researcher at OpenAI. Excited to share the first public piece of work to come out of this, which uses data from Codex to document the ongoing and rapid shift to agentic AI.
Details below 👇
agree, high AI spend inside an organization is (very) often a symptom of poor allocation rather than high utilization.
hence how some companies see drastically different outcomes for equal or more token spend.
couldn't have put it better than @rahulgs and well worth a quick read
it is simultaneously possible to spend a lot on AI and still underuse it, esp in larger orgs
we're seeing this with meta, uber, and many other orgs instituting budgets
some factors are at play:
1. cost of the frontier comes at an enormous premium: fable -> glm 5.2 is a 10x dropoff in cost
2. tragedy of the commons: in large orgs, it's much safer to always default to a larger model at a higher reasoning effort. you end up in a situation where most features/people are on too high of a setting, resulting in 2-3x more spend than needed
3. very easy for runaway automations, openclaw bros, subagent accidents, to create a lot of spend quickly
results in a very skewed distribution of usage with a small number of people/features with high usage
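a rough way to check how skewed usage is: compute the share of total spend that comes from the top slice of users. this is only a sketch; the `top_share` helper, the user names, and the dollar amounts are all made up for illustration.

```python
def top_share(spend_by_user: dict[str, float], top_fraction: float = 0.1) -> float:
    """Return the fraction of total spend that comes from the top `top_fraction` of users."""
    costs = sorted(spend_by_user.values(), reverse=True)
    total = sum(costs)
    if not costs or total == 0:
        return 0.0
    # Always count at least one user, even for very small groups.
    k = max(1, round(len(costs) * top_fraction))
    return sum(costs[:k]) / total


# Hypothetical monthly spend per user, in dollars (illustrative only).
spend = {"a": 900.0, "b": 40.0, "c": 30.0, "d": 20.0, "e": 10.0}
share = top_share(spend, top_fraction=0.2)
print(f"top 20% of users account for {share:.0%} of spend")
```

the same function works per feature instead of per user if you key the dict by feature name.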
to counteract these issues, and avoid internal budgets (for now)
1. we changed defaults across the company to lower reasoning levels, across surfaces
2. thinking about the p50, p75, p95 session: cost per PR, cost per support ticket, cost per session, and actively compressing model tiers (gpt 5.1->5.4-mini) over time
3. banning automations from using frontier models, and high reasoning efforts, and using flex api tiers (adds up to 75%+ savings)
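to make the p50/p75/p95 idea concrete, here is a minimal sketch that computes session cost percentiles with the nearest-rank method. the `percentile` helper and the session costs are hypothetical, not taken from any real billing data or API.

```python
import math


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical cost per session, in dollars (illustrative only).
session_costs = [0.10, 0.12, 0.15, 0.20, 0.25, 0.30, 0.40, 0.60, 1.50, 9.00]

p50 = percentile(session_costs, 50)
p75 = percentile(session_costs, 75)
p95 = percentile(session_costs, 95)
print(f"p50=${p50:.2f} p75=${p75:.2f} p95=${p95:.2f}")
```

a large gap between p50 and p95, as in this made-up data, is the kind of signal that points at a few runaway sessions rather than broad overuse.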
tldr before you institute budgets, try these first
more in the blog:
https://t.co/L5HnjstvI8
i've tried to consistently use at least two or three different model harnesses every day.
i'm quick to pick favorites, but it's also quite interesting to see the variation across different models and harnesses for different tasks.
@brendan_salgado @Polymarket scarce, hard to debase, outside government control, instantly transferable, and programmable.
everything the dollar and gold alone can't be.
@Willob reading this after spending my night having chat generate ocean robots for my copycat landing page... bummer I have to actually do a little work
this is what the future of engineering looks like.
if you're resourced to do this, i think you absolutely should be. if you don't have the resources to operate this way yet, figuring out how to get them should be a p1 priority. you are now competing against teams with massively leveraged engineering output.
we are no longer just building software. we are building software that autonomously builds more software.