How to keep AI spend flat while token usage grows exponentially: Not with friction and spend alerts. With better defaults, routing, and caching.
Better Defaults (not Usage Caps) – Engineers can choose any model they want, but defaults matter. We’re experimenting with defaulting to open-weight models like GLM 5.2 and Kimi 2.7 through our LLM gateway, while still encouraging engineers to choose the right model for the task. 91% of our employees never hit their usage caps, so instead of lowering caps and driving up alerts, we're moving to cheaper defaults. Note that code reviews use a mix of models so that the models can check each other's work.
Better Routing – In our custom harnesses, we preprocess prompts and route to the best model for the job, considering cache hits and model pricing. For instance, you may want a frontier model for planning but not for execution, where it can be overkill. Ultimately, humans shouldn't be choosing models; AI can automate this task.
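A rough sketch of what cost-aware routing could look like, assuming a hypothetical model table and made-up prices (none of these names or numbers come from our actual gateway). It picks a tier from the task phase, then chooses the cheapest candidate after accounting for how much of the prompt is already cached for each model:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str                     # hypothetical model identifier
    tier: str                     # "frontier" or "standard"
    price_per_mtok: float         # USD per million uncached input tokens (made up)
    cached_price_per_mtok: float  # USD per million cached input tokens (made up)


# Illustrative catalogue only; names and prices are assumptions.
MODELS = [
    Model("frontier-a", "frontier", 10.0, 1.0),
    Model("frontier-b", "frontier", 8.0, 2.0),
    Model("open-weight-a", "standard", 1.0, 0.2),
    Model("open-weight-b", "standard", 0.8, 0.4),
]


def estimated_cost(model: Model, prompt_tokens: int, cached_tokens: int) -> float:
    """Estimated input cost in USD, billing cached tokens at the cached rate."""
    cached = min(cached_tokens, prompt_tokens)
    uncached = prompt_tokens - cached
    total = uncached * model.price_per_mtok + cached * model.cached_price_per_mtok
    return total / 1_000_000


def route(phase: str, prompt_tokens: int, warm_cache: dict[str, int]) -> Model:
    """Pick a model for the phase.

    warm_cache maps a model name to the number of prompt tokens that are
    already cached for that model (0 if the model is missing from the dict).
    """
    tier = "frontier" if phase == "planning" else "standard"
    candidates = [m for m in MODELS if m.tier == tier]
    return min(
        candidates,
        key=lambda m: estimated_cost(m, prompt_tokens, warm_cache.get(m.name, 0)),
    )
```

With a cold cache the cheaper list price wins; once one model holds a warm cache for most of the prompt, it can win despite a higher list price, which is the point of routing on cache hits as well as pricing.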
Better Caching – Cache misses are the easiest way to drive your cost up. All of our requests are cache aware, so we’re reusing a warm cache wherever possible. For example, our cache hit rate in LibreChat went from 5% → 60% once caching was properly implemented.
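To see why hit rate matters so much, here is a back-of-the-envelope sketch. The 5% and 60% hit rates are the ones quoted above; the assumption that cached input tokens are billed at 10% of the normal price is purely illustrative and varies by provider:

```python
def blended_input_price(base_price: float, hit_rate: float, cached_discount: float) -> float:
    """Average price per input token given a cache hit rate.

    cached_discount is the fraction of the base price charged for cached
    tokens (0.1 means cached tokens cost 10% of the normal price). This
    value is an assumption, not a quoted provider rate.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_cost = (1.0 - hit_rate) * base_price
    hit_cost = hit_rate * base_price * cached_discount
    return miss_cost + hit_cost


# Normalised base price of 1.0 so the results read as a fraction of list price.
before = blended_input_price(1.0, 0.05, 0.1)
after = blended_input_price(1.0, 0.60, 0.1)
savings = 1.0 - after / before
```

Under that assumed discount, the blended input price drops from roughly 0.955 to 0.46 of list price. The real figure depends on the provider's cached-token pricing and on how much of the bill is input tokens.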
Keep Context Lean – Start fresh sessions when switching tasks. Scope file context narrowly. Disconnect unused tools. Don't just compact. The goal isn't fewer tokens used, it's fewer tokens wasted.
Better Visibility – Our engineers can use as many tokens as they want, from whatever model they want, but we’ve made usage visible – and the more you spend on AI, the more impact we expect.
The goal isn't to suppress usage. It's to build the infrastructure that makes exponential growth sustainable.
Putting this into practice has cut our AI spend nearly in half, while our token usage continues to grow.
it is fun building here. we have the choice to route to frontier models for complex tasks and also have access to powerful ways of routing
for example, in one of our internal tools, Mux, we can choose to plan with opus 4.8 and route the implementation by spawning sessions on different git worktrees using composer 2.5 via the @cursor_ai CLI
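a minimal sketch of the worktree part, assuming one worktree and one branch per task. the task names, the `agent/` branch prefix, and the `../worktrees` directory are hypothetical, and this is not how Mux itself is implemented. it only builds the `git worktree add` commands; launching an agent session inside each worktree is left out on purpose:

```python
from pathlib import PurePosixPath


def worktree_commands(tasks, base_branch="main", root="../worktrees"):
    """Build one `git worktree add` command per task.

    Each task gets its own directory and branch so parallel sessions do not
    touch each other's working tree. Names and layout are illustrative.
    """
    commands = []
    for task in tasks:
        path = str(PurePosixPath(root) / task)
        # git worktree add -b <new-branch> <path> <commit-ish>
        commands.append(
            ["git", "worktree", "add", "-b", f"agent/{task}", path, base_branch]
        )
    return commands


cmds = worktree_commands(["fix-login", "add-metrics"])
```

each command can then be run with `subprocess.run(cmd, check=True)` from inside the repo before starting a session in that directory.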
hmm.. maybe try setting your context better
"literally ignores half the details" either means your task is too big or your LLM is clueless in your repo, which means you need to work a bit before the LLM knows what to do
for example:
- keep all signals in @linear (bugs, tasks to do, docs, projects)
- have a simple agents.md that points to different context in your repo
- let your LLM search your PR history to see the foundation (if you have more than one repo, sourcegraph is good in my experience)
you helped me a lot with LeetCode in the past so i’m glad to return the favor :)
the loop stuff will only work after manually prompting works really well with your repo and setup
yes claude dropped tag, but we had this months ago @coinbase - shout out @alrodi & @kylecesmat's team & more😉
My team at Advisor now has our own branded Slack agent. It has full context on all docs, tickets, codebases, and observability, plus triage ability.
Our new employee.
@mitsuhiko also curious - my experience with loop engineering was not successful, except for watching CI or other mundane monitoring tasks. Have not found a way to use loops to produce good code in a project.
Big week for Coinbase! Like I said on stage, thanks to all the Coinbase employees (and their thousands of AI agents) whose hard work and dedication made all of our announcements possible.
The everything exchange now includes pre-IPO perps and stock options, with tokenized stocks coming soon. We also redesigned Coinbase Advanced and started to combine our global liquidity (between US and international users, and between Coinbase and Deribit users).
@CoinbaseDev is bringing the benefits of stablecoin payments to businesses everywhere, with fully custodial accounts using our compliance stack, and has launched an awesome new dashboard for all dev tools.
For @base, we announced private transactions and Base App on web.
And finally, Coinbase is also becoming the financial account for AI - give your agent a wallet, get AI-powered financial advice, and connect your Coinbase account to your favorite LLM.