How to keep AI spend flat while token usage grows exponentially: Not with friction and spend alerts. With better defaults, routing, and caching.
Better Defaults (not Usage Caps) – Engineers can choose any model they want, but defaults matter. We’re experimenting with defaulting to open-weight models like GLM 5.2 and Kimi 2.7 through our LLM gateway, while still encouraging engineers to choose the right model for the task. 91% of our employees were never hitting their usage caps, so instead of lowering caps and driving up alerts, we're moving to cheaper defaults. Note that code reviews use a mix of models, so the models can check each other's work.
Better Routing – In our custom harnesses, we preprocess prompts and route each one to the best model for the job, taking cache hits and model pricing into account. For instance, you may want a frontier model for planning but not for execution, where it can be overkill. Ultimately, humans shouldn't be choosing models; AI can automate this task.
Better Caching – Cache misses are the easiest way to drive your cost up. All of our requests are cache aware, so we’re reusing a warm cache wherever possible. For example, our cache hit rate in LibreChat went from 5% → 60% once caching was properly implemented.
Keep Context Lean – Start fresh sessions when switching tasks. Scope file context narrowly. Disconnect unused tools. Don't just compact. The goal isn't fewer tokens used, it's fewer tokens wasted.
Better Visibility – Our engineers can use as many tokens as they want, from whatever model they want, but we’ve made usage visible – and the more you spend on AI, the more impact we expect.
The goal isn't to suppress usage. It's to build the infrastructure that makes exponential growth sustainable.
Putting this into practice has cut our AI spend nearly in half, while our token usage continues to grow.
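To make the routing and caching points concrete, here is a minimal sketch of cache-aware, price-aware routing. It is not our gateway's actual code: the model names, capability tiers, and per-million-token prices are all made-up assumptions, and `estimate_cost` and `route` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str            # hypothetical model label, not a real product
    tier: int            # assumed capability tier: higher means stronger
    input_price: float   # USD per 1M uncached input tokens (made-up)
    cached_price: float  # USD per 1M cached input tokens (made-up)
    output_price: float  # USD per 1M output tokens (made-up)


MODELS = [
    Model("frontier-large", tier=3, input_price=10.0, cached_price=0.5, output_price=30.0),
    Model("open-weight-default", tier=2, input_price=1.0, cached_price=0.1, output_price=3.0),
]

# Assumed minimum tier per task type: planning needs the strongest model,
# execution can run on the cheaper default.
REQUIRED_TIER = {"planning": 3, "execution": 2}


def estimate_cost(model, input_tokens, cached_tokens, output_tokens):
    """Estimated USD cost of one request, billing cached input at the cached rate."""
    uncached = input_tokens - cached_tokens
    total = (
        uncached * model.input_price
        + cached_tokens * model.cached_price
        + output_tokens * model.output_price
    )
    return total / 1_000_000


def route(task, input_tokens, output_tokens, warm_cache):
    """Pick the cheapest model that is strong enough for the task.

    warm_cache maps a model name to the number of prompt tokens
    already cached for that model (0 if absent).
    """
    candidates = [m for m in MODELS if m.tier >= REQUIRED_TIER[task]]

    def cost(model):
        cached = min(warm_cache.get(model.name, 0), input_tokens)
        return estimate_cost(model, input_tokens, cached, output_tokens)

    return min(candidates, key=cost)


if __name__ == "__main__":
    print(route("planning", 100_000, 1_000, {}).name)
    print(route("execution", 100_000, 1_000, {}).name)
    # A fully warm cache on the larger model can flip the decision.
    print(route("execution", 100_000, 1_000, {"frontier-large": 100_000}).name)
```

Under these assumed prices, moving 1M input tokens on the cheaper model from a 5% to a 60% cache hit rate cuts the estimated input cost from about $0.96 to $0.46, which is why the router looks at warm caches before it looks at list price.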
What we call talent is often just the combination of:
A deep need to win and high agency
The ability to learn fast from mistakes
A beginner’s mind that never disappears
The common thread: an unusually high rate of learning.
A good general rule is that when your instinct says something is dumb, and that something was planned for years by people with way more information than you, the dumb one might be you.
Welcome to the greatest city on Earth.
Here are some tips and tricks for how to get around and make the most of your time here.
Learn more at https://t.co/mki8gOyIED.
The companies letting their teams burn tokens and experiment right now aren’t being reckless.
They’re building instincts their competitors won’t be able to buy later.
May your tokens be with you!
There’s a chasm between people who’ve seen AI demos and people who’ve felt what’s possible firsthand.
It’s the difference between reading about the internet in 1995 and getting your first dial-up connection.
You can’t unfeel that moment.
Liftoff.
The Artemis II mission launched from @NASAKennedy at 6:35pm ET (2235 UTC), propelling four astronauts on a journey around the Moon.
Artemis II will pave the way for future Moon landings, as well as the next giant leap — astronauts on Mars.
25 years ago at Eden Gardens, Rahul and I shared a partnership that will forever remain special. In a moment when the game looked beyond us, we chose belief, patience and resilience. That stand was not just about runs; it was about trust, teamwork and fighting for every session. Grateful to have shared that journey with Rahul and to be part of a Test that reminded us all that in cricket, comebacks are always possible👍 @BCCI #PowerOfPartnership #Believe #Resilience
🗞️ @sdxcentral shares AI inferencing trends in 2026 featuring @IshitV and the AWS vision to build Amazon Bedrock as the world's biggest inference engine. Read: https://t.co/8U4VY2xhv4
@collinsadam Hi Adam - Could I please get the hi-res version of this photo for a fan print? I'd appreciate it if you could DM me, and I'm happy to purchase it if it's available somewhere. Thank you!