CEO & Co-Founder @ | Building AI sales agents + other projects | bootstrapped, profitable | Ex-Clari, Ex-Salesforce
How to keep AI spend flat while token usage grows exponentially: Not with friction and spend alerts. With better defaults, routing, and caching.
Better Defaults (not Usage Caps): Engineers can choose any model they want, but defaults matter. We're experimenting with defaulting to open-weight models like GLM 5.2 and Kimi 2.7 through our LLM gateway, while still encouraging engineers to choose the right model for the task. 91% of our employees were never hitting their usage caps, so instead of lowering caps and driving up alerts, we're moving to cheaper defaults. Note that code reviews use a diversity of models, so they can check each other's work.
Better Routing: In our custom harnesses, we preprocess prompts and route to the best model for the job, considering cache hits and model pricing. For instance, you may want a frontier model for planning, but not for execution, where it can be overkill. Ultimately, humans shouldn't be choosing models - AI can automate this task.
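To make the routing idea concrete, here is a minimal sketch, not our actual harness. It assumes two hypothetical models with made-up prices, a `phase` label that some preprocessing step has already assigned, and a `warm_cache` map of how many prompt tokens are already cached per model. Planning is restricted to frontier models, and everything else goes to whichever model is cheapest once cached tokens are billed at the lower rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str                  # hypothetical model identifier
    input_price: float         # assumed USD per million uncached input tokens
    cached_input_price: float  # assumed USD per million cached input tokens
    frontier: bool


def estimated_cost(model: Model, prompt_tokens: int, cached_tokens: int) -> float:
    """Estimate input cost in USD, billing cached tokens at the cheaper rate."""
    uncached = prompt_tokens - cached_tokens
    total = uncached * model.input_price + cached_tokens * model.cached_input_price
    return total / 1_000_000


def route(phase: str, prompt_tokens: int, warm_cache: dict[str, int],
          models: list[Model]) -> Model:
    """Pick the cheapest eligible model; planning only considers frontier models."""
    eligible = [m for m in models if m.frontier] if phase == "planning" else models
    return min(
        eligible,
        key=lambda m: estimated_cost(m, prompt_tokens, warm_cache.get(m.name, 0)),
    )


# Illustrative prices only, not real pricing.
MODELS = [
    Model("frontier-x", input_price=10.0, cached_input_price=1.0, frontier=True),
    Model("open-y", input_price=2.0, cached_input_price=0.2, frontier=False),
]
```

With these numbers, an execution request with a cold cache goes to `open-y`, but the same request with the full prompt already cached on `frontier-x` goes to `frontier-x`, because the warm cache makes it the cheaper option.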
Better Caching: Cache misses are the easiest way to drive your cost up. All of our requests are cache aware, so we're reusing a warm cache wherever possible. For example, our cache hit rate went from 5% → 60% in LibreChat once caching was properly implemented.
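One way to think about cache-aware requests, as a rough sketch: assuming the provider's prompt cache matches on an exact prefix (common, but check your provider), keep the stable parts of the prompt first and byte-identical across requests, and put the parts that change last. The function names and prompt layout below are hypothetical and only illustrate that ordering.

```python
import json


def build_prompt(system: str, tools: list[dict], history: list[str], user_msg: str) -> str:
    """Assemble a prompt with stable content first so the prefix can be reused."""
    # Sort tools and keys so the same tool set always serializes identically.
    stable_tools = json.dumps(sorted(tools, key=lambda t: t["name"]), sort_keys=True)
    return "\n".join([system, stable_tools, *history, user_msg])


def shared_prefix_len(a: str, b: str) -> int:
    """Count leading characters two prompts share (a proxy for cacheable prefix)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


first = build_prompt("sys", [{"name": "b"}, {"name": "a"}], ["h1"], "q1")
second = build_prompt("sys", [{"name": "a"}, {"name": "b"}], ["h1"], "q2")
```

Here `first` and `second` differ only in their final character, even though the tools arrived in a different order, so nearly the whole prompt is eligible for reuse. Putting a timestamp or the user message at the top would break the shared prefix immediately.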
Keep Context Lean β Start fresh sessions when switching tasks. Scope file context narrowly. Disconnect unused tools. Don't just compact. The goal isn't fewer tokens used, it's fewer tokens wasted.
Better Visibility: Our engineers can use as many tokens as they want, from whatever model they want, but we've made usage visible, and the more you spend on AI, the more impact we expect.
The goal isn't to suppress usage. It's to build the infrastructure that makes exponential growth sustainable.
Putting this into practice has cut our AI spend nearly in half, while our token usage continues to grow.
@illyism Wouldn't be too bad to be honest. Reminds me of people wearing fancy fashion like uncomfortable shoes or pants just to "look good" rather than feel good. I'd much rather feel comfortable.
I think the pity and mockery stems from so many Europeans (usually institutions/"experts") dismissing AC's utility. The vast majority of the debate is not over the mechanics and logistics of deployment, but whether widespread AC is in principle desirable in the first place. I think it garners so much attention from Americans (and Europeans such as myself) because it's a vivid distillation of the self-destructive "degrowth" hysteria that is so prevalent and harmful in many European policy matters. It is the same impulse that led to Germany decommissioning its nuclear capacity, and it is a substantial contributor to the broader economic challenges of the continent.
@levelsio Wow people in Mexico and Saudi Arabia are happier than in Switzerland? Surprising.
Thanks for sharing, although being in Switzerland with roots in the Balkans, I think there's no way I personally would be happier or similarly happy in Bosnia.
Big honor joining @alextheuma's Shift AI pod. I shared my screen and spilled the beans on how we went AI-first, stayed 100% bootstrapped and serve 2000+ users globally as a team of 2.
Pod: https://t.co/0yLisPba4U
YouTube: https://t.co/P1o9KzQrG3
Thanks for having me, Alex.
The users who complain about the flaws in your product may seem annoying, but they are on the whole probably your most valuable users. They complain because they care, and I doubt a startup could ever get really big without users who care a lot about the product.
@denisyurchak Agree. My Midea is doing a great job vs nothing but doesn't beat a system that's fully built in. With good window insulation kits it's pretty OK, about 5-8 °C colder than outside.