Semir Jahic @realSemir - Twitter Profile


@brian_armstrong: How to keep AI spend flat while token usage grows exponentially: Not with friction and spend alerts. With better defaults, routing, and caching.

Better Defaults (not Usage Caps) – Engineers can choose any model they want, but defaults matter. We’re experimenting with defaulting to open-weight models like GLM 5.2 and Kimi 2.7 through our LLM gateway, while still encouraging engineers to choose the right model for the task. 91% of our employees never hit their usage caps, so instead of lowering caps and driving up alerts, we're moving to cheaper defaults. Note that code reviews use a mix of models so that they can check each other's work.
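A minimal sketch of what "cheaper defaults, free choice" could look like at a gateway. The tweet does not describe the gateway's API, so `Request`, `resolve_model`, and the model identifier below are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical identifier: the source names open-weight models but gives
# no gateway IDs, so this string is a placeholder.
DEFAULT_MODEL = "open-weight-default"


@dataclass
class Request:
    user: str
    prompt: str
    model: Optional[str] = None  # None means the engineer did not pick a model


def resolve_model(req: Request) -> str:
    """Honor an explicit choice; otherwise fall back to the cheaper default."""
    return req.model if req.model else DEFAULT_MODEL
```

The point of the design is that nothing is blocked: an explicit `model` always wins, and only unspecified requests land on the cheaper default.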

Better Routing – In our custom harnesses, we preprocess prompts and route each one to the best model for the job, taking cache hits and model pricing into account. For instance, you may want a frontier model for planning but not for execution, where it can be overkill. Ultimately, humans shouldn't be choosing models – AI can automate this task.
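One way to picture cache- and price-aware routing is the sketch below. It is not the harness described in the tweet: the `Model` fields, the "planning"/"execution" phase labels, the prices, and the rule "planning prefers frontier models" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Model:
    name: str
    price_per_mtok: float         # assumed input price, USD per million tokens
    cached_price_per_mtok: float  # assumed discounted price for cached tokens
    frontier: bool


def estimated_cost(model: Model, prompt_tokens: int, cached_tokens: int) -> float:
    """Estimated input cost in USD, splitting the prompt into fresh and cached tokens."""
    fresh = prompt_tokens - cached_tokens
    return (fresh * model.price_per_mtok
            + cached_tokens * model.cached_price_per_mtok) / 1_000_000


def route(phase: str, prompt_tokens: int,
          warm_cache: Dict[str, int], models: List[Model]) -> Model:
    """Pick the cheapest eligible model.

    warm_cache maps a model name to how many prompt tokens are already cached
    for it. Planning is restricted to frontier models when any are available.
    """
    candidates = [m for m in models if m.frontier] if phase == "planning" else list(models)
    if not candidates:
        candidates = list(models)
    return min(
        candidates,
        key=lambda m: estimated_cost(
            m, prompt_tokens, min(warm_cache.get(m.name, 0), prompt_tokens)
        ),
    )
```

With these made-up prices, a warm cache can make the nominally more expensive model the cheaper pick for a given request, which is why the router looks at cache state and not only at list prices.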

Better Caching – Cache misses are the easiest way to drive your cost up. All of our requests are cache-aware, so we’re reusing a warm cache wherever possible. For example, our cache hit rate went from 5% → 60% in LibreChat once caching was properly implemented.
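To get a feel for why the hit rate matters, here is a small back-of-the-envelope sketch. The tweet gives only the hit rates (5% and 60%); the `cached_discount` of 0.1, meaning cached input tokens are billed at 10% of the normal price, is purely an assumption, and real discounts vary by provider.

```python
def relative_input_cost(hit_rate: float, cached_discount: float = 0.1) -> float:
    """Input cost relative to a no-cache baseline of 1.0.

    hit_rate is the fraction of input tokens served from cache;
    cached_discount is the assumed price of a cached token relative
    to a fresh one.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return (1.0 - hit_rate) + hit_rate * cached_discount


before = relative_input_cost(0.05)  # hit rate before the fix
after = relative_input_cost(0.60)   # hit rate after the fix
savings = 1.0 - after / before      # fractional reduction in input cost
```

Under that assumed discount, moving from a 5% to a 60% hit rate cuts the input-token bill roughly in half; a different discount would give a different figure.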

Keep Context Lean – Start fresh sessions when switching tasks. Scope file context narrowly. Disconnect unused tools. Don't just compact. The goal isn't fewer tokens used, it's fewer tokens wasted.

Better Visibility – Our engineers can use as many tokens as they want, from whatever model they want, but we’ve made usage visible – and the more you spend on AI, the more impact we expect.
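Making usage visible can start with something as simple as a per-person spend rollup. The sketch below assumes usage records of the form `(user, model, tokens)` and a flat per-model price table; both are hypothetical shapes, not something the tweet specifies.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def spend_by_user(
    records: Iterable[Tuple[str, str, int]],
    price_per_mtok: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Sum estimated spend (USD) per user and sort from highest to lowest."""
    totals: Dict[str, float] = defaultdict(float)
    for user, model, tokens in records:
        totals[user] += tokens * price_per_mtok[model] / 1_000_000
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

A report like this does not cap anyone; it just puts spend next to names so it can be compared with impact.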

The goal isn't to suppress usage. It's to build the infrastructure that makes exponential growth sustainable.

Putting this into practice has cut our AI spend nearly in half, while our token usage continues to grow.



Semir Jahic @realSemir

about 11 hours ago

@illyism Wouldn’t be too bad, to be honest. Reminds me of people wearing fancy fashion like uncomfortable shoes or pants just to “look good” rather than feel good. I’d much rather feel comfortable.


Semir Jahic @realSemir

2 days ago

@a16z This is the way!


realSemir retweeted

a16z @a16z

2 days ago

Solopreneurs are making it big. Charts of the Week: https://t.co/qmjdkZzisp


Semir Jahic @realSemir

2 days ago

@patrickc @TweetsOfSumit @Nicolas_Colin Second that!


realSemir retweeted

Patrick Collison

@patrickc

2 days ago

I think the pity and mockery stems from so many Europeans (usually institutions/"experts") dismissing AC's utility. The vast majority of the debate is not over the mechanics and logistics of deployment, but whether widespread AC is in principle desirable in the first place. I think it garners so much attention from Americans (and Europeans such as myself) because it's a vivid distillation of the self-destructive "degrowth" hysteria that is so prevalent and harmful in many European policy matters. It is the same impulse that led to Germany decommissioning its nuclear capacity, and it is a substantial contributor to the broader economic challenges of the continent.


Semir Jahic @realSemir

2 days ago

@levelsio Wow, people in Mexico and Saudi Arabia are happier than in Switzerland? Surprising. Thanks for sharing, although being in Switzerland with roots in the Balkans, I think there’s no way I personally would be happier or similarly happy in Bosnia.


Semir Jahic @realSemir

2 days ago

Brilliant.

Charles Curran

@charliebcurran

3 days ago

You're hiding an air conditioner under your floor, aren't you?


Semir Jahic @realSemir

3 days ago

Based on this, I'm a conservative but not by choice. I'd love to be a nationalist during this heatwave.

@levelsio

3 days ago


Semir Jahic @realSemir

3 days ago

🤦‍♂️

Joe Hill

@jo3hill

4 days ago

In Britain today politicians would rather ration work than legalise air conditioning.


Semir Jahic @realSemir

3 days ago

Big honor joining @alextheuma's Shift AI pod. I shared my screen and spilled the beans on how we went AI-first, stayed 100% bootstrapped and serve 2000+ users globally as a team of 2. Pod: https://t.co/0yLisPba4U YouTube: https://t.co/P1o9KzQrG3 Thanks for having me, Alex.


Semir Jahic @realSemir

3 days ago

@itsolelehmann Always better together. Stay cool!


Semir Jahic @realSemir

3 days ago

@forgebitz Very “cool”


Semir Jahic @realSemir

3 days ago

So true!

Paul Graham

@paulg

4 days ago

The users who complain about the flaws in your product may seem annoying, but they are on the whole probably your most valuable users. They complain because they care, and I doubt a startup could ever get really big without users who care a lot about the product.


Semir Jahic @realSemir

3 days ago

@Noahpinion 🤦‍♂️ it’s Kafkaesque


Semir Jahic @realSemir

4 days ago

@denisyurchak Agree. My Midea is doing a great job vs. nothing but doesn’t beat a system that’s fully built in. With good window insulation kits it’s pretty OK, about 5-8 °C colder than outside.


Semir Jahic @realSemir

4 days ago

@DavidOndrej1 Never felt so rich.
