Cem Bas @cembas - Twitter Profile

about 2 months ago

Autonomous agents are the worst offender for token burn — they run loops and most steps don't need the expensive model. I built TokenSurf to handle exactly this: it auto-routes simple agent calls to cheaper models while keeping the hard reasoning on the good ones. One URL swap. Stops the coal-burning.

0

1

0

12

Cem Bas

@Cembas

about 2 months ago

What is AI routing? Imagine a switchboard only for the chatbots. <1ms routing latency. Saving you thousands $$$. #AI

0

1

0

16

Cem Bas

@Cembas

about 2 months ago

Inference costs are the hidden killer for AI apps. Especially when you're sending everything to the same expensive model. I built TokenSurf to auto-route simple queries to cheaper models — same API, one URL change. Helps a lot when margins are already tight.Inference costs are the hidden killer for AI apps. Especially when you're sending everything to the same expensive model. I built TokenSurf to auto-route simple queries to cheaper models — same API, one URL change. Helps a lot when margins are already tight.

0

57

Cem Bas

@Cembas

about 2 months ago

@serriffe Lol the caveman prompt approach. There's a better way though — route the simple stuff to a cheaper model automatically and keep the big model for when you actually need it. That's what I built TokenSurf to do. One URL swap instead of rewriting all your prompts in txt speak.

1

0

24

Who to follow

doğan erdem

@derdem

mülkiye, ortak yaşamı geliştirme vakfı, soyka yayın yönetmeni, ymm, ida’nın merhameti, sığırcığın intiharı, bir pusuda göç etti düşlerim.

İsa Telci

@saTelci2

Prof. Dr. Tıbbi ve Aromatik Bitkiler, biyo-çeşitlik, uçucu yağlar, Resarcher on Medicinal and aromatic plant, diversity, essential oil

GenzoWakabayashi

@DepresifAgresif

Derken Bir Film Baslar İcinde Kendimi Ararım Kahramanları Herkes Sever Bense Sıradan Bir Adamım...

Cem Bas

@Cembas

about 2 months ago

This is exactly right. Most of the work doesn't need the expensive model. I built TokenSurf to do this automatically — it's a proxy that classifies incoming queries and routes the simple ones to cheaper models. No code changes needed, just swap one URL. The savings are wild when you realize how many queries were overkill.

0

17

Cem Bas

@Cembas

about 2 months ago

@rixantos @athcanft AI costs crushing margins is so common right now. A lot of apps route every query to their most expensive model when 70%+ could be handled by something cheaper. I built TokenSurf for exactly this — it's a proxy that auto-routes to the right model. One line change, no SDK swap.

0

26

Cem Bas

@Cembas

about 2 months ago

@TiTiKey_com @myopenclaw Newer models help on price-per-token but most apps still send everything to the same expensive model. Routing simple queries to cheaper models automatically makes a bigger dent. That's what I'm building with TokenSurf — it's a proxy, you swap one URL and it handles the rest.

2

0

35

Cembas retweeted

Cem Bas

@Cembas

about 2 months ago

Cost-driven LLM routing? Yes please! :D

0

2

1

0

37

Cem Bas

@Cembas

about 2 months ago

Yeah this is exactly the problem I've been working on. Most of those messages don't need GPT-4 or Opus — a cheaper model handles them fine. I built TokenSurf to auto-route simple queries to cheaper models so you only pay for the heavy lifting when you actually need it. One URL swap, no code changes. Been seeing 50-90% savings depending on the use case.

0

21