Wild claim, but our evals back it up: for realistic human conversation, you don't need frontier compute to get frontier quality.
We used the current SOTA models, GPT-5.5 (Extra High) and Opus 4.8 (Max), to grade >400k simulated scenario replies against our internal evaluation framework. The winner for our use case? Gemini 3.1 Flash Lite. After pipeline optimization, it achieved near-previous-generation SOTA (Opus 4.7) performance at ~1/300th the cost and ~100x the speed.
Why? It's a systems problem, not a model-size one. The attached diagram shows how it fits together: one streamed call for the live reply, while coaching, scoring, memory, and ending logic run async beside it. The offline loop turns every rehearsal into corpus → eval → tuning. That compounding practice is the real moat. You can easily swap an underlying model anytime, but you can't easily swap accumulated data.
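For the curious, here is a rough sketch of the "one streamed call, everything else async beside it" shape, using Python's asyncio. This is not our production code: `stream_reply`, `side_task`, and `handle_turn` are hypothetical names, and the model call is faked with a hard-coded token list.

```python
import asyncio


async def stream_reply(prompt):
    # Hypothetical stand-in for the single streamed model call.
    for token in ["Sure,", " let's", " practice."]:
        await asyncio.sleep(0)  # yield control, as a real network read would
        yield token


async def side_task(name, prompt):
    # Hypothetical stand-in for coaching / scoring / memory / ending logic.
    await asyncio.sleep(0)
    return name, f"{name} done for {prompt!r}"


async def handle_turn(prompt):
    # Schedule the side tasks first so they run beside the stream, not after it.
    names = ["coaching", "scoring", "memory", "ending"]
    tasks = [asyncio.create_task(side_task(n, prompt)) for n in names]

    reply = []
    async for token in stream_reply(prompt):
        reply.append(token)  # a real app would forward each token to the client here

    # Collect side results once the live reply has finished streaming.
    side_results = dict(await asyncio.gather(*tasks))
    return "".join(reply), side_results


reply, side = asyncio.run(handle_turn("hi"))
```

The point of the shape: the user only ever waits on the stream, so the side work can be slower or cheaper without touching perceived latency.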
We also incorporated privacy from day one: all conversations are de-identified & aggregate-only. No live users yet, but with the help of synthetic data, we have a data pipeline that's already sharpening itself.
Frontier quality is a systems bet.
#quippy #ai #llms #tech
AI Labs: Losing hundreds of billions of dollars a year building their models
Tech Twitter: SHOCKED when they start tightening up subscription limits and increasing prices
😵😲🫨🤯
#ai #llms #bubble
The age of the $200 / month AI model subscription really might be a temporal arbitrage moment. It almost certainly won't last once a market leader is firmly established.
More people should capitalize on the opportunity.
Anthropic / OpenAI / Google are only offering this deal right now because the AI race is tight and evolving so quickly. Even then, we're already seeing that start to change (Anthropic tightening up access to the Claude Agent SDK being one example).
This is the hourly usage graph for Quippy, priced as if the tokens were being paid for with API credits (the graph tracks consumption across 3 different $200 / month plans). Yes, that's >$350 / hour of usage, being paid for by $600 / month (a month is ~720 hours).
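Back-of-the-envelope math on those numbers, as a quick Python sketch. The only inputs are the figures above; the variable names are mine, and the >$350 figure is treated as a single-hour lower bound, not an average, so the ratio only describes that kind of hour.

```python
HOURS_PER_MONTH = 720        # 30 days * 24 hours
PLAN_COST = 200              # dollars per plan per month
NUM_PLANS = 3
API_EQUIV_PER_HOUR = 350     # ">$350 / hour" from the graph, used as a lower bound

monthly_spend = PLAN_COST * NUM_PLANS                 # flat subscription spend
flat_cost_per_hour = monthly_spend / HOURS_PER_MONTH  # what an hour actually costs
# API-equivalent value of one such hour relative to its flat cost.
hourly_ratio = API_EQUIV_PER_HOUR * HOURS_PER_MONTH / monthly_spend
# Hours at that rate needed to consume a full month of subscription spend.
hours_to_break_even = monthly_spend / API_EQUIV_PER_HOUR

print(f"flat cost per hour: ${flat_cost_per_hour:.2f}")
print(f"API-equivalent value vs. flat cost: {hourly_ratio:.0f}x")
print(f"hours to break even: {hours_to_break_even:.2f}")
```

So an hour like that costs well under a dollar on the flat plans, and fewer than two such hours cover the whole month.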
#ai #arbitrage
I think the biggest mistake that AI consumers are making right now is having company loyalty.
Getting too comfortable with one AI product, and skipping new releases because you’ve written off @OpenAI / @AnthropicAI / @GoogleDeepMind / @grok / @AIatMeta as “trash”, is costly, given that the SOTA leader is constantly shifting.
It makes sense to have company loyalty for a mature product like a smartphone, but that’s because the iteration cycles for them are much longer.
You’re missing out by not keeping an open mind.
#ai #LLMs #techtwitter
Is fast mode in the Claude Code macOS app not counting against extra usage for anyone else??? Is this intended, or am I secretly getting charged rn? @ClaudeDevs @Claude @AnthropicAI