@theo you are just misinformed on this very basic point. you can run multiple sessions at the same time with little penalty to tok/s figures, because of batching.
the only thing that scales linearly with the number of sessions is the ram you need for kv cache
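rough sketch of the kv cache math, assuming a standard transformer with an fp16 cache. the model dims here (32 layers, 8 kv heads, head dim 128, 8192 context) are made up for illustration, not any specific model, and this ignores tricks like paging or prefix sharing:

```python
def kv_cache_bytes(n_sessions, context_len, n_layers, n_kv_heads, head_dim,
                   bytes_per_elem=2):
    """Approximate KV cache size for n_sessions concurrent sequences.

    Assumes every session holds a full context_len of cached tokens and
    that the cache is stored at bytes_per_elem per value (2 = fp16/bf16).
    """
    # Each cached token stores one key and one value vector per layer
    # and per KV head, hence the leading factor of 2.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return n_sessions * context_len * per_token


if __name__ == "__main__":
    # Hypothetical model dimensions, chosen only to keep the numbers round.
    for n in (1, 2, 4, 8):
        size = kv_cache_bytes(n, context_len=8192, n_layers=32,
                              n_kv_heads=8, head_dim=128)
        print(f"{n} session(s): {size / 2**30:.1f} GiB of kv cache")
```

the weights are loaded once and shared across the batch, so doubling the sessions only doubles this term.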
@aetxton @__tinygrad__ @robinebers my point is that these are cheaper but they are not a real alternative for most people right now because most people run subsidized subscriptions.
@__tinygrad__ @robinebers I mean it just confirms what we knew (open weight models will closely follow closed ones).
But all the "I have replaced chatgpt/claude with this" hype does not apply to the average joe.
@angelbrodin max and high are way out on the diminishing-returns side of the value curve.
mid and low are competitive with 5.5 and on the same perf/cost curve