I agree that resolving a ticket decouples input cost and output value. I do not think that repackaging tokens as compute units does so. Obviously it does give you more discretion to slip margin in, but you’re fundamentally still using a cost yardstick, albeit one that’s more flexible in your favor.
Sierra charges when an agent resolves a ticket, zero for failures.
Devin sells Agent Compute Units, not tokens — the same abstraction Databricks & Snowflake use with credits to decouple pricing from raw compute.
Margin is decoupled from the inference line. Durable.
@paulg It's amazing how human some of the old attempts to pass the Turing test are way better than the best LLMs in terms of non-boringness (even if they can get a little weird)
I’ve been wondering if we need a new notation for AI.
Big-O describes how computation scales.
But what we’re seeing now is something different: how much agency we’re willing to hand over.
Call it Big-A.
A(1):
“Write this function.”
A(n):
“Execute this workflow.”
A(n²):
“Keep working until the tests pass.”
A(n³):
“Debate another model until you’re both happy with the result.”
A(n⁴):
“Build this product.”
A(∞?):
“Achieve this outcome.”
The point isn’t that these are mathematically correct.
The point is that every time AI gets cheaper, we don’t seem to do the same work for less money.
We move up a level of abstraction and ask for something bigger.
A prompt becomes a workflow.
A workflow becomes an agent.
An agent becomes a team.
A team becomes a project.
A project becomes an outcome.
Which makes me wonder whether Jevons Paradox has any natural limit in AI.
Do we eventually run out of higher-order loops to automate?
Or does Big-A just keep increasing forever?
@kakashiii111 free credits are a sugar high. they delay the moment you have to ask why the spend is what it is, they don't answer it. the teams that actually flatten the bill aren't the ones with the biggest credit line, they're the ones who can see and cap usage.
@dexhorthy "just build more loops" is the same instinct that shows up in the finance review as a 5-figure inference bill nobody can explain. the loop that doesn't read its own output burns tokens and trust at the same rate.
@kimmonismus the tell is they're building an "AI Gateway to track spend and impose token budgets" in-house. every big co is quietly arriving at the same place: the cost problem was never the model price, it's that nobody could see or cap usage. a control problem wearing a procurement costume.
@emollick Yes, the value is what you do along the way rather than the end result. Whereas in coding if you just magically write the answer and it’s right, who cares what you did to get there
A simulation environment is hard to build but game changing when it comes to ability to iterate
If you don’t have one, you’ll be constantly holding customers’ hands while they press the “on” button on agentic stuff
Curious, is anyone building a comprehensive harness for this as a standalone product?
@scottastevenson Do-maxxing is a suboptimal choice when everything that can be done soon _will_ be done. The same thing done tomorrow will be half the price of doing it today.
It is the era of wait-maxxing
@theo This is very task dependent. There are a lot of simple tasks - especially non-programming - that GLM will burn similar numbers of tokens on and hence be way cheaper.