@lvwerra The review-and-merge loop is doing the heavy lifting here. Swarms stay coherent when every agent inherits the same contracts (scope, format, citation rules) and output lands through a gate, not straight into the wiki. Curious how much quality is harness vs models.
@jennapederson@addyosmani@aiDotEngineer This is the right question. In practice it means deciding per task, not per job: which steps the model runs, which a human owns, where the checkpoints sit. Accountability is a property of the workflow design, not a feeling after the fact.
@svpino Agents paying per use makes step-level accounting the natural unit. A spending ceiling caps the damage, but you still want each step to report what it consumed so you can see where the budget went. Protocol payments and per-step metering pair well.
@simonw The fun part of one-shotting a tool like this is that the generation is never the hard 20%. It's the harness: what the agent may touch, when it stops, what a run leaves behind. Boundary design is quietly becoming its own discipline.
@jerryjliu0@aiDotEngineer The runbook stage is more than a waypoint to goals. A typed, versioned runbook is an artifact: agents call it, humans audit it, and the how stops being re-derived every run. Goals cover the what; runbooks preserve the how that already works.
StewAI is live. Build an AI workflow once as typed steps, then run it from the visual builder or from three lines of Python. Every run shows what each step consumed in credits. Start free.
4/ The quadrant that matters most: unknown + recurring.
Start agentic to find the path, then freeze it into a typed, versioned recipe. Repeated exploration becomes reusable infrastructure.
Full writeup: https://t.co/c0Q6xUcJgF
1/ Before reaching for an autonomous agent, ask two questions: can you specify how the work is done, and will you run it again?
Those two answers decide whether you want an agent or a deterministic recipe. I mapped it to a 2x2:
3/ Why structure wins in production: end-to-end success is the product of the per-step rates.
At 85% reliability per step, a 10-step task succeeds about 1 in 5 times. Errors compound. Keep chains short, put deterministic logic between the probabilistic steps.
@diegocabezas01 Great setup. Next lever once it clicks: pin the orchestration itself. Right now the lead re-derives the delegation plan every run. Capture it as a fixed workflow with typed handoffs and you keep the flexibility but stop paying to re-plan, and the runs get reproducible.
@simonw This is the sneaky kind of cost shift: it lands silently and you only notice at the monthly bill. Good argument for metering per call, not just in aggregate. A 1.4x jump on English shows up right away when each step reports its own cost.
@svpino Treating the run as a stream of typed events is the right call. The part that's easy to underrate is durability: if each step has a typed output and the run can pause and resume as first-class state, human-in-the-loop stops being glue code and becomes just another step.
@geoffreylitt Understanding is the bottleneck partly because agents re-author their control flow every run. Pin the steps, give each a typed output, and you get a stable artifact to read instead of freshly improvised code each time. You understand it once, not per run.
4/ Rough numbers from my own runs (illustrative): the recipe path spends a fraction of what the open-ended agent burned, because the plan isn't regenerated every time.
Structure once. Run it cheap, forever.
https://t.co/QIMLOxvQy0
1/ For months my agent re-derived the same workflow on every call. Same plan, same tool order, tokens spent re-inventing structure it already had.
I moved the structure into a StewAI recipe. Now the agent just calls it.
3/ Then I read the trace:
trace = client.runs.steps(run["id"])
Every step reports usage.credits. I can see exactly which step cost what. No re-planning tax, no guessing where the tokens went.
5/ And a recipe nested 5 levels deep is still not a black box. Every nested step reports its own credits in the run trace, so the whole thing stays auditable.
https://t.co/QIMLOxvQy0
1/ I stopped rebuilding the same agent plumbing for every project. Now I package a workflow once as a StewAI recipe, and any other recipe can call it as a step. Build it once, reuse it everywhere.
4/ From code it's still 3 lines:
from stewai import Client
client = Client(api_key="...")
run = client.runs.create(recipe_id="01JEXAMPLE", inputs={"topic": "..."})
The recipe fans out to its subrecipes. I get one typed result back.