Always teaching this. Agents are fun - until Claude is dumb one day and does the wrong thing - or Opus 4.8 is more proactive than you would like. Go "on the rails" as much as possible.
Full agentic autonomy is seductive. But then you're three levels into a trace trying to work out why it skipped step 4.
The agents that I ship are a set of Python functions (tasks) on a schedule, reading from and writing to knowledge bases. I control these via knowledge bases too, updating goals and settings that are loaded in each execution.
No reasoning loop required. Balanced autonomy and determinism.
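A minimal sketch of that shape, not the actual implementation: one plain function that re-reads its settings from a knowledge base on every run, does a fixed sequence of steps, and writes back. Everything named here is an assumption for illustration: the knowledge base is just a folder of files, `settings.json` and its keys are made up, and `interpret` stands in for a single prompt call. The scheduler (cron or similar) is assumed to call `daily_task` from outside.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def load_settings(kb_dir: Path) -> dict:
    """Re-read goals and settings on every run, so edits apply to the next execution."""
    return json.loads((kb_dir / "settings.json").read_text(encoding="utf-8"))


def interpret(text: str, goal: str) -> str:
    """Hypothetical stand-in for one prompt call; a real task would call a model here."""
    return f"[{goal}] {text.strip()[:60]}"


def daily_task(kb_dir: Path, today: date) -> str:
    """Fixed sequence of steps: read settings, read input, append output."""
    settings = load_settings(kb_dir)
    if not settings.get("enabled", True):
        return "skipped"  # control happens through the knowledge base, not code changes
    source = (kb_dir / settings["source"]).read_text(encoding="utf-8")
    note = interpret(source, settings["goal"])
    with (kb_dir / settings["output"]).open("a", encoding="utf-8") as f:
        f.write(f"{today.isoformat()}: {note}\n")
    return "ok"


if __name__ == "__main__":
    # Demo against a throwaway folder standing in for the knowledge base.
    with tempfile.TemporaryDirectory() as tmp:
        kb = Path(tmp)
        (kb / "settings.json").write_text(
            json.dumps({"goal": "track launches", "source": "inbox.md", "output": "notes.md"}),
            encoding="utf-8",
        )
        (kb / "inbox.md").write_text("Shipped v2 today.\n", encoding="utf-8")
        print(daily_task(kb, date.today()))
        print((kb / "notes.md").read_text(encoding="utf-8"))
```

Flipping `enabled` to `false` in the settings file changes the next run without touching the code, which is the whole point of steering through the knowledge base.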
Same daily automation, built three ways. Which would you ship?
The job: every morning, pull and interpret tweets and details for a list of users, then append each user's info under their section in a shared context file. The pull-and-interpret step is baked into a Grok prompt ("Tweet Pull").
1/ Two-prompt workflow:
Loop the user list. Run a Tweet Pull prompt for each. Collect the results, load the existing file, then hand the old file plus all the new tweets to a second "compiler" prompt that rewrites the whole thing with the new content merged in. Save it back. ~45k tokens per run, because the LLM rewrites the entire file every single day.
2/ One-prompt workflow:
Loop the user list, run Tweet Pull. Load the file. Second loop walks each user's results, renders a template, and uses Replace Text to swap a placeholder for the new block. Save it back. ~5k tokens. No LLM touches the assembly. It's string replacement dressed up as nodes.
3/ @aisle_sh task:
One script. Load the file, loop the users, call Tweet Pull for each, splice each result under the right header with text.find(). Write it back. Reads top to bottom like a recipe. Also ~5k tokens.
1 and 2 do the same job as 3. They just spread the logic across nine nodes, two loops, a Set Variable, a Template, and a Replace Text. Every time you debug, you get to reassemble it from the canvas in your head.
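Roughly what option 3 could look like, as a sketch under assumptions rather than the real task. The `## user` header convention, the file layout, and the `tweet_pull` stub (standing in for the Tweet Pull prompt) are all hypothetical. The only part taken from the description above is the shape: load the file, loop the users, splice with `str.find()`, write it back.

```python
from pathlib import Path


def tweet_pull(user: str) -> str:
    """Hypothetical stand-in for the Tweet Pull prompt; returns one rendered block."""
    return f"- {user}: (summary of today's tweets)"


def splice_under_header(text: str, header: str, block: str) -> str:
    """Append block to the end of the section that starts with header.

    Assumes every header sits on its own line, starts with '## ', and that
    the file ends with a newline. A missing section is added at the end.
    """
    start = text.find(header + "\n")
    if start == -1:
        sep = "" if (not text or text.endswith("\n")) else "\n"
        return f"{text}{sep}{header}\n{block}\n"
    # The section runs until the next header line, or to the end of the file.
    next_header = text.find("\n## ", start + len(header))
    end = len(text) if next_header == -1 else next_header + 1
    return text[:end] + block + "\n" + text[end:]


def run_task(path: Path, users: list[str]) -> None:
    """Load the file, loop the users, splice each result, write it back."""
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    for user in users:
        text = splice_under_header(text, f"## {user}", tweet_pull(user))
    path.write_text(text, encoding="utf-8")
```

The model is only called inside `tweet_pull`. The assembly is plain string work, so a wrong splice is a bug you can reproduce and step through.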
In the four months since Sonnet 4.6 shipped, the price of running a top-tier Claude model has gone from $3/$15 to $10/$50 per million tokens. That's more than triple.
Top-tier model releases are arriving faster and costing more, but for most business uses (content generation, summarization, extraction, RAG, classification), Sonnet 4.6 already handles the job, and has since February.
You could have picked Sonnet 4.6 four months ago, ignored every release since, and today you'd be running a model that handles those same tasks at a price that hasn't moved.
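The arithmetic, spelled out. The two price pairs are the ones quoted above (input/output, dollars per million tokens). The workload of 1M input plus 1M output tokens is a made-up example to show the gap, not a measurement.

```python
# (input, output) price in USD per million tokens, as quoted above.
OLD_PRICE = (3.00, 15.00)
NEW_PRICE = (10.00, 50.00)


def cost(input_tokens: int, output_tokens: int, price: tuple[float, float]) -> float:
    """Dollar cost of a workload at a given (input, output) price per million tokens."""
    return input_tokens / 1e6 * price[0] + output_tokens / 1e6 * price[1]


input_ratio = NEW_PRICE[0] / OLD_PRICE[0]   # 10 / 3
output_ratio = NEW_PRICE[1] / OLD_PRICE[1]  # 50 / 15

# Hypothetical workload: 1M input tokens and 1M output tokens.
old_cost = cost(1_000_000, 1_000_000, OLD_PRICE)
new_cost = cost(1_000_000, 1_000_000, NEW_PRICE)

print(f"input x{input_ratio:.2f}, output x{output_ratio:.2f}")
print(f"same workload: ${old_cost:.2f} -> ${new_cost:.2f}")
```

Both ratios come out to about 3.33, so the multiplier is the same whatever the input/output mix of the workload.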
Done a few CTO trainings, and the biggest topic for the last 2-3 months has been token spend. They are not idiots: they expect the subsidies to wind down at some point, making the reality even harsher.
Sam Altman said AI budgeting has recently become a "huge issue" for some companies, something that "never came up" earlier this year. https://t.co/P2zODBNmDp
BREAKING: Anthropic has called for a global pause in AI development as artificial-intelligence models near the capability to improve without human intervention, per WSJ
Introducing Aisle Tasks!
Build, deploy and run deterministic AI workflows in Python.
Write your Task, and Aisle handles the schedule, secrets, retries, versions, and audit trail for you.
Plus, our SDK covers all of your prompting, integrations, RAG, and more. Build a complex workflow in code, and let your team chat with the output.
Check out the release here:
https://t.co/3RKj3wPR7y
Opus 4.8 is already live on @aisle_sh. Compare it to older models using our Playgrounds feature or just start chatting and prompting! https://t.co/j8QRHVvkoE
Re-posting my reasoning on MCP from Sept 2025. I still largely feel the same way. When you build custom MCP servers, you really start to feel the flakiness, and a lot of the time with agents I end up just writing direct API proxies.
https://t.co/2uSAkd3qCf
"Do not write this to be quoted" has to be one of my bigger secrets to prompts that write solid prose. I use lots of other rules, but this one stops the repetitive sentence structure and cadence in its tracks.