@smooth74m@LoganTGott Sure what harness do you use, I currently have an opencode and Claude code version of it.
But still need testing with our content team
Maybe we just don’t have good ideas anymore. You can’t lunch the 100th version of todo app and expect adoption even if it took you 2 seconds to build it.
Multi agent loop design pattern for the win.
The popular agents today have a single loop for managing interactions with a base model.
This simplifies the design but also means you use the same model to handle the very task.
Switch to a multi loop design and each loop can have its own model/system prompt
Multi agent loop design pattern for the win.
The popular agents today have a single loop for managing interactions with a base model.
This simplifies the design but also means you use the same model to handle the very task.
Switch to a multi loop design and each loop can have its own model/system prompt
One fascinating thing to me about /goals and dynamic workflows is how much Codex and Claude Code are starting to absorb the long-horizon features of OpenClaw and Hermes.
Also, some people are using OpenClaw and Hermes as coding agents over Slack, which is surprising to say the very least.
I was hoping OpenClaw and Hermes would become true personal agents.
The kind everyone, including grandma, can use.
That future still feels strangely underexplored.
Hot take: token costs are why there will be no saas apocalypse and why good dev tools are cached intelligence for agents!
The popular theory goes: agents can write code, so they'll just rebuild every tool from scratch and hit raw APIs. no more dev tools, no more CLIs, no more software layers. just agents and endpoints!
We just tested this and the data says the opposite.
We benchmarked Claude Code and Codex on real Hugging Face Hub tasks (~1,000 graded runs), with two setups: the agent-optimized hf CLI vs the agent hand-rolling curl or SDK calls from scratch.
Hand-rolling burns up to 6x more tokens on multi-step tasks and fails more often (84% vs 94% task success).
And that's just dropping one abstraction layer. It would obviously be orders of magnitude more tokens and a dramatically higher failure rate if the agent tried to bypass HF altogether and rebuild model hosting, versioning, and distribution from scratch. Every time an agent re-derives a workflow from raw API calls, you pay for that reasoning in tokens. every single run. a good CLI compresses that entire chain into a few high-level commands the agent can't get wrong.
In a world where everyone is complaining tokens are too expensive, abstraction is leverage: thousands of hours of design decisions your agent doesn't have to re-reason about at inference time.
Good tools are cached intelligence for agents!
So no, agents won't rebuild everything from scratch. they'll gravitate to the most token-efficient tools, because that's what their owners pay for. The software that survives won't just be accessible to agents, it will be accurate and cheap for them to drive.
We're seeing it happen with HF, which is becoming the platform for agents to use AI: ~49M requests in just two months, and growing fast!
https://t.co/Y7q6yuxZrZ
forward deployed engineers were basically what every engineer was in the 90s and 2000s.
normal people who could code, speak to customers, understand the business and ship the thing.
we had to recoin the term because the average dev became a cave goblin who hates meetings, users and sunlight.
happy to see fdes making a comeback
@ai_ops_lead@AnthropicAI I think the whole idea is code generation is a bad metric to optimize for. You are rightly calling out the limit of feature as a metric as well. The gist is let’s find a better metric to measure engineering output!
UX for interfacing with coding agent is a big area to improve on.
Designing skills so users can easily interface with agent will unlock a lot of values for many teams.
Promoting from scratch is crazy especially when the set of activities repeat regularly
This is a very useful breakdown for what a harness provides.
It seems to me that a good harness is akin to a good runtime:
• A good runtime allows you to interface with the OS.
• A good harness lets your application interface with the model.
...without you reimplementing the plumbing each time.
In that sense, all agents will have some kind of harness powering it.
Honestly, if there is one lesson to take away from /workflow, it is this:
A system of smaller agentic loops, each focused on a well-designed task, beats one all-knowing loop trying to do everything.
@dexhorthy keeps saying: own your control flow.
This might be the biggest ad for that idea.
"Mental Model: An all-knowing AGI Agent is really a perfect, just-in-time workflow generator & executor."
was messing around with this idea late last year and
Anthropic's Dynamic Workflows "feel" like the first implementation of the mental model where the models are intelligent enough to take advantage of this problem decomposition strategy (maybe possible since January)
dynamic workflows
- just-in-time decompose complex problems into workflow primitives via code gen
- assign large amounts of compute to solve sub-problems
- BUT adaptively alter the execution plan for the workflow based on learnings from sub-executions
imo AGI is just doing this flow perfectly including any exploration and verification steps. Generating & execute the right workflow for any input task, across any time horizon
design primitives like dynamic workflows & /goal feel like exciting sparks of the generalizable problem solving machine where the UX maps onto how humans want to interact with AI
even if the exact implementation today may not be "the one" and may even often look like slop...
the trajectory feels correct 🚀
@scottastevenson@Alfred_Lin An approach to consider is each task gets it own agent loop with its own prompts and model, instead of one agent loop for all.
You can do this very well with @boundaryML
Assuming every task in an application or a workflow is the same is wrong.
I remember when you could not open this app without another on-demand software take slapping you in the face.
Everyone will get custom software written for them the moment they want it they said!
the vision of personal-software i believe in looks more like this.
instead of making your own software (which is kind of like building your own house), find one that is tailored to exactly what you want.