@cobi_bean The GBs aren't the issue. It's scary having software you don't need on your main device. Every dependency adds risk, and I don't need another copy of Node and Python and whatever else is in the 3 GB of code sitting there.
@UOSJoe @cobi_bean I trust it will. I switched to Hermes two days after release and the pace of updates has been mind-blowing since. But on this, I'd rather wait a bit @NousResearch
On my main Mac, I try to install as few dependencies as possible for security and hygiene. If I want the app as just a client to my Hermes running remotely, having to install a runtime and 3 GB of dependencies is a huge smell.
I'm not mistrusting @NousResearch - but it's definitely a deal breaker on principle
One of the biggest unlocks in my personal-agent setup has been treating documentation & specs as a core part of the system.
Everything starts as a conversation with Klaw.
I describe what I want, Klaw pushes back, we go back and forth, and eventually we converge on a spec. Not a vague prompt — an actual written spec with behavior, edge cases, open questions, implementation notes, and success criteria.
Only after that do we launch a Claude coding agent to implement it.
The coding agent isn’t just told “write the code.” It’s told that the spec is the source of truth, and that every meaningful code change has to update the relevant documentation at the same time.
So if the implementation changes the behavior, the spec changes.
If a new edge case appears, the spec changes.
If the architecture shifts, the docs change.
Then those specs/docs are mirrored into my Mind project, so they’re part of the same knowledge system I use for everything else — not buried in a repo, Slack thread, or random agent transcript.
The last piece is enforcement.
Background jobs check that:
- every project has linked docs
- specs link back to the relevant code
- docs are current with what actually shipped
- implementation drift gets surfaced instead of silently accumulating
Coding agents are useful, but without a documentation loop they create entropy extremely quickly. They can generate code faster than you can remember why it exists.
Our system is one where conversation becomes spec, spec becomes implementation, implementation updates the spec, and the docs stay alive afterward.
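The enforcement checks listed above could look roughly like this. It's a minimal sketch, not the actual jobs: the `Project` record, its field names, and the date comparison used to detect stale docs are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Project:
    # Hypothetical record shape; the real schema is not described in the post.
    name: str
    doc_paths: list = field(default_factory=list)        # docs linked from the project
    spec_code_links: list = field(default_factory=list)  # code paths the spec points back to
    docs_updated: Optional[date] = None
    last_shipped: Optional[date] = None


def check_project(p: Project) -> list:
    """Return drift findings for one project (an empty list means clean)."""
    findings = []
    if not p.doc_paths:
        findings.append(f"{p.name}: no linked docs")
    if not p.spec_code_links:
        findings.append(f"{p.name}: spec does not link back to code")
    # Assumed staleness rule: docs must be at least as new as the last shipped change.
    if p.last_shipped and (p.docs_updated is None or p.docs_updated < p.last_shipped):
        findings.append(f"{p.name}: docs older than last shipped change")
    return findings


def run_checks(projects: list) -> list:
    """Collect every finding so drift is surfaced rather than silently accumulating."""
    return [finding for p in projects for finding in check_project(p)]
```

A background job would call `run_checks` on a schedule and report any non-empty result.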
Most personal-agent memory systems should be an index, not a warehouse.
I wanted Klaw to have a real memory layer. The typical approach with OpenClaw or Hermes was to stuff everything into AGENT.md context files and let the model figure it out.
But once the files grow, all that extra data in the context makes the model more likely to hallucinate.
Instead, I built a system I call Mind.
Mind is a Markdown knowledge base. It is where durable context lives: project notes, references, ideas, voice/writing guidance, decisions, summaries, and pointers. But source-of-truth data stays in the system that owns it. Email stays in email. Calendar stays in calendar. Various internal CLIs own their databases.
The memory layer should know enough to route the request, but not become a shadow copy of my entire life.
The pattern is simple: natural language at the boundary; deterministic systems in the loop.
If I ask about an old workflow, Mind can tell Klaw which note or repo matters. If I ask about a real-world fact, the agent should query the owning system through code or a CLI, pull back the minimum relevant context, and show evidence when it matters.
The model is still useful. It decides what the request means, which sources are likely relevant, how to summarize the answer, and when to ask for approval.
But the model should not be the database.
This is especially important for personal agents because the data is private by default. More context is not always better. The right question is: what is the smallest safe slice of context that lets the agent do the job?
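A rough sketch of the "index, not warehouse" idea: the index entry only says which system owns the data, and the answer is pulled from that system in a small slice. Everything named here (`Pointer`, `INDEX`, `OWNERS`, `fetch_calendar`, the sample events) is hypothetical and only stands in for whatever Mind and the owning CLIs really look like.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pointer:
    """An index entry: says where the truth lives, holds none of it."""
    topic: str
    owner: str  # name of the owning system, e.g. "calendar"
    note: str   # short durable context kept in the knowledge base


def fetch_calendar(query: str, limit: int) -> list:
    # Hypothetical stand-in for a real calendar CLI or API call.
    events = ["Mon 09:00 dentist", "Tue 14:00 flight to Lisbon", "Wed 10:00 standup"]
    return [e for e in events if query.lower() in e.lower()][:limit]


OWNERS: dict = {"calendar": fetch_calendar}
INDEX = [Pointer("travel", "calendar", "Trips live in the calendar")]


def answer_context(topic: str, query: str, limit: int = 2) -> dict:
    """Route via the index, then pull a small slice from the owning system."""
    pointer = next((p for p in INDEX if p.topic == topic), None)
    if pointer is None:
        return {"owner": None, "evidence": []}
    fetch: Callable = OWNERS[pointer.owner]
    return {"owner": pointer.owner, "note": pointer.note, "evidence": fetch(query, limit)}
```

The `limit` parameter is the "smallest safe slice" knob: the model sees only the rows that matched, never the whole calendar.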
Oyster Ventures is now Utopian Ventures: https://t.co/GNPkI4L4Vi
We have completed a full rebrand of the firm ahead of launching our next fund. Lots of exciting developments in the pipeline.
Plain language and secular cliff notes:
- Pope Leo XIV released his first encyclical: Magnifica humanitas, about safeguarding human dignity in the age of AI.
- Core frame: AI isn’t evil, but it’s never neutral — it reflects who builds, funds, regulates, and deploys it.
- Main warning: don’t build a new Tower of Babel — i.e. technical power detached from humility, limits, God/common good, and human dignity.
- Strongest AI point: “ethical AI” can’t just mean ethics defined by a few powerful companies/governments.
- He’s worried about concentration of power, algorithmic manipulation, worker degradation, surveillance, disinformation, and AI-enabled war.
- He argues AI should serve people, not make people adapt to machine speed, profit logic, or state/platform control.
- Interesting line: there is “no algorithm that can make war morally acceptable.”
- Not anti-AI. Anti-sovereign-AI, anti-monopoly-morality, anti-efficiency-as-religion. Useful secular paraphrase: AI should increase human agency, not turn humans into inputs.
The Holy Spirit challenges us today regarding our relationship with technology and the ongoing digital revolution. Technology has the power to heal, connect, educate and protect our common home; but it can also divide, exclude and generate new forms of injustice. #MagnificaHumanitas
Fully built by my agent, Klaw.
About 8h or so of agent thinking time w/ GPT 5.5 architecting and Claude executing, with my Hermes agent orchestrating everything and me directing.
I made a little game: Placeframe.
Guess the city/place from its architecture, streets, landscapes, and visual clues — then see how far off you were.
Kind of like Wordle / GeoGuessr, but a full game, not a daily puzzle.
Try it: https://t.co/8W2W8D5Jn3
One of Klaw’s primary jobs right now is processing my email.
That is a pretty good test case for local models.
Email is high volume, private, messy, and mostly boring. Most of it doesn't require high intelligence. It's classification: detecting urgency, extracting the useful bits, deciding what needs review and what can be ignored.
So I built the system mostly as deterministic code that handles the classification step as a call to a local LLM (in this case, Qwen 3.6's 35B model). It's small enough to run on my desktop, and smart enough to do a perfect job of the task. It can process tens of thousands of messages per day, and it doesn't expose any of my private data to any outside parties.
Everything is run by code, written by a frontier model, with deterministic outcomes. The local model only handles the narrow judgment call: what kind of email is this?
Frontier models are still useful for hard reasoning, drafting, ambiguity, and synthesis. But a lot of agent work is high-volume background judgment where privacy and cost matter more than brilliance.
Local models are underrated because people keep comparing them to frontier models' most advanced capabilities.
They don't need to beat frontier models. They need to be cheap, private, fast enough, and good enough to run continuously inside a system that is mostly code.
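Here's a minimal sketch of that narrow judgment call. The label set, the prompt wording, and `fake_local_model` are assumptions for illustration; in a real setup the `llm` callable would wrap whatever client talks to the locally hosted model.

```python
from typing import Callable

LABELS = ("urgent", "needs_review", "ignore")  # hypothetical category set


def build_prompt(subject: str, body: str) -> str:
    """Build a prompt that asks for exactly one label from the fixed set."""
    return (
        "Classify this email as exactly one of: " + ", ".join(LABELS) + ".\n"
        f"Subject: {subject}\nBody: {body[:2000]}\nAnswer with the label only."
    )


def classify_email(subject: str, body: str, llm: Callable[[str], str]) -> str:
    """Ask the model for one label; anything unexpected falls back to human review."""
    raw = llm(build_prompt(subject, body)).strip().lower()
    return raw if raw in LABELS else "needs_review"


def fake_local_model(prompt: str) -> str:
    # Stand-in for a call to a local model; not a real client.
    return "Urgent\n" if "invoice overdue" in prompt.lower() else "maybe?"
```

The code around the model stays deterministic: the output is normalized, checked against the allowed labels, and routed to review when the model says anything else.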
The more I build Klaw, the more I think durable agents are codebases with language interfaces, not prompts with tool access.
The default agent pattern still feels like:
Write a big prompt.
Give the model a pile of tools.
Hope it reasons through the workflow correctly every time.
That works for demos. It’s not a great foundation for anything you want running every day.
For repeatable work, the agent should not be rediscovering the procedure from scratch. It should be calling something known.
A script.
A CLI.
A queue worker.
A typed adapter.
A deterministic parser.
A database query.
A narrow classifier.
A job with logs, retries, validation, and boring failure modes.
Then the language model does the part it’s actually good at: summarizing messy inputs, drafting text, classifying ambiguous cases, ranking options, explaining results, or choosing between known paths.
This sounds less magical. I think it’s much closer to how useful personal agents actually work.
Code gathers the data.
Code validates it.
Code computes the numbers.
Code checks source-of-truth state.
Code handles retries and side effects.
Then, when needed, the model gets a narrow job:
“Summarize this.”
“Classify this into one of these categories.”
“Explain these options.”
“Draft the reply, using this evidence.”
“Choose the next step from this list.”
That split matters.
If an LLM is doing the math, checking the state, deciding which source of truth matters, and writing the final answer all in one big mushy pass, you’ll eventually get weird failures.
If code computes the answer and the LLM explains it, the system is much easier to trust.
Same for email, travel, finance, contacts, reminders, dashboards, approvals. Basically anything personal enough that being wrong is annoying or expensive.
The job of the agent is not to be clever at every step.
The job is to know which parts should be deterministic and which parts need judgment.
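A small sketch of "code computes, the LLM explains", using spending totals as a made-up example. The function names and the prompt text are assumptions; the `llm` argument is any callable that takes a prompt and returns text.

```python
from typing import Callable


def monthly_totals(transactions: list) -> dict:
    """Deterministic step: sum (category, amount) pairs per category."""
    totals: dict = {}
    for category, amount in transactions:
        totals[category] = round(totals.get(category, 0.0) + amount, 2)
    return totals


def explain(totals: dict, llm: Callable[[str], str]) -> dict:
    """Model step: wording only. The numbers are passed in and returned untouched."""
    facts = "; ".join(f"{k}={v:.2f}" for k, v in sorted(totals.items()))
    summary = llm(f"Explain these totals in one sentence. Do not change any number: {facts}")
    return {"totals": totals, "summary": summary}
```

Because the totals travel alongside the summary, a caller can show both and check the prose against the computed values.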
This also changes how “memory” works.
A prompt-first agent wants to stuff more context into the model.
A code-first agent asks:
Where is the source of truth?
What query should retrieve it?
What is the minimum useful context?
What evidence should be attached to the result?
What should be logged so we can debug this later?
That is a very different product.
It’s cheaper.
It’s faster.
It’s easier to test.
It’s easier to audit.
It fails in more obvious ways.
And when something breaks, you fix the primitive instead of rewriting vibes into a longer system prompt.
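The questions a code-first agent asks can be made concrete with a tiny lookup primitive. This is only a sketch: the in-memory `contacts` table, its columns, and the shape of the returned evidence are hypothetical.

```python
import logging
import sqlite3

log = logging.getLogger("agent.lookup")


def make_contacts_db() -> sqlite3.Connection:
    # Hypothetical source of truth: an in-memory table standing in for a real contacts store.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, email TEXT, notes TEXT)")
    db.executemany(
        "INSERT INTO contacts (name, email, notes) VALUES (?, ?, ?)",
        [("Ana Ruiz", "ana@example.com", "long private notes"),
         ("Ben Cho", "ben@example.com", "more private notes")],
    )
    return db


def lookup_email(db: sqlite3.Connection, name: str) -> dict:
    """Fetch only the column the task needs, and attach where it came from."""
    sql = "SELECT id, email FROM contacts WHERE name = ?"
    row = db.execute(sql, (name,)).fetchone()
    # Logged so a wrong answer can be traced back to a specific lookup later.
    log.info("lookup_email name=%r found=%s", name, row is not None)
    if row is None:
        return {"value": None, "evidence": None}
    return {"value": row[1], "evidence": {"source": "contacts", "row_id": row[0], "query": sql}}
```

The query names the source of truth, selects the minimum useful context (the `notes` column never leaves the database), returns evidence with the value, and leaves a log line for debugging.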
Natural language still matters. A lot.
The whole point is that I can ask Klaw for an outcome in plain English, and it can assemble context, choose the right workflow, run it, and explain what happened.
But once a pattern repeats, it should graduate out of prompt-land and into code.
That’s where agents start becoming infrastructure instead of chat sessions.
And there’s a second-order effect I think people underweight: once the agent is not just a pile of prompts, you can build real software on top of it.
The interface does not have to be a chatbot.
It can be chat.
It can be a mobile app.
It can be a dashboard.
It can be a button.
It can be a background job that just does the thing.
All the automations, permissions, adapters, databases, logs, and weird little connections between systems already exist underneath. The parts that need judgment can still call the LLM through the harness. But the product surface can be whatever makes sense.
That’s the part I keep coming back to.
The best personal agents will probably feel conversational at the edge and boring underneath.
And once they’re boring underneath, they stop being just agents.
They become a way to build software.
Have used WeWeb since launch, and am an investor in the company. Great product. For most people who can't code, using Claude etc. gets you most of the way there, but then you're stuck. Here you can get 100% of the way there, letting AI design your site while having a full visual editor to tweak the result.
Big day today, @weweb_io is live on Product Hunt!
We built advanced AI capabilities into a powerful no-code web app builder. The result? The best tool for non-technical users to build their apps.
Check out the launch below. We’d love to get your support ❤️
And yes, there’s a special launch offer for the PH community: 20% off any plan 🤗
https://t.co/RiqBjv4p6B
Codex GPT 5.5 as a coding model is basically as good as Claude these days, but Claude has a much better harness.
What drives me crazy with both is when it constantly blocks everything I try to do for "cyber risk" — it's getting dystopian to have big brother deciding what I'm allowed to code.
@krunkosaurus Yup that's usually the issue. I have a shortcut to disable and enable DNS override for this very reason.
But these days I might just tether to my phone; it's easier than dealing with frustrating hotel wifi.
I wanted to build a hotel booking search system.
The obvious version is: give Claude a browser, a few APIs, web search, maybe some credentials, and ask it to figure things out. It runs searches, opens tabs, compares results, retries when pages break, and eventually comes back with an answer.
I don't like that architecture at all.
The way I think about this is different: the agent shouldn’t be “doing travel search.” The agent should be using a travel search system.
So the durable part is code.
There’s a CLI for search jobs. A hotel search becomes a structured job: destination, dates, constraints, loyalty programs, required filters, output format. The system fans out across providers in parallel, normalizes results, validates invariants, records logs, and produces a reviewable output.
No language model in the search loop. No vibes-based browsing. No “the agent clicked around and thinks this is the best option.”
The LLM’s job is at the boundary:
- understand the natural-language request
- pull missing details from context when appropriate (e.g. "search hotels for my next trip" — it knows what my next trip is)
- translate that into a precise query
- call the deterministic system
- summarize the result for me
That’s the pattern I keep coming back to.
Agents are great at intent, context, and judgment. Code is better at retrieval, validation, retries, parallelism, logging, and repeatability.
A lot of “agentic automation” becomes much more reliable once the agent stops being the worker and becomes the interface to systems that are built to do the work.
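The structured search job described above could be sketched like this. The two providers, their raw response shapes, and the single price invariant are invented for illustration; a real system would have more fields (loyalty programs, filters, output format) plus logging.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SearchJob:
    destination: str
    check_in: date
    check_out: date
    max_price: float


@dataclass(frozen=True)
class Offer:
    provider: str
    hotel: str
    total_price: float


# Hypothetical providers returning raw rows in different shapes.
def provider_a(job: SearchJob) -> list:
    return [{"name": "Hotel Sol", "price": 420.0}, {"name": "Casa Mar", "price": 910.0}]


def provider_b(job: SearchJob) -> list:
    return [{"hotel_name": "Hotel Sol", "total": 399.0}, {"hotel_name": "Bad Row", "total": -1.0}]


PROVIDERS = {"a": provider_a, "b": provider_b}
NORMALIZERS = {
    "a": lambda r: Offer("a", r["name"], float(r["price"])),
    "b": lambda r: Offer("b", r["hotel_name"], float(r["total"])),
}


def run_job(job: SearchJob) -> list:
    """Fan out to all providers in parallel, normalize, validate, and sort by price."""
    if job.check_out <= job.check_in:
        raise ValueError("check_out must be after check_in")
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, job) for name, fn in PROVIDERS.items()}
        raw = {name: fut.result() for name, fut in futures.items()}
    offers = [NORMALIZERS[name](row) for name, rows in raw.items() for row in rows]
    # Invariant: a price must be positive and within the job's budget.
    valid = [o for o in offers if 0 < o.total_price <= job.max_price]
    return sorted(valid, key=lambda o: o.total_price)
```

The LLM's part is only to turn "search hotels for my next trip" into a `SearchJob` and to summarize the returned offers; nothing inside `run_job` depends on a model.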
I've been playing with this lately and it indeed is the best way to do video gen right now.
My advice is to install the sogni creative agent skill into your Hermes/OpenClaw and use it from your own process. It's a much better UX than juggling a bunch of websites and copy-pasting things all over.
The best AI storyboard-to-video workflow on the internet right now is GPT Image 2 → Seedance 2.0.
We just made it the default at https://t.co/CZ2PiVyjs9 - connecting with ByteDance and OpenAI to bring both into one chat-first creative engine.
Type an idea. Get a storyboard. Get cinematic video. Refine. Ship the ad concept. Minutes, not days.
Two things make Sogni different:
→ No subscription. Higgsfield bills you $29/mo whether you generate one clip or a hundred. Sogni is pay-per-generation on a people-powered consumer GPU network - frontier models when you need them, open-source models like LTX2.3 when you don't.
→ Same workflow, from your agent. Tell Claude, Codex, or Hermes to "make me a 6-shot storyboard and turn it into video" - the Sogni Creative Agent Skill runs the whole pipeline.
https://t.co/CZ2PiVyjs9 - try it free, no card.
One tidbit that exemplifies the code-first approach: my favorite new feature that Hermes released was no_agent cron jobs.
I have 50+ cron jobs, and once they released that feature, all but five were immediately switched to no_agent jobs.
I've spent the last few months building the best AI agent imaginable.
It very much goes against conventional wisdom — the abstraction layers popularized here on X don't make that much sense to me.
I've developed a few components that deserve to be open sourced.
So starting today, I'm going to share a bit more about it.
Stay tuned 🦅