If you're running Hermes on the Anthropic API directly, you're probably overpaying. Here's the actual math.
Most developers assume the raw API is the serious move - more control, more flexibility, cheaper at scale. For Claude models, that assumption breaks pretty fast.
→ The tokens cost the same
Copilot doesn't mark up Claude. Prices are identical in both places:
- Sonnet 4.6: $3 input / $15 output per million tokens
- Opus 4.8: $5 input / $25 output per million tokens
→ Copilot Pro+ packs $70 of credits into a $39 plan
Every month, Pro+ gives you 7,000 AI credits - that's $70 of token budget baked into the base price. You're paying $39 for something worth $70 before you write a single prompt.
The direct API gives you nothing upfront. Every token costs money from dollar zero.
→ What that looks like in practice
Say you spend $100/month on Opus (heavy agentic sessions, long context reasoning - exactly what Hermes runs):
- Direct API: $100
- Copilot Pro+: $39 base + $30 overage = $69
At $200/month:
- Direct API: $200
- Copilot Pro+: $39 + $130 = $169
The gap is always $31. It doesn't close until your monthly usage consistently clears $300+.
→ The part most people miss: unlimited completions
In the raw API, every autocomplete token costs money. In Copilot, completions are unlimited and don't touch your credits at all.
→ When the API actually wins
- Your Opus spend is consistently above $300/month
- You need the full 1M context window (Copilot caps Opus at 192k)
- You're running batch workloads and want the 50% batch discount
→ The call
Hermes connects to Copilot natively. Point it at Pro+ instead of the raw API - same models, same token prices, $31 cheaper every month before you even think about unlimited completions.
Pro ($10) works if Sonnet is enough. Go Pro+ if Opus is in your stack.
The API isn't wrong. It just requires you to outspend the included credits before it becomes the cheaper option - and for most Hermes users, that doesn't happen.
I found a browser tab simulating 1 million particles.
He built it with WebGPU compute shaders, Three.js and curl noise for the motion. Everything runs live on your own GPU, no server round trip, no rendered video underneath.
A few years ago this was native or game engine territory. Now it is a browser demo built on curl noise and TSL.
No idea how it holds up on a weaker GPU, but in the recording it is buttery smooth. Browsers really are turning into game engines now, one compute shader at a time.
@leerob Ran into this building a scoring pipeline meant to teach a model actual taste.
Below the top tier, judgment just collapses, creative writing is that same wall at its hardest.
@kimmonismus That 99% never notice because basic Q&A hit its ceiling versions ago, the real leap only shows once you hand it something with actual steps.
This exploded into a full on mini project.
Pivoted from music to fun stuff like interactive fx, games, etc
Cast spells, draw, shoot a finger gun, play fruit ninja and so much more with your webcam at my website
https://t.co/dF0byCG96J
#indiedev#buildinpublic#solodeveloper
This guy turned his webcam into a game controller.
MotionFX reads your hand movement in the browser and turns it into spells, a finger gun, fruit slicing, even a virtual keyboard you tap in midair.
Zero hardware, just a camera and apparently too much free time.
@cjzafir Ran something similar, scoring virality with a different model per stage. Below a certain tier the taste collapses, so this formula idea tracks.
AI memory shouldn't live in someone else's database.
OpenKnowledge is an open source markdown editor that lets Claude, Cursor, Codex and OpenCode read, search, edit and fix your notes directly through MCP and skills, instead of just chatting about them.
It is basically three layers stacked on each other:
→ a clean WYSIWYG editor for humans on top
→ agent tools, MCP plus skills, in the middle
→ plain markdown or mdx files underneath, kept in git
The source of truth stays markdown, agents just get real tools to work with it instead of a chat box bolted on the side.
That bottom layer is the part worth paying attention to. Point it at an existing folder, a codebase, a wiki, an Obsidian vault, and it opens right up. Nothing gets trapped in a proprietary format.
Git and GitHub handle sync and sharing, it is local first and private by default, and the project itself is open source. Your knowledge outlives the app.
On the agent side this goes past a sidebar chatbot:
→ agentic search over embeddings with hierarchical RAG
→ agents can co-author docs directly instead of handing you text to paste in
→ a built-in TUI for Claude and Codex in the desktop app, so you are not constantly tabbing to a terminal
For the human side it still feels like Notion: tables, images, code, Mermaid diagrams, LaTeX, PDFs, video, interactive HTML and JS embeds, plus a graph view for wiki links.
Worth being honest about where it actually is:
→ early and moving fast, north of 800 commits and 227 tags behind about 1.7k stars already, a fast growing foundation, not a finished product
→ macOS app plus a local web app and CLI for Windows, Linux and Intel Mac, no native Windows build yet
→ MCP plus skills make search, editing and upkeep noticeably better, they do not mean an agent understands your entire vault on day one
Still, the framing holds up: Notion plus VS Code for agents. A markdown editor where Claude, Cursor and Codex can read, write and maintain your knowledge base while everything underneath stays plain markdown in git you actually own.
gh - inkeep/open-knowledge
Everyone is building agent runtimes in the cloud. The best one might already be open in 40 of your tabs.
Peerd is a browser-native AI agent harness. Not an AI browser, not a headless cloud browser, not an MCP gateway, not a terminal agent. It is a Chrome/Firefox extension that runs the agent inside the normal browser you already use.
The pitch that stuck with me: the browser is already a runtime. Tabs, DOM, real sessions and cookies, WebCrypto, WebRTC, WebAssembly, WebGPU, sandboxing. We keep rebuilding all of that in the cloud and calling it agent infra. Peerd just uses what is already sitting on a billion machines.
What it can actually do inside a tab:
- act on your real tabs and session state, not a fresh throwaway browser
- spin up local sandboxes
- write and run JS notebooks
- boot Linux VMs through WebAssembly
- build client-side apps right in the tab
- talk peer to peer over WebRTC for agent to agent workflows
BYOK, no backend, no telemetry. Works with Anthropic, OpenRouter, Ollama, and there is an experimental WebGPU Gemma path.
The part I respect most is the security framing. The author builds around the lethal trifecta: private data, untrusted web content, outbound network. The main agent never ingests raw untrusted DOM. Separate keyless runners read pages, their output gets wrapped as untrusted, egress goes through one chokepoint with a denylist, keys live in a local vault. That is architecture level safety, not "we added a prompt filter."
Caveats, because it is early: 0.x experimental, main install path is GitHub or an unpacked extension, the Linux on WASM layer leans on Cheerpx which is not fully open and has commercial strings. Security claims are design intent, not an audited result. And BYOK is not "all local", if you point it at Anthropic or OpenRouter your prompts still leave the machine.
Still, the core idea is the interesting part. Maybe we never needed a whole new AI browser. Maybe the agent just needed to live inside the one we already have.