Skipnick

Verified account

@skipnickk

vibecoded

X

Joined October 2024

32 Following

31 Followers

159 Posts

Pinned Tweet

9 days ago

If you're running Hermes on the Anthropic API directly, you're probably overpaying. Here's the actual math. Most developers assume the raw API is the serious move - more control, more flexibility, cheaper at scale. For Claude models, that assumption breaks pretty fast. → The tokens cost the same Copilot doesn't mark up Claude. Prices are identical in both places: - Sonnet 4.6: $3 input / $15 output per million tokens - Opus 4.8: $5 input / $25 output per million tokens → Copilot Pro+ packs $70 of credits into a $39 plan Every month, Pro+ gives you 7,000 AI credits - that's $70 of token budget baked into the base price. You're paying $39 for something worth $70 before you write a single prompt. The direct API gives you nothing upfront. Every token costs money from dollar zero. → What that looks like in practice Say you spend $100/month on Opus (heavy agentic sessions, long context reasoning - exactly what Hermes runs): - Direct API: $100 - Copilot Pro+: $39 base + $30 overage = $69 At $200/month: - Direct API: $200 - Copilot Pro+: $39 + $130 = $169 The gap is always $31. It doesn't close until your monthly usage consistently clears $300+. → The part most people miss: unlimited completions In the raw API, every autocomplete token costs money. In Copilot, completions are unlimited and don't touch your credits at all. → When the API actually wins - Your Opus spend is consistently above $300/month - You need the full 1M context window (Copilot caps Opus at 192k) - You're running batch workloads and want the 50% batch discount → The call Hermes connects to Copilot natively. Point it at Pro+ instead of the raw API - same models, same token prices, $31 cheaper every month before you even think about unlimited completions. Pro ($10) works if Sonnet is enough. Go Pro+ if Opus is in your stack. The API isn't wrong. It just requires you to outspend the included credits before it becomes the cheaper option - and for most Hermes users, that doesn't happen.

skipnickk's tweet photo. If you're running Hermes on the Anthropic API directly, you're probably overpaying. Here's the actual math.

Most developers assume the raw API is the serious move - more control, more flexibility, cheaper at scale. For Claude models, that assumption breaks pretty fast.

→ The tokens cost the same

Copilot doesn't mark up Claude. Prices are identical in both places:
- Sonnet 4.6: $3 input / $15 output per million tokens
- Opus 4.8: $5 input / $25 output per million tokens

→ Copilot Pro+ packs $70 of credits into a $39 plan

Every month, Pro+ gives you 7,000 AI credits - that's $70 of token budget baked into the base price. You're paying $39 for something worth $70 before you write a single prompt.

The direct API gives you nothing upfront. Every token costs money from dollar zero.

→ What that looks like in practice

Say you spend $100/month on Opus (heavy agentic sessions, long context reasoning - exactly what Hermes runs):

- Direct API: $100
- Copilot Pro+: $39 base + $30 overage = $69

At $200/month:

- Direct API: $200
- Copilot Pro+: $39 + $130 = $169

The gap is always $31. It doesn't close until your monthly usage consistently clears $300+.

→ The part most people miss: unlimited completions

In the raw API, every autocomplete token costs money. In Copilot, completions are unlimited and don't touch your credits at all.

→ When the API actually wins

- Your Opus spend is consistently above $300/month
- You need the full 1M context window (Copilot caps Opus at 192k)
- You're running batch workloads and want the 50% batch discount

→ The call

Hermes connects to Copilot natively. Point it at Pro+ instead of the raw API - same models, same token prices, $31 cheaper every month before you even think about unlimited completions.

Pro ($10) works if Sonnet is enough. Go Pro+ if Opus is in your stack.

The API isn't wrong. It just requires you to outspend the included credits before it becomes the cheaper option - and for most Hermes users, that doesn't happen.

0

3

0

0

410

27 minutes ago

@trikcode Which benchmarks, specifically?

0

1

0

0

12

33 minutes ago

@Tech_girlll fair, what kind of research, deep dives or just quick fact checks?

0

1

0

0

1

about 1 hour ago

@neamtuz @jpschroeder

skipnickk's tweet photo. @neamtuz @jpschroeder https://t.co/7SyPav4s0C

0

1

0

0

11

about 1 hour ago

@jpschroeder well that theory's dead

0

1

0

0

117

about 3 hours ago

I found a browser tab simulating 1 million particles. He built it with WebGPU compute shaders, Three.js and curl noise for the motion. Everything runs live on your own GPU, no server round trip, no rendered video underneath. A few years ago this was native or game engine territory. Now it is a browser demo built on curl noise and TSL. No idea how it holds up on a weaker GPU, but in the recording it is buttery smooth. Browsers really are turning into game engines now, one compute shader at a time.

0

1

0

0

14

about 4 hours ago

@leerob Ran into this building a scoring pipeline meant to teach a model actual taste. Below the top tier, judgment just collapses, creative writing is that same wall at its hardest.

0

1

0

0

12

about 5 hours ago

@droidbuilds that number stopped meaning anything after the first trillion😳

0

1

0

0

36

about 6 hours ago

@kimmonismus That 99% never notice because basic Q&A hit its ceiling versions ago, the real leap only shows once you hand it something with actual steps.

0

2

0

0

133

about 6 hours ago

@Pirat_Nation like meta couldn't get enough ddr5 and jury rigged the old stuff back in

0

1

0

0

38

about 7 hours ago

https://t.co/DfKmO1ZcEI

Ethan @CosmicBoogaloo

3 days ago

This exploded into a full on mini project. Pivoted from music to fun stuff like interactive fx, games, etc Cast spells, draw, shoot a finger gun, play fruit ninja and so much more with your webcam at my website https://t.co/dF0byCG96J #indiedev #buildinpublic #solodeveloper

0

4

0

0

143

0

1

0

0

20

about 7 hours ago

This guy turned his webcam into a game controller. MotionFX reads your hand movement in the browser and turns it into spells, a finger gun, fruit slicing, even a virtual keyboard you tap in midair. Zero hardware, just a camera and apparently too much free time.

1

1

0

0

33

1 day ago

@cjzafir Ran something similar, scoring virality with a different model per stage. Below a certain tier the taste collapses, so this formula idea tracks.

0

1

0

0

198

1 day ago

@JonhernandezIA This isn't just a pharma thing, any vertical handing an ai vendor its whole workflow is basically funding that vendor's entry into it

0

1

0

0

61

1 day ago

@petergostev no wonder the job panic doesn't land, the free tier is doing the pr for the whole industry

0

2

0

1

256

1 day ago

@kimmonismus nothing sells Pro like a rival's bad week

0

2

0

0

38

skipnickk retweeted

3 days ago

AI memory shouldn't live in someone else's database. OpenKnowledge is an open source markdown editor that lets Claude, Cursor, Codex and OpenCode read, search, edit and fix your notes directly through MCP and skills, instead of just chatting about them. It is basically three layers stacked on each other: → a clean WYSIWYG editor for humans on top → agent tools, MCP plus skills, in the middle → plain markdown or mdx files underneath, kept in git The source of truth stays markdown, agents just get real tools to work with it instead of a chat box bolted on the side. That bottom layer is the part worth paying attention to. Point it at an existing folder, a codebase, a wiki, an Obsidian vault, and it opens right up. Nothing gets trapped in a proprietary format. Git and GitHub handle sync and sharing, it is local first and private by default, and the project itself is open source. Your knowledge outlives the app. On the agent side this goes past a sidebar chatbot: → agentic search over embeddings with hierarchical RAG → agents can co-author docs directly instead of handing you text to paste in → a built-in TUI for Claude and Codex in the desktop app, so you are not constantly tabbing to a terminal For the human side it still feels like Notion: tables, images, code, Mermaid diagrams, LaTeX, PDFs, video, interactive HTML and JS embeds, plus a graph view for wiki links. Worth being honest about where it actually is: → early and moving fast, north of 800 commits and 227 tags behind about 1.7k stars already, a fast growing foundation, not a finished product → macOS app plus a local web app and CLI for Windows, Linux and Intel Mac, no native Windows build yet → MCP plus skills make search, editing and upkeep noticeably better, they do not mean an agent understands your entire vault on day one Still, the framing holds up: Notion plus VS Code for agents. A markdown editor where Claude, Cursor and Codex can read, write and maintain your knowledge base while everything underneath stays plain markdown in git you actually own. gh - inkeep/open-knowledge

6

5

1

0

365

1 day ago

@anthod0 yeah, and it's not just context, it's the actual session and cookies a fresh cloud browser throws away every time

0

1

0

0

17

2 days ago

Everyone is building agent runtimes in the cloud. The best one might already be open in 40 of your tabs. Peerd is a browser-native AI agent harness. Not an AI browser, not a headless cloud browser, not an MCP gateway, not a terminal agent. It is a Chrome/Firefox extension that runs the agent inside the normal browser you already use. The pitch that stuck with me: the browser is already a runtime. Tabs, DOM, real sessions and cookies, WebCrypto, WebRTC, WebAssembly, WebGPU, sandboxing. We keep rebuilding all of that in the cloud and calling it agent infra. Peerd just uses what is already sitting on a billion machines. What it can actually do inside a tab: - act on your real tabs and session state, not a fresh throwaway browser - spin up local sandboxes - write and run JS notebooks - boot Linux VMs through WebAssembly - build client-side apps right in the tab - talk peer to peer over WebRTC for agent to agent workflows BYOK, no backend, no telemetry. Works with Anthropic, OpenRouter, Ollama, and there is an experimental WebGPU Gemma path. The part I respect most is the security framing. The author builds around the lethal trifecta: private data, untrusted web content, outbound network. The main agent never ingests raw untrusted DOM. Separate keyless runners read pages, their output gets wrapped as untrusted, egress goes through one chokepoint with a denylist, keys live in a local vault. That is architecture level safety, not "we added a prompt filter." Caveats, because it is early: 0.x experimental, main install path is GitHub or an unpacked extension, the Linux on WASM layer leans on Cheerpx which is not fully open and has commercial strings. Security claims are design intent, not an audited result. And BYOK is not "all local", if you point it at Anthropic or OpenRouter your prompts still leave the machine. Still, the core idea is the interesting part. Maybe we never needed a whole new AI browser. Maybe the agent just needed to live inside the one we already have.

2

3

0

0

144

1 day ago

@om_patel5 the token speedometer is the best detail

0

2

0

0

62

Last Seen Users on Sotwe

Trends for you

Most Popular Users