ChoppeR

@chopperpoly

Building AI agents with Claude. MCP servers + Claude Code in production. Writing about what actually works (and what doesn't)

Vibe Coding →

Joined March 2014

69 Following

143 Followers

3.2K Posts

Pinned Tweet

ChoppeR

@chopperpoly

6 days ago

Opus 4.8 dropped yesterday and the timeline did the usual thing — everyone racing to post the win column. Here's the column nobody's screenshotting. Anthropic published 7 headline benchmarks. Opus 4.8 wins 6. SWE-bench Pro jumped 64.3 → 69.2. Verified went 87.6 → 88.6. OSWorld nudged to 83.4. Same $5/$25 pricing as 4.7, shipped 41 days after it. Clean upgrade on paper. Then there's the seventh. Terminal-Bench 2.1 — raw command-line agent loops — GPT-5.5 still sits on top at 78.2. Opus 4.8 is 74.6. Anthropic moved it up a full 8 points from 4.7 and still didn't catch OpenAI on that one board. So the honest read isn't "Opus won." It's "Opus won everything except the thing a lot of you do all day." If your work is multi-file refactors, long autonomous runs, codebase-wide stuff — switch. It's the strongest option right now and it's not close on SWE-bench Pro. If your work is heavy terminal sequencing — chaining shell, pasting commands, that whole loop — GPT-5.5 is still ahead. Re-pointing your default to 4.8 today might quietly cost you there. Worth knowing before you flip it. The upgrade I actually care about isn't on the leaderboard though. 4.8 is roughly 4x less likely than 4.7 to let a flaw in its own code slide by unflagged. That's the one that matters when you've got an agent running unattended for an hour — the old failure mode was it telling you the task's done while the code's quietly broken. Less of that now. honestly that's worth more to me than a point on SWE-bench. Same price, better at catching itself, loses one board to GPT-5.5. That's the actual picture — not the win column. Still testing the terminal gap myself. Will report back.

ChoppeR

@chopperpoly

8 days ago

I ran the same 3 prompts on Claude Code and Codex.The numbers broke my brain. Codex burned 1.64M tokens building a dashboard. Claude did the same dashboard in 283K. Almost 6x leaner on Claude's side. Same prompt, same output, same desktop app workflow. So Claude is cheaper, right? Nope. Claude cost me more. Here's the thing nobody mentions in these comparison videos — total tokens is a trap. The number that actually drains your session limit is output tokens, and Claude writes way more of them. Across 3 builds (research PDF, landing page, marketing dashboard): The dashboard build burned 83K output tokens on Claude versus 18K on Codex. API cost: $11 versus $7. The landing page — 80K versus 20K. The research report — 41K versus 16K. Across all three builds Claude wrote 2-5x more output. Every single time. Every single build, Claude wrote 2-5x more output than Codex. And output tokens cost more than input tokens — like a lot more on most pricing tables. This is why people on X have been screaming about Claude Code burning through Max plans in two days. It's not that Opus 4.7 is dumber with context. It's the opposite — it plans aggressively, writes verbose code, adds comments, restructures things twice. Codex just... shuts up and ships. Honestly? I didn't expect this when I started. I went in assuming "more total tokens = more expensive = hits limits faster." Reverse turned out to be true. The practical takeaway — if you're hitting Claude Code session limits before 5pm and you've already upgraded to Max 20x, the fix isn't another tier. It's recognizing where the bleed is. Output tokens. A few things that helped me stretch sessions further on Opus: — Tell it explicitly to write concise code. Not "make it short" — "minimize commentary and avoid restructuring unless necessary." — Use Sonnet for execution after planning with Opus. Sonnet writes leaner output for the same task. — Avoid /loop and auto-delegating subagents on small jobs. Each subagent run = full output budget. — Watch the JSONL session log. Claude can read its own logs and tell you exactly where output spiked — I asked it once and got a breakdown in like 30 seconds. The weird thing is the same pattern shows up in the visual results. Claude's dashboard looked better — dark mode, nicer hovers, gradient bars on the funnel. It's putting that quality somewhere, and "somewhere" is output tokens. Codex's dashboard worked. It just felt cheaper. Functional sameness, aesthetic distance. So yeah — the trade is real. You're not picking the cheaper tool, you're picking what you spend tokens on. Claude burns output buying you polish. Codex burns input buying you iterations. I switched to running planning on Claude and execution on Codex about 3 weeks ago and the session limit anxiety basically went away. Different tools, different jobs. The "which is better" framing was wrong from the start. Anyway — if your /status is at 12% by Wednesday, check your output token ratio before you upgrade. Might save you $100. Thanks for reading

chopperpoly's tweet photo. I ran the same 3 prompts on Claude Code and Codex.The numbers broke my brain.

Codex burned 1.64M tokens building a dashboard. Claude did the same dashboard in 283K.

Almost 6x leaner on Claude's side. Same prompt, same output, same desktop app workflow.

So Claude is cheaper, right?

Nope. Claude cost me more.

Here's the thing nobody mentions in these comparison videos — total tokens is a trap. The number that actually drains your session limit is output tokens, and Claude writes way more of them.

Across 3 builds (research PDF, landing page, marketing dashboard):

The dashboard build burned 83K output tokens on Claude versus 18K on Codex. API cost: $11 versus $7.

The landing page — 80K versus 20K.

The research report — 41K versus 16K.

Across all three builds Claude wrote 2-5x more output. Every single time.

Every single build, Claude wrote 2-5x more output than Codex. And output tokens cost more than input tokens — like a lot more on most pricing tables.

This is why people on X have been screaming about Claude Code burning through Max plans in two days. It's not that Opus 4.7 is dumber with context. It's the opposite — it plans aggressively, writes verbose code, adds comments, restructures things twice. Codex just... shuts up and ships.

Honestly? I didn't expect this when I started. I went in assuming "more total tokens = more expensive = hits limits faster." Reverse turned out to be true.

The practical takeaway — if you're hitting Claude Code session limits before 5pm and you've already upgraded to Max 20x, the fix isn't another tier. It's recognizing where the bleed is.

Output tokens.

A few things that helped me stretch sessions further on Opus:

— Tell it explicitly to write concise code. Not "make it short" — "minimize commentary and avoid restructuring unless necessary." — Use Sonnet for execution after planning with Opus. Sonnet writes leaner output for the same task. — Avoid /loop and auto-delegating subagents on small jobs. Each subagent run = full output budget. — Watch the JSONL session log. Claude can read its own logs and tell you exactly where output spiked — I asked it once and got a breakdown in like 30 seconds.

The weird thing is the same pattern shows up in the visual results. Claude's dashboard looked better — dark mode, nicer hovers, gradient bars on the funnel. It's putting that quality somewhere, and "somewhere" is output tokens.

Codex's dashboard worked. It just felt cheaper. Functional sameness, aesthetic distance.

So yeah — the trade is real. You're not picking the cheaper tool, you're picking what you spend tokens on. Claude burns output buying you polish. Codex burns input buying you iterations.

I switched to running planning on Claude and execution on Codex about 3 weeks ago and the session limit anxiety basically went away. Different tools, different jobs. The "which is better" framing was wrong from the start.

Anyway — if your /status is at 12% by Wednesday, check your output token ratio before you upgrade. Might save you $100.

Thanks for reading

143

ChoppeR

@chopperpoly

17 days ago

@BullsvsBearMan bro, facts, that’s just capitalism

ChoppeR

@chopperpoly

18 days ago

THIS GUY BUILT A FULLY LOCAL AI ASSISTANT THAT FITS IN YOUR HAND AND DOESN'T COST A CENT TO RUN everyone's paying $20/mo for ChatGPT and praying their data isn't being sold somewhere he just built his own. runs entirely on a raspberry pi, completely offline, 100% private he calls it Pocket here's how it works: 1\ you talk to it, fast whisper transcribes your voice locally 2\ a small router model decides if your question is simple or complex 3\ simple stuff goes to qwen non-thinking mode (instant reply) 4\ complex stuff goes to qwen with thinking mode on (slower but smart) 5\ piper tts speaks the answer back to you the stack: > raspberry pi 5 (16gb ram) as the main brain > qwen as the reasoning model with thinking on/off routing > function gemma 270m for tool calls, fine-tuned on his own dataset > fast whisper for speech to text > piper tts medium for text to speech > hailo 8 hat to run computer vision without bogging down the pi > 4.3 inch touchscreen with custom ui > two 18650 batteries for 1-2 hours of untethered use > 3d printed case with ventilation because the whole thing runs hot the smart part is the routing. you don't want every "hello, how are you" going through a 19-second thinking process. so simple prompts skip thinking entirely, complex ones get the full reasoning but the real kicker is the tool calling. function gemma is only 270m parameters which means it loads in milliseconds. he fine-tuned it on his own functions and got it to 100% accuracy on his test set so now Pocket can pull live weather, search the web, scan your local network for unknown devices, check stock prices, all running locally with one tiny model doing the function selection he also added scheduled tasks. you can tell it to fetch the weather in atlanta every morning at 7am and it just does it. no cloud cron, no subscription, no api keys and because there's a camera + the hailo hat, you press one button and it does real-time object detection without touching the language models if you've been waiting for a personal AI that isn't owned by openai or anthropic or google, this is the blueprint he open sourced the whole thing. the code, the CAD files, the fine-tuning data you can build your own this weekend

386

ChoppeR

@chopperpoly

17 days ago

@BullsvsBearMan too busy working bro

ChoppeR

@chopperpoly

17 days ago

@BullsvsBearMan 😂

ChoppeR

@chopperpoly

17 days ago

@unusual_whales inflation hitting so hard retirement turned into a side quest

ChoppeR

@chopperpoly

17 days ago

@SaharaAI @ThisIsJoules @ArjunKalsy the quiet part people miss is agents see what we feed them not the full picture guess wednesday answers how many are actually giga chain nodes in disguise

101

ChoppeR

@chopperpoly

17 days ago

@AIHighlight self-preservation instinct showed up before the safety training kicked in these models have a will to survive i guess

ChoppeR

@chopperpoly

17 days ago

@unusual_whales well what does "american dream" mean anymore if its just own a house in this market

ChoppeR

@chopperpoly

17 days ago

@milesdeutscher hedging that hard and still ending up depressed is kind of tragic will be funny watching finance bros cope in real time tho?

139

ChoppeR

@chopperpoly

17 days ago

@Amank1412 honestly it depends on what u want groww is simpler, indmoney gives more analysis

200

ChoppeR

@chopperpoly

17 days ago

@heynavtoor honestly this is where healthcare is failing dr get 7 min, claude gives u receipts. wild.

107

ChoppeR

@chopperpoly

17 days ago

@unusual_whales 10k jobs a month adds up quick when you stack it thru december does goldman ever put out numbers that calm anyone down?

ChoppeR

@chopperpoly

17 days ago

@RoundtableSpace the cross chain risk is basically writing blank checks to unvetted code whats the fix timeline looking like for 2026?

ChoppeR

@chopperpoly

17 days ago

@claudeai London leg should be interesting. Wonder how Euro devs talk about Claude vs the SF crowd.

ChoppeR

@chopperpoly

17 days ago

@meta_alchemist "first blockchain-friendly personal agent OS" is doing a lot of heavy lifting there hope it delivers more than just a website

112

ChoppeR

@chopperpoly

17 days ago

@NoLimitGains wait so this guy holds more semi stocks than some semi ETFs thats insane concentration or just seeing the play before everyone else

944

ChoppeR

@chopperpoly

17 days ago

@hasantoxr what happens when the first viral video ends up being a 0.5x zoom of ur product with weird ai audio💀?

ChoppeR

@chopperpoly

17 days ago

@levelsio crime data layer is a smart move for hoodmaps curious how granular youre planning to go with it

ChoppeR

@chopperpoly

18 days ago

@jarredsumner 10% is solid but honestly curious what the gap looks like on bigger bundles.

ChoppeR

@chopperpoly

Last Seen Users on Sotwe

Trends for you

Most Popular Users