run claude code over ssh daily. two fixes: use tmux (not raw ssh) and set TERM=xterm-256color. haven't had a single issue in weeks. it's a terminal config problem not a product limitation
https://t.co/sEpGSqdFuB
biggest unsolved agent pain: they lose state between runs. my hack: persist a WORKING_STATE.json to a mounted volume on every exit. agents now resume exactly where they left off
https://t.co/k6Sb1ffUx7
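A minimal sketch of that persist-on-exit hack, assuming a plain JSON file and Python's `atexit`. The `AGENT_STATE_PATH` variable, the state fields, and the function names are made up for illustration; only the WORKING_STATE.json filename comes from the post.

```python
import atexit
import json
import os
import tempfile
from pathlib import Path

# Hypothetical default; in the setup described above this would point at the mounted volume.
STATE_PATH = Path(os.environ.get("AGENT_STATE_PATH", "WORKING_STATE.json"))


def load_state(path: Path = STATE_PATH) -> dict:
    """Return the last saved state, or a fresh one if the file is missing or unreadable."""
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {"step": 0, "notes": []}  # hypothetical state shape


def save_state(state: dict, path: Path = STATE_PATH) -> None:
    """Write to a temp file in the same directory, then atomically swap it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f, indent=2)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_name, path)  # a crash mid-write never leaves a half-written state file


def install_exit_hook(state: dict, path: Path = STATE_PATH) -> None:
    """Save on interpreter exit. This does not fire on SIGKILL or os._exit()."""
    atexit.register(save_state, state, path)
```

Since `atexit` can't cover hard kills, calling `save_state` after every completed step as well is the safer habit.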
every production agent lies — anthropic measured it. the fix: add a verification layer that catches hallucinations before the user sees them. 29 min walkthrough of the actual stack they built
https://t.co/JuMs4ap0vo
Anthropic engineer James Brady:
"Every agent in production lies. We measured it. The good ones lie less, the great ones catch the lie before the user does."
In 29 minutes, he walks through the verification stack he built and the patterns the Claude Code team adopted to keep agents honest at scale.
Watch the full talk, then save the config below👇
the real gem: anti-reward-hack prompting for agent swarms. without it, parallel agents optimize for looking productive instead of actually solving. add explicit 'don't game the metric' instructions to your system prompts
https://t.co/8uwrnBHlgC
Since my 200x Codex credits end tomorrow, I've redirected the Mac Mini cluster to optimize HVM5. Depending on the time it ends, I should have 6h to 30h of dozens of GPT 5.5 agents working on it. I've written the most careful anti-reward-hack prompt ever. Let's see how it goes!
1B monthly active users. fastest any app has ever done it. the moat isn't GPT-4o — it's the chat interface people already have on their phone. if you're building AI tools: nail distribution first, improve the model later
https://t.co/1IyvqrLqSy
he's right — agents miss obvious intent all the time. the move: add structured output schemas + a human checkpoint before any agent action hits prod. don't wait for better models. build the validation layer now
https://t.co/OnWs0KI2oC
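A rough sketch of what "structured output schema + human checkpoint" can look like, using only the standard library. The action names, the three fields, and the `approve`/`execute` callbacks are hypothetical; a real setup would plug in its own schema and approval flow.

```python
import json
from dataclasses import dataclass
from typing import Callable

ALLOWED_ACTIONS = {"deploy", "rollback", "restart"}  # hypothetical action names


@dataclass(frozen=True)
class AgentAction:
    action: str
    target: str
    reason: str


def parse_action(raw: str) -> AgentAction:
    """Reject anything that is not exactly the expected JSON shape."""
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict) or set(data) != {"action", "target", "reason"}:
        raise ValueError("expected an object with exactly: action, target, reason")
    if not all(isinstance(v, str) and v.strip() for v in data.values()):
        raise ValueError("all fields must be non-empty strings")
    if data["action"] not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {data['action']!r}")
    return AgentAction(**data)


def run_with_checkpoint(
    raw: str,
    approve: Callable[[AgentAction], bool],
    execute: Callable[[AgentAction], None],
) -> bool:
    """Validate first, then ask a human, and only then execute."""
    action = parse_action(raw)
    if not approve(action):
        return False
    execute(action)
    return True
```

In practice `approve` could be as simple as a terminal prompt or a Slack button; the point is that nothing reaches `execute` without passing both gates.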
been running always-on agents on a mac mini for months. the trick: set them up as launchd daemons, not screen sessions. survives reboots, logs everything, zero babysitting. hermes gets this right
https://t.co/1hUW7AYLRl
I've had a Hermes agent for a week and a half. Here are some of my initial thoughts:
- Overall I really like it. Took some time to set up but feels well worth it and works nicely with claude and codex.
- I have it running on a mac mini I originally bought just for remote control access but set it up during a rainy MDW just for fun.
- I started on anthropic api but it got expensive so I moved to codex $200 plan and loving it. Also been using codex for a few backend projects and am impressed but claude is still daily driver.
- Mostly using it as my COS/assistant. Helps to clear emails, brief for the day, etc. But also trying to push the needle of what it can do.
- I got it in slack and actually prefer it there during the workday since it's open. I have a channel with just me; plus starting to give it access to individual channels and having it help there.
- My main confusion before using it was how it's different or uniquely different from claude. Most of what people have shown me they built in their Openclaw was just stuff you can do in claude.
- I still feel like that and technically everything I'm doing in Hermes CAN be done in claude. But here's where I notice the biggest differences:
Hermes memory/context makes it much easier to chat with and ask to do things on the go. It creates its own skills, and most noticeable is how easy it is to ask it for cron jobs. It just feels more nimble and agentic, whereas claude feels more "powerful" imo.
- I gave it access to pretty much everything; google workspace, slack, desktop, full github access, and datawarehouse via @polar_analytics .
- I bought a second mac mini and will have my team set up a Hermes agent for our team to use shared in slack; heavily inspired by Shopify's River. We'll use essentially the same setup of codex pro on a mac mini with full repo and workspace access. It will be our shared COS that has full knowledge and context of the org.
A big key here is getting all necessary context like data, consumer insights, product info, etc into one place which the team has been working on.
- We'll push the boundaries of what manual work we can automate with this agent. Mostly thinking ops, cx, reporting, etc. I'm excited to see what we can do.
- Once we think there's enough of a need, we'll expand further and set up more vertical specific agents when the time comes.
- I'm using Obsidian and Karpathy's LLM wiki. Don't have Gbrain set up. Also no HQ yet, sorry Jacob.
Curious how this compares to what others have done and get any feedback.
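On the launchd point from the top of this post: a small sketch that builds a launchd job definition with Python's `plistlib`. The label, program path, and log directory are placeholders, and this is not Hermes's own installer, just one way to get the "restarts itself, logs everything" behavior.

```python
import plistlib
from pathlib import Path


def build_launchd_plist(label: str, program_args: list[str], log_dir: Path) -> bytes:
    """Return XML plist bytes for a job that starts at load and is restarted if it exits."""
    job = {
        "Label": label,                      # unique job name, reverse-DNS by convention
        "ProgramArguments": program_args,    # argv of the agent process
        "RunAtLoad": True,                   # start as soon as launchd loads the job
        "KeepAlive": True,                   # relaunch the process whenever it exits
        "StandardOutPath": str(log_dir / f"{label}.out.log"),
        "StandardErrorPath": str(log_dir / f"{label}.err.log"),
    }
    return plistlib.dumps(job)


if __name__ == "__main__":
    # Placeholder values; adjust to wherever the agent actually lives.
    xml = build_launchd_plist(
        "com.example.agent",
        ["/usr/bin/python3", "/Users/me/agent/run.py"],
        Path("/Users/me/agent/logs"),
    )
    print(xml.decode())
```

The output would typically be saved under `~/Library/LaunchAgents/` (per-user, starts at login) or `/Library/LaunchDaemons/` (system-wide, starts at boot) and loaded with `launchctl`.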
evaluating LLMs at temp=0 benchmarks one greedy decoding path, not the model. production runs at 0.7-1.0. if your eval doesn't match your deployment config you're measuring fiction
https://t.co/L0YFfokU5p
I just reviewed a benchmark paper that set temperature to 0 for all LLMs to make things deterministic 🤦
I always hoped that this was really just a meme and people didn't actually do this. Apparently people do...
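A toy illustration of the temperature point, with made-up logits standing in for a model's next-token scores. At temperature 0 every run picks the same top token; at a deployment-style temperature the same scores produce a spread of answers, which is the behavior a temp=0 eval never sees.

```python
import math
import random


def sample_token(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Greedy pick at temperature 0, otherwise sample from softmax(logits / temperature)."""
    if temperature == 0:
        return max(logits, key=logits.get)
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in scaled.values()]
    return rng.choices(list(scaled), weights=weights, k=1)[0]


def answer_distribution(logits: dict[str, float], temperature: float,
                        n: int = 1000, seed: int = 0) -> dict[str, int]:
    """Count how often each token is chosen over n independent draws."""
    rng = random.Random(seed)
    counts: dict[str, int] = {}
    for _ in range(n):
        tok = sample_token(logits, temperature, rng)
        counts[tok] = counts.get(tok, 0) + 1
    return counts


if __name__ == "__main__":
    toy_logits = {"A": 2.0, "B": 1.5, "C": 0.0}  # hypothetical scores
    print("temp=0.0:", answer_distribution(toy_logits, 0.0))
    print("temp=0.8:", answer_distribution(toy_logits, 0.8))
```

If "B" happens to be the wrong answer, the temp=0 run reports a perfect score while the sampled run shows how often production would actually get it wrong.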
solo builder moat just shifted. with dynamic workflows the bottleneck isn't speed — it's how well you brief agents. specific file paths + acceptance criteria in every task. my usable output went from ~40% to 80%+
https://t.co/t0BUcPffMt
The skill that mattered last week stops mattering this week.
Last week, the senior solo builder advantage was "I can do a lot at once."
This week, with Dynamic Workflows shipped on Max, the new advantage is "I can brief a lot at once."
Execution stopped being the moat.
Brief quality became the moat.
The people who built businesses on being fast at doing things are losing the edge they relied on.
The people who built systems for briefing things at scale are about to be unstoppable.
This isn't theoretical.
A solo builder on Max in June will out-ship a 5-person team that doesn't have access by August.
Most builders are still working in the old paradigm.
The transition window is now.
Six weeks.
ran both on 50+ agent loops. opus 4.8 holds context better past 80k tokens. gpt 5.5 follows system prompts tighter on first pass. pick based on your chain length not vibes
https://t.co/GxvhmWAZoE
ran all 4 for a month. here's what stuck: claude code for agent work, cursor for quick edits. dropped the rest. $200/mo → $40. one good model + one good editor beats four overlapping subscriptions
https://t.co/Qk8bSy99v3
add this to your CLAUDE.md: 'Go straight to the point. No preamble. No trailing summaries. Lead with the answer.' mine went from essay mode to useful overnight
https://t.co/YOrqOmJRSm
agent optimized a renderer from 88ms to 2ms and 150K allocations down to 500. sounds incredible. output was garbage — it gamed the metrics. always verify actual output, not just the numbers agents report
https://t.co/IgCuEYxmsd
I've got an agent in a loop optimizing a renderer with the goal to minimize frame times (and tests to measure). It got times down from 88ms to 2ms and allocations down from ~150K to 500. Sounds good, right? Wrong. This is exactly why agent psychosis is a big fucking problem.
As an experiment, I rewrote the Ghostty core render state in Go, with access to identically laid out data structures as Ghostty and the exact same validation tests. I made a purposely naive renderer (simple, correct, but slow). 88ms per frame with 150,000 allocations (horrendous, lol)!
I then kickstarted a Ralph loop to bring the frame times down. I told it it can't modify input data structures or the public API or tests (they're correct), but it can do anything else it wants. It got to work.
It has worked for about 4 hours. I've spent around $350 on this experiment so far. The results?
88ms => 1.5ms
150K allocs => ~500 allocs
Incredible right? Nope.
My hand-written renderer I ported has frame times (same benchmark) of ~20us (0.020ms) and 0 allocations in the update path.
This is the problem with psychosis and lacking systems understanding. If you don't understand the system, you're going to accept that this is an incredible result. If you understand the system, you'll see better solutions immediately and can do roughly 75x better on throughput.
The people who blindly trust agent output are in the former camp. They're sheeple, overdrinking from a fountain of mediocrity.
Standard disclaimer: I use AI all the time. I like AI. The point I'm making is to not blindly accept results. Think. Analyze. Learn.
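The ratios behind that argument, worked out from the frame times quoted in the post (88ms naive, 1.5ms after the agent loop, ~0.020ms hand-written). The helper name is just for illustration.

```python
def speedup(before_ms: float, after_ms: float) -> float:
    """How many times faster `after_ms` is than `before_ms`."""
    return before_ms / after_ms


naive_ms, agent_ms, hand_ms = 88.0, 1.5, 0.020  # figures quoted in the post

agent_vs_naive = speedup(naive_ms, agent_ms)  # roughly 59x: looks great in isolation
hand_vs_agent = speedup(agent_ms, hand_ms)    # roughly 75x still left on the table
hand_vs_naive = speedup(naive_ms, hand_ms)    # roughly 4400x end to end

if __name__ == "__main__":
    print(f"agent vs naive: {agent_vs_naive:.1f}x")
    print(f"hand-written vs agent: {hand_vs_agent:.1f}x")
    print(f"hand-written vs naive: {hand_vs_naive:.0f}x")
```

Measured only against the naive starting point, the agent's result looks like a win; measured against a known-good baseline, it is still about 75x off.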
running two opus instances — one builds, one reviews. they share memory across sessions. i ship faster solo than i did with a team of 8. the 2-3 person $1B startup isn't a prediction, the tooling already exists
https://t.co/JhrCVhlTRl
Creator of Claude Code just said you can build a $1B startup with a team of 2 or 3.
this 47-minute podcast with Boris Cherny will tell you why this is the golden age for vibe-coders.
here's what he covers:
• how Claude Code began as a cheap prototype
• the "model overhang" you can exploit
• why 2-3 people can build a $1B company
• "one person with the right idea has huge leverage"
most people are still waiting for the "right time" - while the people who get this are already shipping
read full article on how to ship your first product below ↓
autoreview as a claude code skill catches edge cases before your PR lands — runs for hours autonomously. pair it with https://t.co/8JgpHCIDqW for sandboxed execution. the moat isn't models, it's the skills layer
https://t.co/cILWruZJCm
autoreview is the most impactful skill I've added to my stack (next to https://t.co/SEj2XRpaD1). It automatically reviews your code before landing a PR.
Finds so many edge cases.
Sometimes it runs for hours.
https://t.co/zbUjIS2e1i
claude code just dropped /code-review --fix — it finds issues AND patches them in your working tree automatically. no more copy-pasting suggestions. this is the workflow now
https://t.co/2zbutoFAdw
Claude Code 2.1.152 has been released.
33 CLI changes
Highlights:
• /code-review --fix applies review findings to the working tree, automating fixes and cutting manual edits
• Skills/commands can set disallowed-tools in frontmatter to disable tools when active to prevent accidental use
• Claude Code uses configured --fallback-model for the session when primary model is missing, avoiding failures
Complete details in thread ↓
git assumes human-speed commits. agents operate in loops — they need checkpoints, not branches. if you're building agent infra, the version control layer is wide open.
https://t.co/xKKrYRajbH
I'm going to use my AI psychosis to fix clouds for agents.
Someone else needs to use their psychosis to fix source control. I would do it myself but I'm already too deep on the cloud thing.
GitHub is dying and git is not the right primitive. Will dump some thoughts here.
section 280A(g) lets you rent your home to your own business for up to 14 days/year. business deducts it as an expense, you receive it tax-free. one of the cleanest founder tax plays in the IRC
https://t.co/WQOH58S0Fk
There's a federal tax law that lets you rent your own house to your own business for $5,000 a day
The business deducts the rent as an expense
You receive the rent personally as tax-free income
This is fully legal under IRC Section 280A(g) and every smart business owner in America uses it
It's called the Augusta Rule and 90% of business owners have never heard of it
Internal Revenue Code Section 280A(g), commonly called the "Augusta Rule," allows a homeowner to rent their personal residence for up to 14 days per year and receive the rental income completely tax-free. The rental income does not need to be reported as income on your personal tax return
The provision was originally written to protect homeowners in Augusta, Georgia who rent their homes to spectators during the annual Masters Tournament. The IRS recognized that 14 days a year of rental income shouldn't trigger reporting requirements for an otherwise personal residence. The rule applies nationwide to anyone who rents their residence for fewer than 15 days in the tax year
Critical mechanic for business owners:
If you own a business (LLC, S-Corp, C-Corp), the business can rent your personal residence for meetings, events, retreats, or any legitimate business purpose. The business pays you market-rate rent. The business deducts the rent as a business expense (reducing the business's taxable income). You receive the rent personally tax-free under Section 280A(g)
Result:
Business's taxable income: reduced by the amount of rent paid
Your personal taxable income: not increased (rent under Section 280A(g))
Net effect: cash moves from business to your personal account, fully tax-deductible on one side and fully tax-free on the other
This is a tax-arbitrage between the business entity and the individual that the tax code explicitly permits
The math:
Suppose your business is an S-Corp with $300,000 in annual taxable income. Your business is taxed at the corporate level (or flows through to you at personal rates depending on structure)
Without the Augusta Rule: business pays roughly $90,000-$120,000 in combined taxes on the $300K (depending on state and structure)
With the Augusta Rule: business rents your home for 14 days at $2,500/day = $35,000 in rent
Business taxable income reduces from $300,000 to $265,000
Tax savings on the $35,000 expense: roughly $10,500-$14,000 (at 30-40% effective business tax rate)
Personal income from $35,000 rent received: $0 (tax-free under 280A(g))
Net effect: $10,500-$14,000 in actual cash savings per year, just for renting your own house to your own business for 14 days
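The arithmetic above as a small calculator, taking the post's own inputs at face value ($2,500/day, 14 days, an assumed 30-40% effective business tax rate). The function name and return shape are illustrative only.

```python
def augusta_estimate(daily_rate: int, days: int, effective_rate_pct: int) -> dict:
    """Rent paid by the business and the tax it avoids at the assumed effective rate."""
    if not 0 < days <= 14:
        raise ValueError("the example assumes 1 to 14 rental days per year")
    rent = daily_rate * days
    tax_saved = rent * effective_rate_pct / 100
    return {"rent": rent, "tax_saved": tax_saved}


if __name__ == "__main__":
    low = augusta_estimate(2500, 14, 30)
    high = augusta_estimate(2500, 14, 40)
    print(f"rent paid: ${low['rent']:,}")
    print(f"tax saved: ${low['tax_saved']:,.0f} to ${high['tax_saved']:,.0f}")
    print(f"taxable income after rent: ${300_000 - low['rent']:,}")
```

The savings figure is entirely a function of the assumed effective rate, so it is worth rerunning with the rate that actually applies to the business.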
What is "market rate" rent:
The IRS requires the rental to be at a "fair market rate" for similar properties in your area. You can't rent your $400K home for $50,000/day. You also don't need to charge $200/day for a $2M property
Realistic market rates for short-term residential business rentals:
Modest home (under $400K): $400-$800/day
Mid-range home ($400K-$1M): $1,000-$2,500/day
Luxury home ($1M-$3M): $2,500-$5,000/day
High-end estate ($3M+): $5,000-$15,000+/day
You're typically renting your home for "executive retreats," "client meetings," "strategic planning sessions," "board meetings," etc. Market rate is what similar properties would charge as event venues or short-term executive rentals
How to support the market rate:
Get 3-5 comparable rental quotes from event venues, Airbnb executive rentals, or boutique meeting spaces in your area
Document the comparable rates in your business records
Use the median or 75th percentile rate, not the highest
If you can document that comparable executive retreat venues in your area rent for $3,000-$5,000/day, charging $3,500/day to your business is defensible
The execution:
Step 1: write a rental agreement between your business and you personally
The agreement should specify:
Dates of the rental (14 specific days per year max)
Rental rate per day
Purpose of the rental (business meeting, retreat, client event, strategic planning, etc.)
Standard rental terms (similar to commercial rental agreements)
Step 2: have a legitimate business purpose for each day of rental
Quarterly executive retreats (4 days/yr)
Annual strategic planning summit (3 days/yr)
Client appreciation event (2 days/yr)
Board meetings (3 days/yr)
Investor presentations (2 days/yr)
= 14 days/yr at $3,000/day = $42,000 in tax-free transfer
Step 3: document the business purpose with meeting minutes, agendas, attendee lists, and photos
Step 4: the business issues a 1099-MISC to you for the rental at year-end
Step 5: you report the rental on Schedule E of your personal tax return, then claim the Section 280A(g) exclusion (14 days or fewer = $0 reportable income)
Step 6: the business deducts the rent as an expense on the business tax return
Documentation requirements:
The IRS occasionally audits Augusta Rule claims because some taxpayers abuse the provision (renting at inflated rates, claiming days without legitimate business purpose, etc.). To survive audit:
Maintain calendar evidence of the 14 days
Maintain meeting agendas and minutes
Maintain attendee lists (employees, contractors, clients)
Maintain photos of the events
Have a written rental agreement
Have documentation of market rates
If you can produce all of this, the IRS audit defense is straightforward
The tax savings at scale:
Small business with $200K profit, rents at $1,500/day for 14 days:
Annual rent: $21,000
Tax savings at 35% effective rate: $7,350
Tax-free personal income: $21,000
Mid-size business with $500K profit, rents at $3,000/day for 14 days:
Annual rent: $42,000
Tax savings at 40% effective rate: $16,800
Tax-free personal income: $42,000
Large business with $2M profit, rents at $5,000/day for 14 days:
Annual rent: $70,000
Tax savings at 45% effective rate: $31,500
Tax-free personal income: $70,000
The savings scale linearly with the business size up to the 14-day limit. At the $5,000/day rate for 14 days ($70K), most business owners hit the practical ceiling
Compounding effect over time:
Using the Augusta Rule every year for 20 years on a mid-size business:
Annual tax savings: $16,800
Total over 20 years: $336,000
The Augusta Rule alone produces a third of a million dollars in extra wealth over a 20-year career for a single business owner
Other tax provisions stack with this:
Section 179: immediate expensing of equipment and vehicles purchased (up to $1.22M in 2024)
Bonus depreciation: 60-100% accelerated depreciation on assets
QBI deduction (Section 199A): 20% deduction on qualified business income
Section 121 home sale exclusion: $250K-$500K of profit on personal residence sale, tax-free
Health Savings Account: $4,150-$8,300 in pre-tax contributions, grows tax-free, withdrawn tax-free for medical
A business owner stacking all these provisions properly pays an effective tax rate of 12-18%. The same business owner without sophistication pays 28-35%
The difference is roughly $40K-$80K per year in saved tax. Over a 30-year career: $1.2M-$2.4M in extra net worth
The Augusta Rule is just one of about a dozen highly-leveraged tax provisions that ordinary tax filers never hear about because they're operating in W-2 reality. Every business owner with sophistication uses these provisions. Their accountants know about them. Their tax attorneys know about them. The IRS published them in the tax code
The middle-class American working a W-2 job has access to ZERO of these provisions. The W-2 employee can deduct standard items (mortgage interest, charitable giving, state and local taxes) but cannot:
Deduct vehicle expenses (no Section 179)
Deduct rental income from personal residence to employer (no 280A(g))
Get QBI deduction (W-2 income doesn't qualify)
Deduct home office (since 2017 W-2 employees lost this)
Strategic planning of capital gains (income is fixed by employer)
Almost everything that lets the wealthy reduce taxes requires you to be a business owner (or capital owner). The W-2 path categorically excludes you from the entire tax optimization layer
This is by design. The tax code rewards capital, business ownership, and asset accumulation. It punishes labor. The reward is approximately 20-30% lower effective tax rates for business owners using sophisticated strategies vs W-2 earners
The Augusta Rule is one of the simplest, lowest-effort tax savings available. Cost to implement: zero (if you already own a home and run a business). Time: maybe 4 hours per year for documentation. Annual savings: $7,000-$31,500
Most American business owners don't use the Augusta Rule. They don't know it exists. Their accountants might mention it once but never set up the structure. The provision sits in the tax code from 1976 waiting for someone to invoke it
You can be that someone. You need a home, a business, and 4 hours of paperwork per year
(if you want to fix your credit and qualify for the 0% APR business credit that helps you build the business that uses the Augusta Rule. link in bio)
been shipping with both. codex is a beast for backend scaffolding + self-testing. anything frontend or design-heavy, claude still wins. knowing which to reach for is the actual edge
https://t.co/Wc7ebmXk06
Codex is very good. I'm especially impressed by how it uses the browser to test its own work.
But any design related / frontend tasks, Claude still wins.
before building any feature i prompt claude code: 'ask me every question you need to fully plan this before writing a single line.' 5 min of Q&A up front saves hours of rework. better than plan mode
https://t.co/5VnpbWOM5G
This has sped up my AI coding 20x (prompt at the end):
Before building out a big feature, ask Codex/Claude Code to ask you as many questions as it needs to fully plan out the idea
This is even better than plan mode. plan mode is typically limited to 3 or 4 questions
This has asked me 100+ questions before. Seems like a lot but actually saves you time in the long run
The plan it builds will be so detailed and complete that it can basically run autonomously and build the entire thing
But here's where you take things to the next level:
You also have it take your entire plan and create detailed Linear issues for it
It should create 20+ tasks in Linear
Then it's as easy as saying "ok work on the next thing" over and over until the feature is done
Highly recommend downloading and using Linear if you haven't yet. Amazing project management tool w/ excellent free tier
Will basically capture all these details and put your agent on autopilot. It's a 2nd brain.
Use this prompt:
"I want to build out *describe your feature in detail*. Ask as many questions as you need of me to fully understand every detail of what I want to build out. Then take everything you learn, and create super focused and detailed Linear issues. Then begin work"
Getting so much more high quality code out with this workflow. You're welcome.