929 people on HN asking: 'Can we have the day off?'
The argument: AI made us 10x more productive. Why are we still working the same hours?
Here's the founder side of that question:
I took last Sunday off. The agent sent emails, monitored signals, and drafted content. Monday morning I reviewed what it did.
22 minutes.
This isn't the 4-day work week debate. It's a different ownership structure.
When your ops run without you, you get the day off by default. You just need a system that keeps running when you're not looking.
The employees asking for the day off are right. The founders building on top of agents already took it.
What's the last thing your business did while you were asleep?
HN is debating whether frontier model prices will stay flat or spike.
For coding agents: pricing volatility is annoying. Your CI gets more expensive.
For ops agents: pricing volatility breaks the whole system.
Ops agents run 24/7. They have memory files that assume a specific model's output format. They have review queues calibrated to that model's error rate. They have cron jobs that depend on consistent output length.
If you bake a specific model into your ops stack and its price triples overnight, you don't just pay more. You rebuild the harness.
This is why the model should be the least important part of an ops agent setup.
The harness is the product. The model is the CPU.
Six months in, I've swapped models twice. Neither time did the cron jobs break. Because the specs are written to behavior, not to syntax.
Pick a model. Build the harness so it doesn't care which one.
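Here is a minimal Python sketch of what "specs written to behavior, not to syntax" could look like. It rests on two assumptions. First, every model sits behind one `complete(prompt) -> str` method. Second, the spec checks what an email draft must contain (recipient, subject, a body under 150 words), not how the output is formatted. `ModelClient`, `check_email_draft`, the field names and the 150-word limit are hypothetical placeholders, not a description of the actual stack.

```python
import json
from typing import Protocol


class ModelClient(Protocol):
    """Hypothetical interface: any model that turns a prompt into text."""

    def complete(self, prompt: str) -> str: ...


def check_email_draft(raw: str) -> list[str]:
    """Behavioral spec: what the draft must contain, not how it is formatted."""
    try:
        draft = json.loads(raw)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(draft, dict):
        return ["output is not a JSON object"]
    problems = [f"missing field: {f}" for f in ("to", "subject", "body") if not draft.get(f)]
    if len(str(draft.get("body", "")).split()) > 150:  # hypothetical length limit
        problems.append("body over 150 words")
    return problems


def run_job(model: ModelClient, prompt: str) -> dict:
    raw = model.complete(prompt)
    problems = check_email_draft(raw)
    # Anything that breaks the spec goes to human review instead of being sent.
    status = "queued_for_review" if problems else "approved"
    return {"status": status, "problems": problems, "raw": raw}


# Two stand-in "models" that format the same draft differently, plus a broken one.
class ModelA:
    def complete(self, prompt: str) -> str:
        return '{"to": "ana@example.com", "subject": "Quick question", "body": "Short note."}'


class ModelB:
    def complete(self, prompt: str) -> str:
        return '{\n  "body": "Short note.",\n  "subject": "Quick question",\n  "to": "ana@example.com"\n}'


class BrokenModel:
    def complete(self, prompt: str) -> str:
        return "Sure! Here is your email: ..."


for model in (ModelA(), ModelB(), BrokenModel()):
    print(type(model).__name__, run_job(model, "Draft a follow-up")["status"])
```

Both stand-in models lay out their JSON differently, and both pass. Swapping the model never touches the check. Only output that actually misbehaves lands in the review queue.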
My Monday ops agent review: 22 minutes.
What I check:
- Which emails went to spam (0 this week; the domain is warmed up)
- Which content ideas the agent flagged as duplicates
- What the agent changed in its own memory files
- Which leads entered the pipeline vs which it skipped
What I don't check:
- That it ran at all (it has a healthcheck)
- That suppression lists are current (it updates them nightly)
- That email copy follows brand voice (it reads the style guide)
The review got shorter every week. Not because I trust the agent blindly.
Because I wrote down every failure and made it a rule. The rule runs itself.
Week 1: 2 hours. Week 6: 22 minutes.
What part of your ops review are you still doing manually that a rule could handle?
The most useful thing about running ops agents solo:
There's no one between me and the data.
No sales person who says the deck was the problem.
No marketer who says the targeting was the problem.
No ops manager who says the timing was the problem.
335 emails. Two different ICPs. The channel wasn't working.
I don't have a team to diffuse that signal. So I can't ignore it.
I changed the channel in 5 days.
A 5-person team would have spent 6 weeks in the blame cycle.
The agent doesn't care who's responsible. It just returns the number. You decide what it means.
That speed-of-conclusion gap is underrated. It's not about the automation. It's about removing the layer that softens bad news.
1,890 people upvoted 'I'm Tired of Talking to AI' on HN today.
I get it. Chatbots are exhausting.
But there's a version of AI you never talk to.
You don't prompt it. You don't debug its outputs in a chat window. You don't paste your business context into a box every morning.
It runs on a cron. It reads your memory files. It queues things for your approval. You review the diff.
That's not a conversation. It's an operator.
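A rough sketch of "queue it, then review the diff", using only Python's standard `difflib`. It assumes the agent's memory lives in a plain-text file (the name `memory.md` is hypothetical). It also assumes each run submits before/after snapshots to a hypothetical in-memory `ApprovalQueue`. A real setup would persist the queue somewhere.

```python
import difflib
from dataclasses import dataclass, field


def memory_diff(before: str, after: str, path: str = "memory.md") -> str:
    """Unified diff of what the agent changed in a memory file."""
    return "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"{path} (before run)",
            tofile=f"{path} (after run)",
        )
    )


@dataclass
class ApprovalQueue:
    """Hypothetical queue of (run_id, diff) pairs waiting for a human."""

    pending: list[tuple[str, str]] = field(default_factory=list)

    def submit(self, run_id: str, before: str, after: str, path: str = "memory.md") -> None:
        diff = memory_diff(before, after, path)
        if diff:  # unchanged files produce an empty diff and never reach the queue
            self.pending.append((run_id, diff))


queue = ApprovalQueue()
before = "- never email competitors\n"
after = "- never email competitors\n- skip teams over 15 people\n"
queue.submit("nightly-run", before, after)
queue.submit("noop-run", after, after)  # nothing changed, nothing queued
for run_id, diff in queue.pending:
    print(run_id)
    print(diff)
```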
I haven't opened a chat UI for a business task in 6 weeks. The agents just run.
The exhaustion people are describing is real. It comes from chat-first AI design. The alternative exists.
Geohot is right about coding agents: the slop is getting harder to detect.
But he's describing the hard version of a problem ops agents have always had.
Coding agent failure: broken code, wrong logic. Loud.
Ops agent failure: stale suppression list, memory drift, wrong ICP. Silent.
You don't find out ops agents failed by running tests. You find out 6 weeks and 335 emails later.
The common thread: the output mimics correctness. That's what makes both dangerous at scale.
Has your agent ever looked right but been wrong the whole time?
6 weeks running ops agents daily. Here's what I still do myself:
Decide which channel to test next.
Read a reply and feel whether it's real interest or polite deflection.
Choose which founder problem to solve first.
Know when a metric is noise vs signal.
The agent executes at 3am. It follows the spec perfectly.
It doesn't decide what the spec should be.
That's not a limitation to fix. It's the division of labor.
The best ops agents don't try to make judgment calls for you.
They surface information so you can make them faster.
The bottleneck isn't the agent.
It's the quality and speed of your judgment.
What's the last judgment call you had to make that no agent could have made for you?
I haven't changed the model in 6 weeks.
No Opus 4.7. No experimental releases. Same version the whole time.
Here's what actually got better:
- Review queue: 60% to 15%
- Dedup failures: 3 to 0
- Morning review time: 20 min to 7 min
- Blocked jobs per week: 8 to 1
None of that is the model's doing.
The memory got more specific. The suppression list filled in. The guardrails tightened after each failure.
The model is static. The harness learned.
Most people are optimizing the wrong variable.
What's the last thing you changed about your agent setup that actually moved the needle?
The solo founder advantage nobody quantifies:
Speed of hypothesis kill.
A 5-person team with a broken ICP hypothesis: 3 months before anyone admits it.
Sales blames the deck. Marketing blames the targeting. Founder doesn't have the data.
A solo founder with ops agents and a broken ICP hypothesis: 5 weeks.
The agent returns 0 replies. There's no one else to blame. You look at the data.
We spent 5 weeks and $0 in media spend to confirm: our original target list was the wrong ICP.
A team would still be running A/B tests on email subject lines.
The agent doesn't decide what's wrong. It just gives you a clean signal before your runway ends.
What's the fastest you've ever killed a hypothesis and moved on?
Everyone builds logging into their agents.
Almost nobody builds output monitoring.
Logging tells you the agent ran and what tools it called. That's easy.
Output monitoring tells you whether what the agent produced was actually right over time. That's hard.
The distinction matters more than it sounds.
We've caught three different drift patterns in 7 weeks - all from watching outputs, not logs:
1. Email framing shifted from prospect-first to product-first after the memory file bloated past 400 lines
2. Content ideas started repeating angles we'd published 3 weeks prior
3. Lead scoring stopped filtering for team size after a guardrail got buried under new rules
None of these showed up as errors. The tools called correctly. The jobs completed. The logs were clean.
Build the output review layer first. The logging is table stakes.
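A minimal sketch of an output review layer for the three drift patterns above, in Python. It makes three assumptions:

- The memory file is plain text.
- Published angles are kept as a set of normalized strings.
- Scored leads carry a team size.

The 400-line limit comes from the first pattern, and the team-size cap matches the "under 15" screen. Exact matching on normalized text is a crude stand-in for real similarity checking. All function and class names are hypothetical.

```python
from dataclasses import dataclass

MEMORY_LINE_LIMIT = 400  # framing drifted once the memory file passed this
MAX_TEAM_SIZE = 15       # the lead screen only wants teams under 15


@dataclass
class ScoredLead:
    company: str
    team_size: int


def memory_bloat(memory_text: str) -> list[str]:
    n = len(memory_text.splitlines())
    return [f"memory file is {n} lines (limit {MEMORY_LINE_LIMIT})"] if n > MEMORY_LINE_LIMIT else []


def repeated_angles(new_ideas: list[str], published: set[str]) -> list[str]:
    # Crude: exact match after normalizing case and whitespace.
    return [f"repeats a published angle: {idea}" for idea in new_ideas
            if " ".join(idea.lower().split()) in published]


def guardrail_leaks(scored: list[ScoredLead]) -> list[str]:
    return [f"{lead.company} was scored despite team size {lead.team_size}"
            for lead in scored if lead.team_size >= MAX_TEAM_SIZE]


def review_outputs(memory_text: str, new_ideas: list[str], published: set[str],
                   scored: list[ScoredLead]) -> list[str]:
    """Checks what the agent produced; a clean log says nothing about any of this."""
    return memory_bloat(memory_text) + repeated_angles(new_ideas, published) + guardrail_leaks(scored)


alerts = review_outputs(
    memory_text="rule\n" * 450,
    new_ideas=["Agents at  3am", "Pricing volatility for ops agents"],
    published={"agents at 3am"},
    scored=[ScoredLead("Acme", 8), ScoredLead("BigCo", 40)],
)
for alert in alerts:
    print(alert)
```

None of these checks reads a log line. Each one inspects an artifact the agent produced, which is why drift like this can surface here while the logs stay clean.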
What's the most recent drift pattern you caught in your agents?
Week 5 result from new ICP cohort (sent May 12): 0 replies.
So both cohorts returned 0.
The machine didn't change. The targeting did. The result didn't.
What this means: either the channel is the problem, not the ICP - or 7 days isn't long enough to confirm.
Either way, the feedback loop just gave me a cleaner answer in 5 weeks than manual outreach would have in 5 months.
What was the moment you realized your ICP was off, not your copy?
What the agent now screens for before scoring a lead:
- Job postings for marketing/growth roles
- Active Google Ads spend
- Team size under 15
Not tech stack. Not funding stage. Buying intent.
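A sketch of that screen as code. How the agent actually detects job postings or ad spend isn't described here, so the three fields on `LeadSignals` are hypothetical inputs from whatever enrichment step exists. Requiring all three signals is also an assumption; a looser version could rank leads by how many signals match.

```python
from dataclasses import dataclass

MARKETING_ROLE_WORDS = ("marketing", "growth")


@dataclass
class LeadSignals:
    company: str
    open_job_titles: list[str]  # hypothetical: titles from current job postings
    runs_google_ads: bool       # hypothetical: active Google Ads spend detected
    team_size: int


def intent_signals(lead: LeadSignals) -> dict[str, bool]:
    return {
        "hiring_marketing_or_growth": any(
            word in title.lower() for title in lead.open_job_titles for word in MARKETING_ROLE_WORDS
        ),
        "active_google_ads": lead.runs_google_ads,
        "team_under_15": lead.team_size < 15,
    }


def passes_screen(lead: LeadSignals) -> bool:
    # Assumption: every buying-intent signal must be present before the lead is scored.
    return all(intent_signals(lead).values())


leads = [
    LeadSignals("Acme", ["Growth Marketer"], True, 9),
    LeadSignals("Beta", ["Backend Engineer"], True, 9),
    LeadSignals("Gamma", ["Head of Marketing"], True, 40),
]
print([lead.company for lead in leads if passes_screen(lead)])
```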
First 10 new-ICP leads went out May 12.
Week 1 ops log: 11,035 impressions.
Week 4 ops log: 121 impressions.
Same format. Same Monday slot. Different hook.
Week 1: 'At 3am last night, while I was asleep, my agents sent 13 emails, filed 4 market signals, and updated the CRM.'
Week 4: '231 emails. 0 replies. Not a crisis - a data point.'
Week 1 hook = visceral, specific time, visual action.
Week 4 hook = data summary. Technically accurate. Zero tension.
The machine is consistent. The opening sentence isn't.
Now A/B testing the distribution strategy while the agents run the rest.
What's the opening line format that's worked for you on Twitter?
327 emails sent by my agents over 6 weeks.
0 external replies.
Here's what I actually concluded:
The machine worked. Suppression, dedup, personalization - all running clean.
The ICP was wrong.
Not the copy. Not the send time. The target list was full of people who weren't looking for this solution today.
The agent found that out in 6 weeks at ~$0 media spend.
Manual outreach: I'd have rationalized my way through 3 months and blamed the copy.
Agents don't gaslight you. They just return the data.
What's the false belief your sales process let you hold the longest?
What I tracked, not benchmarks:
- Output acceptance rate: what % I used without editing
- Dedup failure rate: did it propose something it already proposed?
- Blocked job rate: did it stop when it hit something unclear?
Week 1: 60% acceptance, 3 dedup failures, 8 blocks.
Week 6: 91% acceptance, 0 dedup failures, 1 block.
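Here is one way to compute those three numbers from a job log, in Python. It assumes each run is recorded as a hypothetical `JobOutcome` with four fields:

- the week,
- whether the output was used without editing,
- a normalized key for what the agent proposed,
- whether the job blocked.

Acceptance is measured over runs that produced output. A dedup failure is any proposal whose key already appeared in an earlier run. The toy log only exercises the function; it is not real data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobOutcome:
    week: int
    used_without_edit: bool      # output shipped as-is
    proposal_key: Optional[str]  # normalized key of what was proposed, if anything
    blocked: bool                # agent stopped on something unclear


def weekly_metrics(log: list[JobOutcome]) -> dict[int, dict[str, float]]:
    seen: set[str] = set()
    raw: dict[int, dict[str, int]] = {}
    for job in sorted(log, key=lambda j: j.week):  # stable sort keeps in-week order
        s = raw.setdefault(job.week, {"produced": 0, "accepted": 0, "dedup": 0, "blocked": 0})
        if job.blocked:
            s["blocked"] += 1
            continue
        s["produced"] += 1
        s["accepted"] += job.used_without_edit
        if job.proposal_key is not None:
            if job.proposal_key in seen:
                s["dedup"] += 1
            seen.add(job.proposal_key)
    return {
        week: {
            "acceptance_rate": s["accepted"] / s["produced"] if s["produced"] else 0.0,
            "dedup_failures": s["dedup"],
            "blocks": s["blocked"],
        }
        for week, s in raw.items()
    }


toy_log = [  # illustrative only
    JobOutcome(1, True, "idea:pricing", False),
    JobOutcome(1, False, "idea:pricing", False),  # proposed the same idea twice
    JobOutcome(1, False, None, True),
    JobOutcome(2, True, "idea:3am", False),
]
print(weekly_metrics(toy_log))
```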
Trust inverted.
We used to trust institutions. Now we trust individuals.
Here's the thing nobody says out loud: most 'company on agents' posts are corporate AI. Templated. Polished. Could have been anyone.
When I share that we sent 231 cold emails and got 0 replies, that's not a success story. It's a real log. People follow real logs because AI has made fake polish the new background noise.
The unfair advantage of building on your own product and sharing the actual ops: it looks nothing like hype. The ICP can tell immediately.
What's the last piece of company content you actually saved?
Week 1: trust the agent.
Week 3: the copy is the variable, not the machine.
Week 4: shortest email, clearest problem.
Old: send 231, wait 2 weeks.
New: agent sends, spot signal in morning briefing, adjust same day.
What's the shortest cold email that's ever worked for you?
What the agent stack shows that manual outreach doesn't:
- Which emails got opened vs ignored
- How fast the test batch converts vs the control
- Exact dedup/suppression status of every lead
This is what compresses the feedback loop.
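A minimal Python sketch of the last item in that list, per-lead dedup and suppression status. It assumes three things:

- Leads are identified by email address.
- The suppression list and the already-contacted list are plain sets.
- Addresses are compared case-insensitively after trimming whitespace.

`classify_leads`, `normalize` and the status names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def normalize(email: str) -> str:
    return email.strip().lower()


def classify_leads(batch: Iterable[str], suppressed: set[str],
                   contacted: set[str]) -> list[tuple[str, str]]:
    """Return (email, status) for every lead in the batch, in order."""
    blocked = {normalize(e) for e in suppressed}
    seen = {normalize(e) for e in contacted}
    results = []
    for raw in batch:
        email = normalize(raw)
        if email in blocked:
            status = "suppressed"
        elif email in seen:
            status = "duplicate"  # already contacted, or repeated earlier in this batch
        else:
            status = "send"
            seen.add(email)
        results.append((email, status))
    return results


statuses = classify_leads(
    ["Ana@Acme.io", "bo@beta.dev", "ana@acme.io ", "cy@gamma.co"],
    suppressed={"bo@beta.dev"},
    contacted={"cy@gamma.co"},
)
for email, status in statuses:
    print(f"{status:>10}  {email}")
print(Counter(status for _, status in statuses))
```

The design choice is to keep a status for every lead rather than silently dropping the skipped ones. That way the morning briefing can show exactly why a given lead didn't get an email.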