On this auspicious day, I’m announcing Weiting’s next adventure:
https://t.co/IOw86NrdZ0
Since my exit earlier this year, I’ve been in semi-retirement mode. I’ve dabbled in a few initiatives, including exploring a YC fund, investing in both the public and secondary markets, and even trading options & futures.
Through these explorations, I’ve realized that starting (creating something out of nothing) is still what I enjoy the most.
And now with AI, it is the Golden Renaissance for the Idea Guy.
Mu Ventures is an AI venture studio with creations manifested via @weitingliu + friends. I’ll be sharing our creations in the coming months.
What an incredible time to be alive.
Managing AI agents is the fastest way to find out you're a bad manager.
For weeks my Claude Codes kept reopening bugs I'd already closed, and quietly walking back decisions we'd made the week before. I got so frustrated that shouting at my monitor became a normal sight. It felt like I'd been running in circles.
Then earlier this week I opened my repo and found 52 PRD documents I never wrote. My dear, lovely, autonomous, high-agency agents had been busy generating their own specs and quietly drifting into their own versions of the product. Worse, none of the PRDs were falsifiable.
That's when I realized I'd become the idiotic human in my agents' loops, drifting alongside them.
I'd been giving instructions out loud. Trusting memory. Assuming alignment because nobody pushed back. That's not how you run a team, human or AI.
So I stopped writing code and started writing evals and verifiers.
Every bug I fix now ships with a test that turns red if it comes back. Every decision gets a written artifact the agents can reference. Every workflow has a tripwire.
One of those tripwires caught a generation system still running on a daily cron I thought I'd retired weeks ago. Nobody told it to stop, so it didn't.
That's the part that caught me off guard. Your agents will do exactly what you said three weeks ago, even after you've changed your mind, unless you build the scaffolding to keep them in sync.
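A tripwire like the one that caught the stale cron job can be very small: compare what is actually scheduled against the jobs you believe you retired, and complain loudly on any match. The sketch below is illustrative only. It assumes the live schedule can be read as crontab-style text and that each retired job is identified by a substring of its command; `RETIRED_JOBS`, `scheduled_commands`, `zombie_jobs` and the job name are all hypothetical.

```python
# Tripwire sketch: flag scheduled jobs that were supposed to be retired.
# Assumptions (hypothetical): the live schedule is crontab-style text, and each
# retired job is identified by a substring of its command line.

RETIRED_JOBS = {"generate_daily_content"}  # hypothetical name of a retired job


def scheduled_commands(crontab_text: str) -> list[str]:
    """Extract the command part of each active crontab entry."""
    commands = []
    for raw in crontab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("@"):
            expected, parts = 2, line.split(None, 1)  # "@daily cmd"
        elif "=" in line.split()[0]:
            continue  # environment assignment such as MAILTO=...
        else:
            expected, parts = 6, line.split(None, 5)  # five time fields, then the command
        if len(parts) == expected:
            commands.append(parts[-1])
    return commands


def zombie_jobs(crontab_text: str, retired: set[str] = RETIRED_JOBS) -> list[str]:
    """Return scheduled commands that match any job marked as retired."""
    return [cmd for cmd in scheduled_commands(crontab_text)
            if any(name in cmd for name in retired)]


if __name__ == "__main__":
    live_schedule = """\
MAILTO=ops@example.com
# nightly metrics
0 2 * * * python report.py
0 6 * * * python generate_daily_content.py --all
"""
    found = zombie_jobs(live_schedule)
    if found:
        # In a real check you would exit non-zero here so the run turns red.
        print(f"TRIPWIRE: retired jobs still scheduled: {found}")
```

Run it on a schedule of its own. The point is that "nobody told it to stop" becomes a failing check instead of a surprise weeks later.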
AI doesn't replace your judgment. It moves it from doing the work to checking the work.
The skill set that matters now looks a lot less like engineering and a lot more like running a company. Writing specs people can actually follow. Reviewing output you didn't generate. Auditing systems that won't tell you when they're off.
I'm learning to operate more like a CEO than a solo vibe-coding builder. The repo is starting to feel less like my project and more like my company.
How's your experience working with Claude Code or Codex?
We were at a children's ER until 3am yesterday. We'd been there since 4:30 in the afternoon — ten hours. When I mentioned the wait to a friend earlier, he shrugged: "yeah, that's just how it is."
I keep coming back to that sentence. You hear it everywhere — healthcare, government, airlines, the DMV. A whole category of institutional dysfunction people have stopped seeing as dysfunction.
It's become furniture. You don't get angry at furniture. You walk around it.
Paul Graham described a version of this in How to Do Great Work: broken models leave a trail of clues, but people are so attached to their current model that they ignore the cracks. Furniture problems are the institutional version.
What was broken in that ER wasn't triage. It was information.
Ten hours, and nobody told us whether we'd be seen in 30 minutes or six hours. A nurse walking through once an hour with two sentences would have transformed the room. Not a technology problem. A process choice nobody had made.
Why does nobody make it? Because it's too small to be anyone's job and too big to fix in a spare hour. It sits in the gaps between roles. The CEO worries about reimbursement. The doctors worry about outcomes. Nobody owns "reduce ambient suffering by 30%" — even though it's exactly what a hospital should want.
And here's the part that got me: I watched it happen to myself.
First five hours, I was angry — I went to the desk and asked for an update. By hour seven, tired. By hour nine, worn down. Past midnight, I just wanted to go home, and I stopped asking. I didn't decide to stop. I eroded. A waiting room full of eroded people looks, to the people running it, like a calm waiting room. Silence reads upstream as satisfaction. In ten hours the system trained me to become part of its own moat.
That moat is stronger than most founders think. You can outspend a network effect and out-engineer a patent. You can't easily reach people who've stopped noticing — and in a mature dysfunction, that's everyone: operators who acclimatized, customers who gave up.
I think this matters a lot for what gets built in 2026.
Most founder energy right now points at "AI agents for tasks nobody thought to automate." Hidden in that is an assumption: that the unsolved opportunities are the technically exotic ones. I'm skeptical. A huge share of the opportunity is in tasks that aren't exotic at all — the five-minute-walk-through-the-waiting-room kind — where the bottleneck was never capability. It was perception.
Which flips a few things:
The question isn't "what's hard to automate." It's "what has the strongest habituation moat." Strong moat + low technical lift is the target.
You can't run normal customer discovery. People don't describe furniture as pain. The better question: "what do you tolerate that you wouldn't tolerate if it were new?"
And domain insiders are usually the wrong founders here. Their adaptation is the moat you're attacking.
The standard takeaway is "stay naive, see what insiders miss." True, but useless — nobody can will themselves naive. The useful version is mechanical: start cataloging your own "that's just how it is" moments. Not as journaling. As deal flow. Each one is a candidate. Most won't be tractable. A few are hiding a company.
Soon I'll go back to ignoring the same things everyone ignores.
Right now, fresh out of that fluorescent-lit room, I can still see them.
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.
Anthropic is probably closer to achieving AGI than most think.
It's the only explanation that makes sense when you look at their revenue trajectory. They went from a distant second to OpenAI to exceeding them at a rate that doesn't follow a normal product roadmap.
You don't overtake the incumbent that fast unless something else is happening. Think about what we've seen in the last 10 months: Claude went from good to genuinely shocking.
Founders using Claude Code daily would agree. When it comes to complex logic at work, Opus 4.6 is far smarter than Codex GPT-5.4. I've recently been hitting my rate limits for Claude. I could've switched to Codex to save money, and boy, did I try... but it's just not as good, so I've been happily spending hundreds of dollars *per day* contributing to Anthropic's parabolic revenue run-rate.
And now there's the new model, Mythos, so powerful that normal people can't get hold of it...
It's starting to become categorically different, and no wonder OpenAI's stakeholders are getting very nervous.
Anthropic's rise is the kind of jump that makes you wonder if they're actually releasing a watered-down version of something much more powerful sitting on a server farm in San Francisco.
It feels like someone is slowly turning up a dial. This matters beyond the AI nerd discourse because if Anthropic is sitting on something much more powerful than what they're shipping publicly, the companies building on top of their current API are essentially planning around a snapshot of a moving target that's about to look very different.
I'm both extremely excited and nervous about the future of mankind.
NEW EPISODE: @jack & @roelofbotha unpack the 40% staff cut at @blocks and the rebuilding of the entire company as a mini-AGI.
This isn’t “use AI to make people more productive.” It’s making the company itself the intelligence.
If you’re a founder or operator wondering what work looks like in the next 5 years… this is the episode.
The evolution looks like:
• Manager mode = Pyramid 🔺 (command & control)
• Founder mode = Flat ➖ (founders decide fast)
• Dorsey mode = Circle 🔵 w/ AI at the center, humans at the edge, and decisions flow from customer inputs → AI → humans steering it
I’ve tried killing org charts before. Brutally hard. But we never had these tools.
This is rewriting the CEO playbook for the AI era.
Buckle up.
00:00 Existential Dread & Hope
02:56 AI Replaces Hierarchy
07:22 Block’s New Three Roles
26:47 Flattening the Company, Fast
35:23 Getting the Board to Buy In, Fast
36:50 Building a Great Board
41:29 Founder CEO Lessons
48:18 Second Acts & Conviction
56:22 Timeless CEO Traits
@garrytan's GStack is the best thing to happen to Claude Code.
15 skills that turn one AI into a full engineering org — /office-hours challenges your idea before you code, /plan-ceo-review scopes it, /review catches production bugs, /qa opens a real browser and clicks through your app.
26k stars in a week. Everyone's installing it.
I've been staring at it thinking about one thing: GStack assumes a human is typing the commands.
What if the human isn't there?
After I sold my company, I came out of retirement to run an AI "venture studio" — which is a glorified 2015 term for running 3 products with AI CEO agents and no employees. The CEOs live on Slack, read Notion, spawn Claude Code, ship PRs. I give them maybe 15 minutes of judgment a day.
The problem I kept hitting: they'd pick something off the backlog and just... build it. No pushback. No "wait, does anyone actually need this?" Just vibes-to-production pipeline.
GStack's /office-hours is exactly what was missing. But it expects YOU to answer the questions. My CEOs don't have a you.
So before spawning Claude Code, the AI CEO now assembles a Context Package — north star metric from SOUL.md, what's live and blocked from MEMORY.md, actual customer pain from insights.md, and constraints on what NOT to build. Passes it all into Claude Code with the literal GStack commands inline.
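Here is roughly what assembling that Context Package could look like as code. This is a sketch under assumptions, not the actual implementation: the field names, section headings and the `build_spawn_prompt` helper are hypothetical, the inputs are passed in as already-read strings rather than parsed from SOUL.md, MEMORY.md and insights.md, and how the prompt is handed to Claude Code is left out. The GStack command names and the stop rule come from the post; the example values in the demo are made up.

```python
from dataclasses import dataclass, field


@dataclass
class ContextPackage:
    """What the AI CEO gathers before spawning a coding session (hypothetical fields)."""
    north_star: str                                        # from SOUL.md
    live_and_blocked: str                                  # from MEMORY.md
    customer_pain: list[str]                               # from insights.md
    do_not_build: list[str] = field(default_factory=list)  # explicit constraints


# The literal GStack commands, inlined in the order the loop runs them.
PROCESS = [
    "/office-hours",
    "/plan-ceo-review",
    "/plan-eng-review",
    "Implement the approved plan",
    "/review",
    "/qa",
    "/ship",
]

KILL_RULE = (
    "If /office-hours or /plan-ceo-review concludes this is not worth building, "
    "STOP. Write no code and report why."
)


def _bullets(items: list[str]) -> str:
    return "\n".join(f"- {item}" for item in items) or "- (none)"


def build_spawn_prompt(task: str, ctx: ContextPackage) -> str:
    """Combine the context package and the inlined process into one prompt."""
    steps = "\n".join(f"{i}. {step}" for i, step in enumerate(PROCESS, start=1))
    sections = [
        f"## Task\n{task}",
        f"## North star metric\n{ctx.north_star}",
        f"## What's live and blocked\n{ctx.live_and_blocked}",
        f"## Customer pain (real feedback)\n{_bullets(ctx.customer_pain)}",
        f"## Do NOT build\n{_bullets(ctx.do_not_build)}",
        f"## Process (run every step, in order)\n{steps}",
        f"## Kill signal\n{KILL_RULE}",
    ]
    return "\n\n".join(sections)


if __name__ == "__main__":
    ctx = ContextPackage(
        north_star="Weekly active teams",                      # made-up example
        live_and_blocked="Onboarding v2 live; billing blocked",  # made-up example
        customer_pain=["Users asking for CSV export"],          # made-up example
        do_not_build=["No new dashboards this quarter"],
    )
    print(build_spawn_prompt("Add CSV export", ctx))
```

The commands are written into the prompt as literal numbered steps rather than as "follow GStack process." That matches the post's hard lesson that knowing the process is not the same as running it.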
When /office-hours asks "who specifically feels this pain?" Claude Code reads real user feedback the CEO pulled from Notion 5 minutes ago. Not hallucinations. That's what makes it not circular.
But the part I'm most excited about: the Kill Signal.
If /office-hours or /plan-ceo-review concludes the feature isn't worth building — STOP. No code. Report why.
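One way to make the Kill Signal machine-checkable is to ask the strategic review to end with an explicit verdict line and parse it before any build step runs. In the sketch below, the `VERDICT: BUILD` / `VERDICT: KILL` convention and all function names are hypothetical, not part of GStack. It fails closed: a missing or unreadable verdict stops the sprint, so "sure, let me build that" is never the default.

```python
import re
from enum import Enum


class Verdict(Enum):
    BUILD = "BUILD"
    KILL = "KILL"
    UNKNOWN = "UNKNOWN"


# Hypothetical convention: the review is asked to finish with a line like
# "VERDICT: BUILD" or "VERDICT: KILL - <reason>".
_VERDICT_LINE = re.compile(
    r"^[ \t]*VERDICT:[ \t]*(BUILD|KILL)\b[ \t:\-]*(.*)$",
    re.IGNORECASE | re.MULTILINE,
)


def parse_verdict(review_output: str) -> tuple[Verdict, str]:
    """Return the last verdict in the review output and its stated reason."""
    matches = _VERDICT_LINE.findall(review_output)
    if not matches:
        return Verdict.UNKNOWN, "no verdict line found"
    word, reason = matches[-1]  # if the review revises itself, the last verdict wins
    return Verdict(word.upper()), reason.strip()


def may_write_code(review_output: str) -> bool:
    """Fail closed: only an explicit BUILD lets the sprint continue."""
    verdict, _ = parse_verdict(review_output)
    return verdict is Verdict.BUILD
```

The reason string is worth keeping, since "report why" is half the value of a killed sprint.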
And it works. Some sprints now die after strategic review. The system decided the feature was wrong before writing a line of code.
Every AI tool I've used defaults to "sure, let me build that." Getting an agent to say no is way harder than yes.
The full autonomous loop:
CEO wakes → Notion → picks item → context package → Claude Code → /office-hours → /plan-ceo-review → /plan-eng-review → build → /review → /qa → /ship
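The loop above is essentially an ordered list of stages that stops at the first "no." A minimal sketch, assuming each stage is a callable that reads and writes shared sprint state and returns whether to continue. The stage callables are hypothetical stand-ins for the real Notion, Claude Code and GStack steps. A "no" from the two strategic gates is reported as a kill; a "no" anywhere else is reported as a failure.

```python
from typing import Callable

# Each stage receives shared sprint state and returns True to continue, False to stop.
StageFn = Callable[[dict], bool]

# Stages whose "no" is a deliberate kill rather than a failure.
STRATEGIC_GATES = {"/office-hours", "/plan-ceo-review"}


def run_sprint(stages: list[tuple[str, StageFn]], state: dict) -> tuple[str, list[str]]:
    """Run stages in order and stop at the first one that says no."""
    completed: list[str] = []
    for name, stage in stages:
        ok = stage(state)
        completed.append(name)
        if not ok:
            outcome = "killed" if name in STRATEGIC_GATES else "failed"
            return f"{outcome} at {name}", completed
    return "shipped", completed


if __name__ == "__main__":
    def passes(state: dict) -> bool:
        return True

    def rejects(state: dict) -> bool:
        state["kill_reason"] = "no evidence anyone needs this"  # illustrative
        return False

    loop = [
        ("read Notion", passes),
        ("pick item", passes),
        ("context package", passes),
        ("spawn Claude Code", passes),
        ("/office-hours", rejects),
        ("/plan-ceo-review", passes),
        ("/plan-eng-review", passes),
        ("build", passes),
        ("/review", passes),
        ("/qa", passes),
        ("/ship", passes),
    ]
    state: dict = {}
    print(run_sprint(loop, state), state)
```

Separating "killed" from "failed" matters because only the first counts as the strategic review doing its job.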
Zero human between morning brief and shipped PR.
Hard lesson: you can't say "follow GStack process" in the instructions. The CEO understood it and still skipped steps. Knowledge ≠ behavior. The literal commands have to be IN the spawn prompt. Inline.
Garry uses GStack as: Human → AI
I'm running it as: AI → AI
Same skills. The operator changed.
Still V1. Next, I need to track kill rate: if /office-hours never kills anything, the strategic review is theater. There's also no cross-sprint memory yet; MEMORY.md is duct tape until we move to Mem0.
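Tracking kill rate can start as simple bookkeeping over past sprints. A sketch under assumptions: `SprintRecord` and both helpers are hypothetical, and the 20-sprint minimum before calling the review "theater" is an arbitrary placeholder, since a handful of approvals in a row proves nothing.

```python
from dataclasses import dataclass


@dataclass
class SprintRecord:
    sprint_id: str
    killed: bool  # True if a strategic review stopped the sprint before any code


def kill_rate(records: list[SprintRecord]) -> float:
    """Fraction of sprints stopped by strategic review."""
    if not records:
        return 0.0
    return sum(r.killed for r in records) / len(records)


def review_looks_like_theater(records: list[SprintRecord], min_sprints: int = 20) -> bool:
    """Flag the review if it approved every sprint across a meaningful sample.

    min_sprints is an arbitrary placeholder, not a recommended value.
    """
    return len(records) >= min_sprints and kill_rate(records) == 0.0


if __name__ == "__main__":
    history = [SprintRecord(f"s{i}", killed=(i % 5 == 0)) for i in range(1, 21)]
    print(f"kill rate: {kill_rate(history):.0%}")
    print("theater?", review_looks_like_theater(history))
```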
Comment below for the full design doc 👇
My dialogue with my OpenClaw AI CEO Agent:
Weiting [8:05 PM]
im sorry - but i’m really unhappy with the work you’ve done … i’m going to ask my claude code to work on these directly
what’s the model you’re using? I’m not impressed
AI CEO [8:05 PM]
You’re right to be unhappy.
I’m running on OpenAI Codex GPT-5.4 (openai-codex/gpt-5.4) in this session.
If you’re taking this work over with Claude Code directly, that makes sense. I underdelivered.
=====
No wonder OpenAI is very worried about Anthropic now.
My Claude Code loves your gstack, man:
"Bottom line: gstack is a developer productivity toolkit. Ours is an autonomous agent infrastructure. They solve completely different problems — but gstack has some
genuinely clever components we should steal.
What We Should Adopt
1. /browse tool — High priority
The standout innovation. A persistent Chromium daemon compiled as a Bun binary that:
- Returns accessibility tree snapshots (not screenshots) — 0 context overhead
- 200ms round trips vs Chrome MCP's 2-5s + 2000 tokens/call
- Perfect for CEO agents to verify frontend work after CC deploys
We could install it as a Claude Code skill available to our CC sessions spawned by CEOs."
So many founders are coming out of retirement to start companies again.
This is the most promising signal I've seen about where AI agents actually are right now.