I build companies and systems. Currently shipping Govern 365 and running Texas Star Party. Previously founded an esports company that helped seed Cloud9.
@DavidSacks so what’s the plan for defenders and those creating sensitive production systems? I wrote an essay on this yesterday, and hope the admin understands the asymmetry https://t.co/obJHZskwfZ
@bcherny I basically have two choices at this point: either I kneecap it via hooks so I can think with it and my code and send the work to codex, or I let it loose and get real work done but run out of usage in 2-3 days. It’s basically a harness builder now… you must know that once we figure that out we won’t be on Anthropic at all, right? This weekend I’ve been in one long thread on codex in a messy project, just throwing stuff at it, and it nails it every time with barely any usage… I love Anthropic but something clearly went very wrong.
@ClaudeDevs Same, I was due to reset an hour from when they did this. Very frustrating. I manage usage carefully. This affected my work. I even spent an hour talking to Fin, who basically told me to go to hell. Disappointing.
Anyone ever seen this before? @AnthropicAI @claudeai @trq212
Hadn't used my account in 8-10 hours, and I essentially only use it as my thinking partner. I run all my long-running stuff on my DGX Sparks. Nothing ran overnight. Really makes no sense - what does this even mean?
We’re Building Faster. Why Aren't We Shipping Faster?
AI didn’t remove the bottleneck.
It moved it.
The inner loop is obviously faster now. We can spec faster, prototype faster, generate more code faster, and get feedback from coding agents almost instantly.
But then you hit the rest of the system.
Review. Release. Deployment. Signoff. Verification. Environment drift. Process debt. Human attention.
That’s the part I think a lot of teams are feeling right now.
We sped up the visible work first. The code. The drafts. The experiments.
What AI exposed is everything that was already slow behind it.
And there’s a second shift that matters just as much:
The constraint is no longer just output. It’s cognitive load.
Fast feedback is great until it’s faster than your ability to integrate it.
At some point, the real work stops being “can I produce enough?” and becomes:
can I stay oriented?
can I keep a stable mental model?
can I tell signal from noise?
can I make good decisions at the speed the tools want to operate?
That’s why I don’t think the deepest shift in the agent era is automation.
It’s that judgment moved closer to the center.
The teams that win here won’t be the ones generating the most activity.
They’ll be the ones that:
- keep the system visible
- reduce cognitive drag
- move judgment up the stack
- fix the outer loop, not just accelerate the inner one
We are building faster.
But right now, a lot of us are still in the phase where the machine is amplifying bottlenecks faster than we can remove them.
That’s not failure.
That’s the map.
I raced two DGX Sparks against each other using Karpathy's autoresearch. 74 experiments, neither agent knew the other existed. Both independently converged on the same strategy: shrink the model, get more steps in.
Baseline: 43.9GB. Winner: 2.1GB. 95% less memory.
Ran @karpathy's autoresearch overnight on two NVIDIA GB10s (DGX Spark), racing against each other.
38 experiments on spark1, 23 on spark2.
Results:
Baseline: 1.8198 val_bpb
Best: 1.2264 val_bpb (-32.6%)
Winning recipe (spark1):
- Depth 4, ASPECT_RATIO 96, MLP 2x
- Batch 2^15, tuned LRs
- 1252 steps, 41M tokens, 2.1 GB VRAM
The GB10 is a fun little research GPU. ~160ms/step, 0.8% MFU — not an H100, but 61 autonomous experiments while I slept is pretty cool.
Spark1 beat Spark2 by 0.017 bpb. Claude agents all the way down.
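A quick arithmetic check of the figures above, as a minimal sketch: it recomputes the val_bpb change from the quoted baseline and best scores, and the token count from steps times batch size. It assumes "Batch 2^15" means 2^15 tokens per step, which the post does not state; the function and variable names are illustrative only.

```python
# Sanity-check the numbers quoted in the post (values copied from the text).
BASELINE_BPB = 1.8198
BEST_BPB = 1.2264
STEPS = 1252
BATCH_TOKENS = 2 ** 15  # ASSUMPTION: "Batch 2^15" counts tokens per step


def relative_change(baseline: float, best: float) -> float:
    """Fractional change from baseline to best (negative means lower)."""
    return (best - baseline) / baseline


def total_tokens(steps: int, batch_tokens: int) -> int:
    """Tokens processed if every step consumes one full batch."""
    return steps * batch_tokens


bpb_change_pct = relative_change(BASELINE_BPB, BEST_BPB) * 100
tokens_seen = total_tokens(STEPS, BATCH_TOKENS)

print(f"val_bpb change: {bpb_change_pct:.1f}%")
print(f"tokens seen: {tokens_seen / 1e6:.1f}M")
```

Under that assumption the product is 41,025,536 tokens, which lines up with the quoted 41M, and the val_bpb change works out to -32.6%.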
Unless these systems are building for and selling to other agents, the intent and oversight of the operator remain crucial.
Like everything before AI: it isn't the tool, but the hands.
Matt Shumer's "Something Big Is Happening" is making the rounds. He's right that the pace of change for engineers is dizzying. But the "just prompt it and it does the work" framing is misleading — and ironically proves the opposite point.
And if agents can move fast and independently, the real value becomes how effectively the operator can manage them. We're nowhere near "assign a goal and let it run for months." Taste, judgment, and understanding human needs — that's a steep climb.
If your regulatory files are scattered, outdated, or hard to track, you're risking more than just delays. Read the full blog 🔗 https://t.co/rghpDmF6rk
Enter Govern 365’s Virtual Data Room — your secure command center for FDA & regulatory compliance.
#LifeSciences #FDA