🇦🇷🗣️ Messi in the documentary about the reason for his smile when France equalised:
“I felt like football was telling me, this cup was not created for me to be its champion. I felt that the 2014 scenario would be repeated and I would be sad, but something inside me told me, ‘Wake up, you fool. This is no longer your dream. It has become the dream of the world.’ And indeed, I stood on my feet again and said, ‘Nothing will stop me.’ Realizing my dream this time.”
My spiciest take: 🌶️
You only need prod.
Other environments are optional, and may cause more problems than they solve.
Imagine prod was our team’s only environment.
We’d:
- write lots of tests
- stop batching work in lower environments
- stop rushing to hit arbitrary release cutoff dates
- stop spending hours every week maintaining and coordinating work in lower environments
- auto-deploy upon merge
- release small changes multiple times a day
- monitor prod via automated checks that notify us of issues
- use feature flags and phased releases to safely test in prod before making a feature visible to everyone
These are mature dev team practices. Having only prod *forces* them.
So, your team might be better off with only prod than with a bunch of non-prod environments.
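To make the feature-flag and phased-release point concrete, here is a minimal sketch of a percentage rollout. Everything in it is an assumption for illustration: the names `rollout_bucket` and `is_enabled`, the flag `new-checkout`, and the choice to hash the user id into 100 buckets. A real team would more likely use a flag service than roll its own.

```python
import hashlib


def rollout_bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """Return True if the user falls inside the current rollout percentage."""
    return rollout_bucket(flag_name, user_id) < rollout_percent


if __name__ == "__main__":
    # Hypothetical users and flag name, used only to show the shape of a phased release.
    users = [f"user-{i}" for i in range(1000)]
    for percent in (0, 5, 50, 100):
        enabled = sum(is_enabled("new-checkout", u, percent) for u in users)
        print(f"{percent:>3}% rollout -> {enabled} of {len(users)} users see the feature")
```

Because the bucket depends only on the flag name and user id, a user who sees the feature at 5% keeps seeing it at 50%, so the code can be merged and deployed to prod while staying dark for most people.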
I start with very informal specifications written by hand. I have an agent convert these into harder specifications that are subdivided into tasks. I review these.
Then I feed those tasks into the specifier agent, which converts each task to Gherkin, prunes the Gherkin, and then hands it off to the coder agent. I spot check the Gherkin.
The coder agent writes acceptance tests directly from the Gherkin. Then writes unit tests. Then writes code. When all those tests pass, the coder agent hands off to the refactorer agent.
The refactorer agent reduces the CRAP score to 6 or below, and reduces any duplication. Then it writes property tests and gets them to pass. Then it hands off to the architect agent.
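Assuming "crap" here means the CRAP (Change Risk Anti-Patterns) metric, the threshold can be sketched as below. The formula is the commonly published one, complexity squared times the uncovered fraction cubed, plus complexity. The function name `crap_score` is made up for this example, and the post does not say which tool the agent actually uses.

```python
def crap_score(complexity: int, coverage_percent: float) -> float:
    """CRAP = comp^2 * (1 - cov/100)^3 + comp, for a single function or method."""
    uncovered = 1.0 - coverage_percent / 100.0
    return complexity ** 2 * uncovered ** 3 + complexity


if __name__ == "__main__":
    # (cyclomatic complexity, test coverage in percent) for some hypothetical functions
    for complexity, coverage in [(2, 0), (5, 100), (5, 50), (10, 100)]:
        score = crap_score(complexity, coverage)
        verdict = "ok" if score <= 6 else "refactor"
        print(f"complexity={complexity:>2} coverage={coverage:>3}% -> {score:.2f} ({verdict})")
```

Under this formula the score can never drop below the complexity itself, so a limit of 6 means no function can have a cyclomatic complexity above 6, however well it is tested.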
The architect agent runs language mutation and covers any uncovered sections, and kills all survivors. Then it runs Gherkin mutation and kills any of those survivors. Then it runs the entire test suite, and when it passes it hands the result off to the specifier, coder, and refactorer.
I spot check the code.
This is an exercise of transformations from the informal to the formal through managed stages, with human interaction decreasing with each stage.
Raw computer power is the limiting factor. Those mutation tests are CPU intensive.
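A toy illustration of what "killing survivors" means, and why it eats CPU. This is a sketch with hand-written mutants, not what a real mutation tool does internally; `is_adult`, the mutant names, and both suites are invented for the example.

```python
from typing import Callable, Dict, List

Predicate = Callable[[int], bool]


def is_adult(age: int) -> bool:
    """Original (hypothetical) function under test."""
    return age >= 18


# Hand-written mutants: each one changes a single operator or constant.
MUTANTS: Dict[str, Predicate] = {
    "ge_to_gt": lambda age: age > 18,
    "ge_to_le": lambda age: age <= 18,
    "18_to_19": lambda age: age >= 19,
}


def weak_suite(fn: Predicate) -> bool:
    """Passes if fn behaves correctly far away from the boundary."""
    return fn(30) is True and fn(5) is False


def strong_suite(fn: Predicate) -> bool:
    """Adds a boundary check on top of the weak suite."""
    return weak_suite(fn) and fn(18) is True


def survivors(suite: Callable[[Predicate], bool]) -> List[str]:
    """Names of mutants the suite fails to detect (the suite still passes)."""
    return [name for name, mutant in MUTANTS.items() if suite(mutant)]


if __name__ == "__main__":
    print("weak suite survivors:  ", survivors(weak_suite))
    print("strong suite survivors:", survivors(strong_suite))
```

The weak suite lets the two boundary mutants survive; adding the `fn(18)` check kills them. The suite runs once per mutant, so the cost grows roughly as number of mutants times suite run time, which is where the CPU goes.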
4 years ago Argentina were playing like this in the 104th minute of a World Cup final when the score was tied, now tell me that cowardly football isn’t killing this sport
I've got an agent in a loop optimizing a renderer with the goal of minimizing frame times (and tests to measure). It got times down from 88ms to 1.5ms and allocations down from ~150K to ~500. Sounds good, right? Wrong. This is exactly why agent psychosis is a big fucking problem.
As an experiment, I rewrote the Ghostty core render state in Go, with access to identically laid out data structures as Ghostty and the exact same validation tests. I made a purposely naive renderer (simple, correct, but slow). 88ms per frame with 150,000 allocations (horrendous, lol)!
I then kickstarted a Ralph loop to bring the frame times down. I told it it can't modify input data structures or the public API or tests (they're correct), but it can do anything else it wants. It got to work.
It has worked for about 4 hours. I've spent around $350 on this experiment so far. The results?
88ms => 1.5ms
150K allocs => ~500 allocs
Incredible, right? Nope.
My hand-written renderer I ported has frame times (same benchmark) of ~20us (0.020ms) and 0 allocations in the update path.
This is the problem with psychosis and lacking systems understanding. If you don't understand the system, you're going to accept that this is an incredible result. If you understand the system, you'll see better solutions immediately and can do roughly 75x better on throughput.
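The "roughly 75x" figure falls straight out of the numbers above. A quick check, using only the frame times quoted in this post (converted to microseconds so the division is exact); the constant and function names are just for this example:

```python
# Frame times from the post, in microseconds.
NAIVE_US = 88_000   # naive Go renderer: 88ms
AGENT_US = 1_500    # after the agent loop: 1.5ms
HAND_US = 20        # hand-written renderer: ~20us


def speedup(before_us: int, after_us: int) -> float:
    """How many times faster `after_us` is than `before_us`."""
    return before_us / after_us


if __name__ == "__main__":
    print(f"agent vs naive:        {speedup(NAIVE_US, AGENT_US):.1f}x")
    print(f"hand-written vs agent: {speedup(AGENT_US, HAND_US):.1f}x")
    print(f"hand-written vs naive: {speedup(NAIVE_US, HAND_US):.1f}x")
```

So the agent's ~59x improvement looks huge in isolation, but it still sits 75x behind the hand-written renderer on the same benchmark.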
The people who blindly trust agent output are in the former camp. They're sheeple, overdrinking from a fountain of mediocrity.
Standard disclaimer: I use AI all the time. I like AI. The point I'm making is to not blindly accept results. Think. Analyze. Learn.
Age is not an excuse to not try.
At the age of 32, Julius Caesar broke down in tears before a statue of Alexander the Great, realizing he had accomplished almost nothing in his life while Alexander had already conquered much of the known world.
Serving as a minor official in Spain and burdened by heavy debts, Caesar felt his existence was insignificant by comparison.
This moment of painful self-reflection became a turning point that sparked a fierce new determination.
He returned to Rome, rose rapidly through politics, conquered Gaul, invaded Britain, won a civil war, and fundamentally transformed Rome into a vast empire.
Don't ask people their opinion, watch what they do with their wallet (#SkinInTheGame).
In polls the French claim to prefer to live away from large expensive cities in favor of small towns & bucolic villages. But when they vote with their wallet they do the exact opposite.
Best tools for AImaxing
Harness:
- Codex best Desktop App
- Droid best CLI
- Pi best building block
- Opencode best TUI
Models:
- GPT-5.5 best model
- GLM-5.1 & Kimi best reverse engineers
- Deepseek Pro/Flash best cost to intelligence
- Opus-4.7 best for UI / Charts / LLMOps
- Qwen3.6-27B / 35B best local agents
- Gemma-4-31B best local intelligence
Mobile control:
- termius
- codex & ChatGPT
- kittylitter
Service and networking:
- tailscale
- cliproxyapi
Tracking usage:
- automation in codex
- codexbar
Plugins, CLIs and MCP:
- computer-use (codex)
- chrome (codex)
- agent-browser (droid)
- Figma MCP (all)
- GitHub CLI (all)
- GMAIL/CAL plugins (codex)
- grill me skill
ADE:
- Warp
- Zed
Current meta:
- vLLM-studio for local agents
- Codex app for /goal and non-coding work
- Droid for coding
- Zed/Warp if I need to read the code
I get how uncomfortable it feels to disengage from the syntax, from the sequence, selection, and iteration of code, from the dopamine hit of getting a complicated function to execute properly. I get it. I've been coding for longer than most of you have been alive -- I get it.
But the bar has been raised. And if I, someone who has been coding for more than six decades, can clear that bar, you should be able to clear it too.
And fear not, I've found plenty of joy on the topside of that bar. It just takes a leap...
Shopify CEO Tobi Lutke explains Goodhart’s law and why he doesn’t like KPIs or OKRs
“Goodhart’s law is real. The moment a metric becomes a goal, it’s no longer a useful metric… No metric by itself is a complete heuristic for a complex business. There’s a million different tensions in a company, and you can’t keep all of them in harmony by optimizing for one thing.”
For this reason, Shopify doesn’t use KPIs or OKRs. But as Tobi explains, this doesn’t mean they don’t value data and metrics.
“We are extremely data informed. We have invested enormous amounts of money and time into systems that give us basically everything at our fingertips… But what Shopify attempts to do is just not over-fit for what’s quantifiable.”
People love optimizing for highly-quantifiable things because there’s immediate gratification that comes from seeing a number go up. But Tobi thinks that the most important aspects of a product are rarely quantifiable:
“The overlap of the most valuable things you can do with a product and the things that happen to be fully quantifiable are like maybe 20%. Which leaves 80% of a value space unaddressable by the people who only look at quantifiable things.”
He continues:
“Shopify is comfortable with unquantifiable things like taste, quality, passion, love, hate… The sort of deep satisfaction that a craftsperson feels when they’ve done a job well is actually a better proxy if you allow it to be.”
They then have robust analytics systems that tell the company if something’s wrong or a new rollout breaks something.
“We think about it as a cockpit for a pilot. The decisions are still made by pilots, and we think this leads to better results… I think there needs to be more acceptance in business of unquantifiable things… And then metrics take a support function.”
Source: @lennysan (Feb 2025)
I've put my specifier->coder->refactorer->architect pipeline into a loop by having the architect tell the specifier to "improve something". It's already made a number of very useful improvements.