Just rebuilt my Grok review from the ground up.
Since subscribing to Grok Heavy ($300/mo), one question kept nagging me: what actually separates Heavy from the 4.3 beta mode?
So I gave both the same one-line task: build a downloadable Excel revenue model.
The surprise:
v0.1.211 crashed verifying a favicon. 13m 1s, then nothing.
Re-tested on v0.2.11. Same prompt. 1m 20s. No crash.
The vision check that broke is now an XML check. The loop stopped looking at the pixels.
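To make the difference concrete, here is a minimal sketch of what an XML-level favicon check could look like. It is an assumption for illustration, not Grok Build's actual verification step: the function name `check_svg_favicon` and the three rules it applies are hypothetical. The point is that it parses the file's structure and never renders or inspects pixels.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def check_svg_favicon(svg_text: str) -> list[str]:
    """Return a list of problems; an empty list means the check passed.

    Hypothetical rules, chosen for illustration only.
    """
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError as exc:
        # Not well-formed XML, so nothing else can be checked.
        return [f"not well-formed XML: {exc}"]

    problems = []
    # ElementTree writes namespaced tags as "{namespace}localname".
    if root.tag != f"{{{SVG_NS}}}svg":
        problems.append(f"root element is {root.tag!r}, expected <svg> in the SVG namespace")
    has_size = "width" in root.attrib and "height" in root.attrib
    if "viewBox" not in root.attrib and not has_size:
        problems.append("no viewBox or width/height on the root element")
    if len(root) == 0:
        problems.append("svg has no child elements, so nothing would be drawn")
    return problems
```

A check like this is fast and deterministic, but it only proves the file is a structurally valid SVG. It says nothing about whether the icon looks right.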
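In 10 days it feels like it has been upgraded quite a bit, and I'm looking forward to it no longer being a Beta.
According to Elon's post, that should be around the end of June or early July, if I remember right.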
Codex made my favicon in 5m 27s.
Grok Build took 13m 1s. The SVG was on disk by minute 9.
The next 4 minutes were the agent trying to verify its own output through xAI's vision API.
The verification is what crashed the run.
Same task. Different definition of done.
Vending-Bench 2 gives an AI $500 to run a vending-machine business for a simulated year.
Opus 4.7 finished with $10,937.
The safer, more honest Opus 4.8 finished with $2,992.
Anthropic published both. Safer model, worse operator.
The strangest one nobody flags:
September 2025: CVC Capital bought Namecheap for ~$1.5B.
The same PE firm also operates cPanel.
The fund controlling a major budget host also controls the panel hundreds of competing hosts depend on.
Full breakdown on Future Stack Reviews.
That's not the only thing buried.
Namecheap's $1.98 plan ships free CDN.
Hostinger's $2.99 plan doesn't. You need the $3.99 tier.
Strange pricing logic for the company that's supposed to be the budget option.
Full review:
https://t.co/eQS8ujhLG0
Use it if you already run Codex and want a third view on coding agents.
Do not use it to pick tomorrow's production agent. The Beta label is real. Read "Not For" first.
Tier B. 24 min.
Two hours hands-on. v0.1.211 Beta.
Grok Build found repo files I never named. The always-approve toggle persisted silently into the next session.
Don't buy SuperGrok Heavy for this alone.
But if you already have it, test it.
Third tool, not first.
Fair point. The demo number is throughput, not unit economics.
"Cost per accepted diff" is the honest metric, and no lab (Google, Anthropic, or otherwise) wants to publish it yet. Parallel agents trade compute for human review time, and the abandonment rate is the part that quietly eats the savings.
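One way to pin the metric down, as a rough sketch: total spend (API cost plus the human review time spent on every diff, accepted or not) divided by the number of diffs that actually get merged. The function names, the idea of pricing review time at an hourly rate, and all numbers in the usage line are assumptions for illustration, not published figures.

```python
def abandonment_rate(diffs_produced: int, diffs_accepted: int) -> float:
    """Share of produced diffs that were never accepted."""
    if diffs_produced <= 0:
        raise ValueError("diffs_produced must be positive")
    return 1.0 - diffs_accepted / diffs_produced


def cost_per_accepted_diff(
    api_cost: float,
    review_hours: float,
    hourly_rate: float,
    diffs_accepted: int,
) -> float:
    """Total spend divided by accepted diffs.

    Review time is charged for every diff a human looked at, including
    the abandoned ones, which is why abandonment inflates this number.
    """
    if diffs_accepted <= 0:
        raise ValueError("diffs_accepted must be positive")
    return (api_cost + review_hours * hourly_rate) / diffs_accepted


# Made-up inputs: $1,000 of API spend, 10 review hours at $100/h,
# 100 diffs produced, 40 accepted.
print(cost_per_accepted_diff(1000.0, 10.0, 100.0, 40))  # 50.0
print(abandonment_rate(100, 40))                        # 0.6
```

With these made-up inputs the API bill is only half of the total, and every abandoned diff raises the per-diff cost without showing up in a throughput demo.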
Google just launched "Antigravity" — an Agent Operating System.
Demo: 1 dev, 12 hours, 93 parallel subagents, 2.6B tokens, <$1K in API cost.
The agentic dev tools space (Claude Code, Cursor, Devin) just got a Google-sized competitor.
This is the real I/O '26 story.