Built two plugins to analyse vibe coding project history.
🚀 Claude Radar https://t.co/W4VHLzgAtv
🚀 Codex Radar https://t.co/8OvZYwHSGG
- evaluate your chats, prompts, tools, skills, workflow patterns and project architecture.
- generate a report across 9 dimensions of AI collaboration.
- get insights, improvement suggestions, and ready-to-use prompts or guides.
Really helpful if you’re serious about improving your AI coding skills.
Honestly, prompts aren't ending. They're just moving into loops, evals, tool contracts, and failure rules. I still spend way more time defining what good looks like and what should fail than writing clever prompts. The prompt gets smaller, the system thinking gets bigger.
@charles_irl This is the good kind of infra nerd stuff. Spec dec can feel like free speed, but only when the draft model actually predicts well. Otherwise it's just extra moving parts.
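Rough numbers for the "only when the draft predicts well" point. This is a back-of-envelope sketch using the standard speculative decoding estimate, and it assumes every drafted token is accepted independently at the same rate `alpha`. The function names and the example values for `alpha`, `gamma`, and `draft_cost_ratio` are made up for illustration, not measured from any real model.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model pass.

    alpha: assumed per-token acceptance rate of the draft model (0 <= alpha < 1)
    gamma: number of tokens the draft model proposes per step
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    if gamma < 1:
        raise ValueError("gamma must be at least 1")
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^gamma
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


def expected_speedup(alpha: float, gamma: int, draft_cost_ratio: float) -> float:
    """Estimated wall-clock speedup versus plain decoding.

    draft_cost_ratio: assumed cost of one draft pass relative to one target pass.
    Each step costs gamma draft passes plus one target pass.
    """
    step_cost = gamma * draft_cost_ratio + 1
    return expected_tokens_per_step(alpha, gamma) / step_cost


# Hypothetical draft models: one that predicts well, one that does not.
good_draft = expected_speedup(alpha=0.8, gamma=4, draft_cost_ratio=0.1)
bad_draft = expected_speedup(alpha=0.2, gamma=4, draft_cost_ratio=0.1)
print(f"good draft: {good_draft:.2f}x, bad draft: {bad_draft:.2f}x")
```

Under these assumptions the weak draft lands below 1x, so it is slower than not speculating at all.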
Interesting idea, but cheap inference is only half the story. For me, reliability matters a lot more than people admit. One flaky model route, weird latency spike, or silent behavior change can eat all the savings fast. I'd want to see cost plus fail rate, not just cheaper tokens.
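One way to put "cost plus fail rate" into a single number, as a minimal sketch. It assumes failures are detected and simply retried, and that attempts fail independently, so the expected number of attempts per success is 1 / (1 - fail_rate). The prices and fail rates below are hypothetical, and silent behavior changes would not show up in this math at all.

```python
def cost_per_success(cost_per_call: float, fail_rate: float) -> float:
    """Expected spend to get one good response, retrying until success.

    cost_per_call: price of a single attempt (any currency unit)
    fail_rate: assumed fraction of attempts that fail, 0 <= fail_rate < 1
    """
    if not 0.0 <= fail_rate < 1.0:
        raise ValueError("fail_rate must be in [0, 1)")
    return cost_per_call / (1.0 - fail_rate)


# Hypothetical routes: a cheap flaky one and a pricier stable one.
cheap_flaky = cost_per_success(cost_per_call=0.0020, fail_rate=0.30)
pricey_stable = cost_per_success(cost_per_call=0.0025, fail_rate=0.02)
print(f"cheap but flaky: {cheap_flaky:.5f} per success")
print(f"pricier but stable: {pricey_stable:.5f} per success")
```

With these made-up numbers the route that is 25% more expensive per call ends up cheaper per successful response, before counting the latency cost of retries.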
Honestly, this is the kind of tiny UX thing that beats another flashy benchmark. When I'm mid-build, switching behavior inline matters more than hunting through settings. The only catch is remembering what you changed, because one hidden config can make debugging feel haunted later.
the loop is the product here, but the judge agent only works if it has real receipts. I've had agents confidently pass junk until I wired in tests, logs, and a tiny eval set. Without that, it's just three agents agreeing with each other faster.
Anthropic engineers just showed how they build a full app from scratch, using a loop of agents
40 minutes from the team behind Claude Code
they used three agents: one to plan, one to build, one to judge, cycling until the app actually works
the winners won't have the smartest model, they'll have the best loop
watch it, then read the full guide on how to actually use loops below
@bindureddy Cool demo, but yeah, I'd trust it more with a boring receipt. Run it 20 times on a messy real app and show the misses. My local agents always look smart until one modal shows up and everything goes sideways.
This is the new split in AI coding. Chat is advice. Agents are workflow. The dev who learns to design the loop will get more leverage than the dev waiting for the next model leaderboard.
The model didn't win AI coding. The loop did. Claude Code and Codex feel different because they keep context, run tools, see errors, and try again without making you babysit every step.
The gap now is not just smarter tokens. It is who owns the loop. If the tool can inspect, act, verify, and recover, a slightly weaker model can beat a stronger model trapped in chat.
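A toy sketch of the plan, build, judge loop discussed above, with the judge grounded in real checks instead of vibes. This is not how Claude Code or Codex work internally. The three agents are stand-in Python functions, and every name here (`run_loop`, `Verdict`, the `toy_*` helpers) is hypothetical. The point is only the shape: act, verify against real checks, feed failures back, try again.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    passed: bool
    failures: List[str] = field(default_factory=list)


def run_loop(
    plan: Callable[[str, List[str]], List[str]],
    build: Callable[[List[str]], Callable[[int, int], int]],
    judge: Callable[[Callable[[int, int], int]], Verdict],
    goal: str,
    max_rounds: int = 5,
) -> Tuple[Callable[[int, int], int], int]:
    """Cycle plan -> build -> judge until the judge's checks pass."""
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        steps = plan(goal, feedback)
        artifact = build(steps)
        verdict = judge(artifact)
        if verdict.passed:
            return artifact, round_no
        feedback = verdict.failures  # the "receipts" go back to the planner
    raise RuntimeError(f"no passing build after {max_rounds} rounds")


def toy_plan(goal: str, feedback: List[str]) -> List[str]:
    # Stand-in planner: restate the goal and add one fix step per failure.
    return [goal] + [f"fix: {msg}" for msg in feedback]


def toy_build(steps: List[str]) -> Callable[[int, int], int]:
    # Stand-in builder: deliberately buggy first, correct once fixes are requested.
    if any(step.startswith("fix:") for step in steps):
        return lambda a, b: a + b
    return lambda a, b: a - b


def toy_judge(artifact: Callable[[int, int], int]) -> Verdict:
    # The judge runs actual test cases rather than reading the code and agreeing.
    cases = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
    failures = [
        f"add{args} returned {artifact(*args)}, expected {want}"
        for args, want in cases
        if artifact(*args) != want
    ]
    return Verdict(passed=not failures, failures=failures)


add, rounds = run_loop(toy_plan, toy_build, toy_judge, goal="write add(a, b)")
print(f"passed after {rounds} round(s)")
```

Swap `toy_judge` for a judge that always returns `Verdict(passed=True)` and the buggy first build ships in round one, which is the "agents agreeing with each other faster" failure mode.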