Here's an actual snippet of a blog post that 4.8 generated for me. On the plus side, I've gone back to writing myself :-)
"Every shipped algorithm was read by three more agents working different angles. An auditor checked that the code does what its description claims. A second hunted for the algorithm's weakest assumption. A third wrote the one-liner you have been reading throughout this post. The auditors flagged nine algorithms for honesty. I read all nine. Every one was a reviewer holding an "extract" algorithm, which is permitted to read the date with a getter, to the stricter standard meant for the from-scratch cohort. Nothing was actually misrepresenting itself, which is the result I wanted and did not assume. The weakest-assumption reviewers were more sobering."
Word salad at best!
@davidad This is so on point. 4.8 is SO BAD at writing. Not in the emdash kinda way, but a legit "I'm sorry, I have no idea what you're trying to say right now" kinda way
@trq212 @sidbid I was struggling to internalize dynamic workflows and what "write its own harness on the fly" actually meant. So I used it to fan out 484 agents to rebuild https://t.co/pQJ7GH2lSN (turns out no, not today).
Wrote up my learnings here: https://t.co/gPv3y6qs0k
Today, I signed an Executive Order temporarily repealing bedtimes in the City of New York so that kids of all ages can watch our team in the NBA Finals.
As Mayor, you’re forced to make many difficult decisions. This was not one of them.
Go Knicks.
I rebuilt "Is it Christmas" using 484 subagents and 16 million tokens to learn how Claude's dynamic workflows work. Spoiler Alert: today is not Christmas.
https://t.co/PCK11TjATV
h/t @konklone, as always
I'm a couple projects into Codex + 5.5. Early reactions:
* The coding model is very good. It YOLO sideloaded and debugged a new Android widget with no human intervention very very well.
* I hate using the mouse.
* Way too many back-and-forth questions. I kept finding Codex waiting for me. Like I say "LGTM let's build!" and come back 15 minutes later expecting a finished result but instead find a "Should I get started?"
And how would you trade off? If CC + Opus consistently generated better code output but you really dislike their interface (or vice versa), which would you use for day-to-day work?
@andrewneilson_ @dlwiest This is the actual answer. Ignore other replies. Enterprise is PAYG.
The other (way too common) answer is people just use PAYG because they don't read/think e.g. paste your API key into Cursor and oops $3000
@moonpetal76 My parents liked Dustin Hoffman in The Graduate and my great-great-grandparents thought Eisenstein sounded too Jewish when they got off the boat.
@nateberkopec My system prompt includes: "Bluntly correct me when I'm wrong. I'd rather argue than have you cave, especially when I'm being an idiot." which results in a lot more "Let me push back..." which is often great.