@AsphaltCowb0y @bcherny @binaryminary They previously screwed up the frontmatter name - `context: fork` actually means a NEW context.
What he means by `fork: true` is that it would inherit the current context (i.e. skill execution would see all history, but the caller of the skill only sees the result at the end).
@joegibbs98 @tszzl Yes. Can just keep breaking down any skill into aspects to verify. Can also recursively break down a skill into sub-skills.
I expect the most difficult domains will be things like judging subjective things against current norms (e.g. making a funny joke about current affairs)
@joegibbs98 @tszzl Wherever there's a generator / verifier gap, an ensemble of focused verifiers can grade to a rubric -> improvement feedback.
That could be applied to nearly any domain.
@thsottiaux Only some tasks can be specified enough, and even then it's hard to spec every obvious thing.
E.g. a /goal Cypress -> Playwright migration was discussed & done, but the tests ended up in the wrong place: a hyper-literal GPT fixated on the word `legacy` and put them all in a single file named that.
@embirico Most of my usage goes to Codex CLI (as there's no Linux desktop app).
That's due to GPT's more thorough & bug-free code vs Opus.
Not perfect though. GPT overly fixates on the wrong things. Codex CLI is missing a lot of QoL features, but at least it's not the buggy CC mess.
Combo of GPT + Opus is best.
@antirez @AMD GPU stability was painful for the first few months. It's good now, but you definitely want the newest distro. Fedora 44 has been rock solid.
Ubuntu LTS is too old.
@tharshan_09 @badlogicgames Just ask Claude to explain how the Workflow tool works.
Essentially a workflow is a JavaScript file with a few function primitives for workflow phase, agent, pipeline, parallel. The tool docs tell Claude how to program it and then run it.
@theo @BrunoBertapeli Their benchmark placing 4.7 > 5.5 clearly isn't aligned with real-world engineering, where GPT is clearly ahead.
So 4.8 < 4.7 doesn't have much significance IMO.
@antirez Jaggedness + regressions.
GPT-5.2 had better reasoning, attention to detail, debugging. Worse in big picture, EQ, design, TPS, etc.
5.3-codex pulled ahead in core coding, but still had generalist gaps where Opus was better.
5.4 took a clear lead w/ CC & Opus regressions.
5.5 mogging.
@badlogicgames OpenAI are still deep asleep on B2B - largest team plan is still $25/month (Claude has $140 premium seats).
Extra, already-paid-for usage credits sitting in the org account can't be used by members who run out of quota.
Tagging/messaging OpenAI people goes nowhere.
@ClaudeDevs /btw can't easily escape back to prompt afterwards (no flicker mode).
Feels locked up after giving the response, though sometimes with enough mashing of keys, can manage to get back.
@badlogicgames Neither based on stated reason.
Would have the fn scan first, with an early return of the orig array on the happy path. It calls a separate mutator fn only if the payload size is exceeded.
Or maintain session messages meta obj that tracks size/counts and only call pruner if needed.
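Rough sketch of the first option, assuming messages are plain strings and "payload size" means total UTF-8 bytes. `MAX_PAYLOAD_BYTES`, `payload_size`, `prune_oldest` and `fit_to_limit` are hypothetical names, and dropping the oldest messages first is just one possible pruning policy.

```python
from typing import List

MAX_PAYLOAD_BYTES = 1000  # hypothetical limit, for illustration only


def payload_size(messages: List[str]) -> int:
    """Read-only scan: total UTF-8 size of all messages."""
    return sum(len(m.encode("utf-8")) for m in messages)


def prune_oldest(messages: List[str], limit: int) -> List[str]:
    """Slow path: return a NEW list with the oldest messages dropped until
    the rest fits. The input list is never mutated."""
    size = payload_size(messages)
    start = 0
    while start < len(messages) and size > limit:
        size -= len(messages[start].encode("utf-8"))
        start += 1
    return messages[start:]


def fit_to_limit(messages: List[str], limit: int = MAX_PAYLOAD_BYTES) -> List[str]:
    """Scan first; on the happy path hand back the original list untouched."""
    if payload_size(messages) <= limit:
        return messages  # same object, no copy, no allocation
    return prune_oldest(messages, limit)


if __name__ == "__main__":
    small = ["a" * 10, "b" * 10]
    print(fit_to_limit(small, 100) is small)
    big = ["a" * 60, "b" * 60, "c" * 30]
    print(len(fit_to_limit(big, 100)))
```

The second option (tracking size/counts in a session meta object) would replace the `payload_size` scan with a counter updated on every append, so the happy path becomes a single comparison.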
@devteamdrew Opus 4.7's peak intelligence is higher than its predecessors', but its lows are worse: it makes basic errors (even early in context).
It's also quite prone to getting stuck adding verbose meta-commentary/narrative.
My guess: security blunting nerfed its intelligence.
Hard to trust.