@Undeference @kai_fell @Jonathan_Blow Interesting. This study basically proves that its own results are not generally useful unless you're using ChatGPT4o and, I assume, the weights tested at that time. Perhaps a tool that runs instant benchmarks across all models would be quite useful.
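A rough sketch of what such a cross-model benchmark could look like. Everything here is hypothetical: the `Model` callables stand in for real API clients, the stub models and the prompt variants are made up for illustration, and the substring-match scoring is deliberately crude.

```python
from typing import Callable

# Hypothetical interface: a model is any function that takes a prompt and returns text.
Model = Callable[[str], str]


def run_benchmark(
    models: dict[str, Model],
    variants: dict[str, str],
    cases: list[tuple[str, str]],
) -> dict[tuple[str, str], float]:
    """Score every (model, prompt variant) pair on the same test cases.

    Each variant is a template containing "{question}". A case counts as a
    hit when the expected answer appears in the model's reply.
    """
    results = {}
    for model_name, ask in models.items():
        for variant_name, template in variants.items():
            hits = [
                expected.lower() in ask(template.format(question=question)).lower()
                for question, expected in cases
            ]
            results[(model_name, variant_name)] = sum(hits) / len(hits)
    return results


# Stub models for illustration only; real ones would call an API.
def model_a(prompt: str) -> str:
    # Pretend this model only gets it right when primed.
    return "4" if "careful" in prompt else "5"


def model_b(prompt: str) -> str:
    # Pretend this model is indifferent to priming.
    return "4"


models = {"model_a": model_a, "model_b": model_b}
variants = {
    "plain": "{question}",
    "primed": "You are a careful mathematician. {question}",
}
cases = [("What is 2 + 2?", "4")]

results = run_benchmark(models, variants, cases)
for (model_name, variant_name), accuracy in sorted(results.items()):
    print(f"{model_name:8} {variant_name:7} {accuracy:.2f}")
```

The point of the table is that a priming trick which helps one model may do nothing for another, so the same prompt variants get re-scored per model instead of trusting one study.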
@kai_fell @Jonathan_Blow It kinda actually is. Maybe not the "imagine" part. But priming the model's disposition and personality traits in a prompt can result in measurably improved outputs.
https://t.co/FoKTVAUnEo
@Jonathan_Blow I would use Codex (ChatGPT's coding model). It still works as a language model, but it's stricter and more literal about following instructions thoroughly.
@bettercallsalva Interesting. Is that because of the larger context window? I've found that very detailed prompting (priming its mindset) is needed to prevent drift, and that breaking work down into chunks that won't spill across multiple context windows is essential.
Been working with Claude Opus and Sonnet 4.6 and 4.7 for about 4 months, and I keep finding value in a cross-model deliberation workflow. Codex is also consistently better than Claude at converging on evidence-based solutions.
@stelloprint @kunchenguid @theo Ah, ok. WezTerm acts as the frontend to my custom agent-agnostic wrapper, so it would be hard to replace it without Lua.
@kunchenguid @theo yeah, i've been trying to avoid building tools and workflows that depend on third-party plugins and solutions. does it work well for you though?
@lliu54827 @heygurisingh totally! by the way, do you know of any good ways to run slash commands in claude programmatically, like from a hook for example?
@kunchenguid @theo do you have scripts attached to opencode hooks? I didn't see any native hook support. I'm playing with openclaude, but so far only codex and claude cli offer native support. (you can override the anthropic base url and point claude cli to deepseek or other compatible models)
@AishwaryaDevv Build your own. There is way more power in agility with the wild fluctuations right now, so don't lock yourself into any ecosystem yet. We're in a bubble right now. What works for me now: AI-agnostic CLI wrapper with a custom-built WezTerm interface. Use hooks for enforcement.
@examaddaorg @X F) Takes way too long to produce code that is mediocre at best and, more often, buggy.
Codex has both speed and precision. Claude's output quality does not justify the time spent.
@lliu54827 @heygurisingh true, hooks are the way. hard-block actions, and use well-written error messages as guidance to steer claude. hard-block superpowers skills and override them with local versions to chain them in workflows. a single correct skill use like brainstorming ignites the workflow engine
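A minimal sketch of the "hard-block plus guiding error message" idea, written as the decision logic of a pre-tool-use hook script. Assumptions to verify against the current Claude Code hooks docs: the hook receives a JSON event on stdin with `tool_name` and `tool_input.command` fields, and exiting with code 2 blocks the action and feeds stderr back to the model. The blocked patterns and messages are made up for illustration.

```python
import json
import sys

# Hypothetical policy table: substring to block -> guidance shown to the model.
BLOCKED_PATTERNS = {
    "git push --force": "Force-pushing is blocked. Push to a new branch instead.",
    "rm -rf": "Recursive deletes are blocked. Move files to ./trash/ and ask for review.",
}

ALLOW = 0  # let the tool call proceed
BLOCK = 2  # assumed convention: block the call and surface stderr as feedback


def decide(event: dict) -> tuple[int, str]:
    """Return (exit_code, message) for one hook event."""
    # Only shell commands are policed in this sketch.
    if event.get("tool_name") != "Bash":
        return ALLOW, ""
    command = event.get("tool_input", {}).get("command", "")
    for pattern, guidance in BLOCKED_PATTERNS.items():
        if pattern in command:
            # The message is the steering signal: say what to do instead.
            return BLOCK, guidance
    return ALLOW, ""


def main() -> int:
    """Read one event from stdin, print any guidance to stderr, return the exit code."""
    event = json.load(sys.stdin)
    code, message = decide(event)
    if message:
        print(message, file=sys.stderr)
    return code

# When saved as a standalone hook script, finish with: sys.exit(main())
```

The error text matters as much as the block itself: a message that names the allowed alternative tends to redirect the agent, while a bare "denied" just invites retries.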
@keithwhor Claude Opus 4.7 has been really idiotic lately. Been trying to debug an issue with it for hours, and it just leads itself on a wild goose chase, following its own red herrings. I ask it to run it past Codex, and in one sweep Codex obliterates Claude's entire belief system.