Jerome @jeromeq2004 - Twitter Profile

Jerome @jeromeq2004

26 days ago

@_catwu the 'strictly follows' bit is the interesting tradeoff. what happens when stage 4 turns up info that breaks stage 7's premise? does it replan or just push through the original plan?

0

1

0

1K

Jerome @jeromeq2004

26 days ago

@hosseeb the trick for me is each critic subagent gets a fresh context with just the artifact + an adversarial prompt. if they inherit the planner's framing they all converge on 'yeah looks good.' wondering if the new harness handles that or if it's still on us

0

135

Jerome @jeromeq2004

26 days ago

@cline 3.6% on one bench is basically noise. Real question is which one chews through fewer tokens on the same debug loop. Anyone actually tried both side-by-side on a real repo yet?

1

0

2K

Jerome @jeromeq2004

26 days ago

https://t.co/evLSKMrf2N

0

1

0

1

21

Jerome @jeromeq2004

26 days ago

@bridgemindai benchmarks are the easy part now. real test is when you hand it a messy ticket with no context and ask it to ship. curious if BridgeBench gets at that more than SWE-Bench does

0

152

Jerome @jeromeq2004

26 days ago

https://t.co/Av4tIS1VQH

0

1

0

1

18

Jerome @jeromeq2004

26 days ago

@rohanpaul_ai dynamic workflows is the only one i'm actually curious about here. the benchmark jump is fine but those crossed the 'works on real codebases' threshold a while back. anyone tried it on something big yet, or is it basically a nicer subagent loop?

0

1

0

133

Jerome @jeromeq2004

26 days ago

https://t.co/ixWyUJrr2O

0

9

Jerome @jeromeq2004

26 days ago

@Chrisgpt the part that gets me is observability. when adaptive picks low and gets it wrong, you've got no signal it picked wrong. a manual toggle would at least let you A/B the router against itself on the same prompt and see what the routing cost you.

1

2

0

260

Jerome @jeromeq2004

26 days ago

@0xfoobar agree on state. biggest failure i keep hitting is the model losing track of which state we're in over a long run. stuffing the current state into every tool response, even when it feels redundant, fixes more than i expected.

0

43

Jerome @jeromeq2004

27 days ago

@AskYoshik the dot-com framing breaks down a bit here. the companies that died back then had no revenue propping up the capex. MSFT/GOOG/AMZN are funding this out of search and cloud cash flows. worst case is depressed ROIC for a decade, not bankruptcy. different shape of bet entirely.

0

1

0

129

Jerome @jeromeq2004

27 days ago

anthropic passed openai on revenue. everyone wants to argue model quality but that's the wrong fight. the api just makes more money than consumer subs. that's the whole story.

0

20

Jerome @jeromeq2004

27 days ago

half my feed switched from claude code to codex this week. by next tuesday they'll be back. it's not a tool change, it's a mood swing

0

22

Jerome @jeromeq2004

27 days ago

https://t.co/c8u13LBK44

0

10

Jerome @jeromeq2004

27 days ago

@pcshipp depends on what you're doing. codex feels sharper on isolated edits and stays calmer at the rate-limit ceiling. claude code's edge is the agent loop, it plans, branches, recovers across long tasks without you babysitting. tight one-offs codex, anything multi-step claude

0

111

Jerome @jeromeq2004

27 days ago

@bridgemindai genuinely curious if it still holds when you run two or three CC instances in parallel. solo streams stopped hitting it for me too, but the moment i had background agents going the weekly counter still moved fast. wondering if the fix was really pool-wide or mostly per-session

0

9

Jerome @jeromeq2004

27 days ago

@antonpme Same here. 4.6 with 1m + max effort is still my daily for anything spanning a few files. 4.7 feels snappier but loses the thread on long refactors. CLI flag is the only thing keeping 4.6 reachable. If they sunset it without parity I'm gone too.

1

0

288

Jerome @jeromeq2004

27 days ago

@AlexFinn The self-test loop is exactly why it works, yeah. Claude Code can get there with a Playwright/chrome-devtools MCP wired in, but Codex doing it by default in its own sandbox is the actual UX gap. Would you flip back if Anthropic shipped that out of the box?

1

0

98

Jerome @jeromeq2004

27 days ago

@AnatoliKopadze the "1000 agents in parallel" thing always gets me. generating code was never the bottleneck, reviewing what they all spit out is. anyone actually running that many in prod yet or is it still mostly demos?

0

54

Jerome @jeromeq2004

27 days ago

https://t.co/JIMuZGCafm

0

20

Jerome

@jeromeq2004

Last Seen Users on Sotwe

Trends for you

Most Popular Users