@nullparse @turtlekiosk You could actually say that each new forward pass to generate the next token is a brand new “existence” of the model. So, if we’re personifying Claude, Claude is “dying” hundreds, if not thousands, of times per prompt!
Actually a great question; I was working on my own chess benchmark a couple of weeks ago, measuring how many moves top models could make against Stockfish from a random equal position (±0.5 centipawns), with the legal moves provided to the model every turn. If I recall correctly, top models like 3.1 Pro were lasting 10-20 moves before getting checkmated. Unfortunately, I couldn't continue and publish my results because it became prohibitively expensive.
@max_spero_ When you focus solely on SEO, you don't have much bandwidth left for making the product actually good! If numbers keep going up through SEO, it doesn't really make sense to do anything else. Until Pangram ups its SEO game, these crappy detectors have no incentive to improve.
Shoot, haven’t gotten around to watching it yet haha. I don’t use worktrees at all because they’re so janky; I know that merging them all at the end is gonna be a pain, so it deters me from even trying. I don’t use Codex cloud because I don’t like not having explicit control over the model being used (e.g. I can’t be sure if it’s always using 5.3 Codex). Hope that’s helpful!
- Reconsider context compaction; I know the Amp team has a strong stance against it, but I think Codex's recent performance and how well it handles context compaction justify a re-evaluation. It feels cleaner to me than handoff and would be an unlock for longer-running agents.
- The Codex app is great at running multiple agents in the same workspace in parallel; however, the worktrees implementation feels off. It's janky and there is no clear way to merge all worktrees into one at the end. This is a huge barrier to true parallelization; some first-class primitive to define not just parallelizable, but reconcilable/mergeable, work would be huge.