@plainionist Used both vanilla. The instant I switched to Codex it spotted the issues and problematic code. Also used the code simplifier skill and it found a lot of slop that Claude had produced. I find that happens less in Codex (it still does, but less).
@plainionist I lost a week because Claude kept assuring me that what it was doing was correct. It created tests that don't do anything and relied on them for refactoring, which then destroyed my code slowly but surely. Since gpt 5.3 codex I've never had an issue.
@mark_k It's not even engineering. It's 2 tasks. Unrealistic. It doesn't provide any proof of what the model can do. Maybe 100 different ones would. Maybe.