I tried so hard to make Codex work, after the glowing reviews (and not just from Internet randos/potential shills; from people I know well, too).
It's objectively worse for me on every possible axis than Claude Code. I even wondered if maybe I was on some kind of shadow-ban nerf-list for making fun of Sam Altman's WWDC outfit in a tweet 20 years ago. (^_^)
I don't love Claude's over-exuberant personality, and prefer Codex's terse (arguably sullen) responses.
But they both fuck up often (as they all do), and unlike Claude Code (Opus, always), Codex has been net-negative for me. I'm not speed-sensitive, since I round-robin among a bunch of sessions, so I use the max thinking option at all times. Even so, Codex 5.1 and 5.2 just write worse code for me, and worse than that, they're worse at code review, to the point that it negated whatever gains I had gotten from them.
All of them miss a ton of stuff (of course), and LLM code review just really isn't good unless the PR is tiny. But Claude just misses stuff (fine; expected), while Codex comes up with plausible edge-case database query concurrency bugs that I have to look at, and squint at, and then think hmm fuck and manually google with kagi.com for 30 minutes (LIKE AN ANIMAL) only to conclude yeah, not true, you're hallucinating bud, to which Codex is just like: "Noted; you are correct. If you want, I can add a comment to that effect, to avoid confusion in future."
So for me, head-to-head, Claude murders Codex — and yet I know that isn't true for everybody, so it's weird.
What I do like Codex for is reviewing Claude's work (and of course I have all of them review my own work, why not?). Even there, though, Codex sometimes flags nonexistent bugs in Claude's code — less annoying, though, since I just let them duke it out, writing tests that prove it one way or the other, and don't have to manually get involved.