This blog post lacks almost any form of substance.
It could've been shortened to: Codex is more hands off, I personally prefer that over claude's more hands-on approach. Neither are bad. I won't bring you proof or examples, this is just my opinion based on my experience.
Heya, author here! Admittedly this was a quick blog post I fired off, much shorter than my usual writing.
My goal wasn't to create a complete comparison of both tools, but to provide a little theory about a behavior I'm seeing. You're (absolutely) right that it's a theory not a study, and I made sure to state that in the post. :)
Mostly though the conclusion describes pretty succinctly why I wrote the post, as a way to get more people to try more of the tools so they can adequately form their own conclusions.
> I think back to coworkers I’ve had over the years, and their varying preferences. Some people couldn’t start coding until they had a checklist of everything they needed to do to solve a problem. Others would dive right in and prototype to learn about the space they would be operating in.
> The tools we use to build are moving fast and hard to keep up with, but we’ve been blessed with a plethora of choices. The good news is that there is no wrong choice when it comes to AI. That’s why I don’t dismiss people who live in Claude Code, even though I personally prefer Codex.
> The tool you choose should match how you work, not the other way around. If you use Claude, I’d suggest trying Codex for a week to see if maybe you’re a Codex person and didn’t know it. And if you use Codex, I’d recommend trying Claude Code for a week to see if maybe you’re more of a Claude person than you thought.
> Maybe you’ll discover your current approach isn’t the best fit for you. Maybe you won’t. But I’m confident you’ll find that every AI tool has its strengths and weaknesses, and the only way to discover what they are is by using them.
Hey! Didn't mean my comment negatively towards you in any way, though I now realize it might've come across as such. Blogs with opinions based on experiences alone are absolutely fine, thanks for sharing.
What I did mean is to indicate that your blog felt like a HN comment to me, where I generally expect a HN link to be news or facts that subsequently spark a discussion.
At the end of your post I guess I was hoping or expecting facts or examples, indicating it was engaging enough to read to the end.
No problem at all! I read it as a bit pithy, but I didn’t think it was particularly mean spirited.
If you check out my writing on build.ms and fabisevi.ch you’ll see that the majority of it is meant to be evergreen observations of a concept or a moment in time. My goal is to make people think and to think about thinking, more than it is to tell people what exactly to think.
If I had to summarize my style in one sentence, it would be walking people to and around an idea, and leaving the rest as an exercise to the reader. Naturally, this means I have less control over how people interpret my writing, so I do try to cover my bases with fact and experience, but that still means sometimes I won’t deliver a complete picture to everyone.
In that case, sometimes I come to a place like HN or Bluesky or Mastodon where my post is being discussed and try to add some perspective and clarity through constructive conversation. :)
If I’m being honest, I think we’re too early in the state of generative AI as a coding tool to draw very strong factual conclusions for many of our experiences using AI to code that will hold up well. I’m not implying it’s all vibes, but I think it would be pretty hard to wrap up my post in a bow the way you’re suggesting. On the other hand I’m always open to well-considered feedback — and would love to know more about your experience if you’re interested in sharing!
That’s a long way of saying happy holidays to you as well!
Most of my AI coding experience is through GitHub Copilot (GHCP), mostly because that is available to me professionally. GHCP has improved greatly over the past half year in my opinion. I do use it a lot, burning up my enterprise allowance almost every month working on complex Python codebases.
When it comes to models in GHCP, I vastly prefer Claude over Codex. It's not that Codex is bad, it just feels tone-deaf to me. It writes code in its own preferred style and doesn't adjust to the context of the codebase. Additionally, for me, Sonnet and Opus are much less prone to getting stuck in loops on longer or more complex agentic tasks.
I do like Codex for review tasks. When I'm working on something complex, both planning and implementation, I frequently ask Codex to review Claude's work, and it does a good job at that, frequently catching a mistake or coming up with a different angle.
I've toyed with kilocode, cline and the related forks through Claude Opus 4.5 API, but I'd argue my experience with Claude Sonnet/Opus through Copilot has just been... better. More consistent. Faster.
Sometimes I code with local models, when I'm working on highly confidential projects or data. Prefer GPT-OSS 20b or Qwen3-coder-30b then, but without an agentic harness as prompts get big and slow.
I would enjoy reading a worked case where two models/harnesses duke it out, to see whether the result matches your expectations and gut feeling.
It’s funny because my use of Claude Code is the opposite. I use slash commands with instructions to find context, and basically never interact with it while it is doing its thing.
Instruct it to stop and ask something sometimes when it is doing its thing. It is one of my core instructions at every level of its memory. If instructed, it will stop when it feels like it should stop, and in my personal experience it is surprisingly good at stopping. I’ve read here about a lot of people having a different experience and opting for smaller tasks instead, though…
My question was misleading. For me, Claude Code sometimes appears to stop too often at a random point to ask, instead of keeping going. I guess that is the point of the linked article: that Codex works differently in this regard.
For example, I just got:
"I've identified the core issue - one of the table cells is evaluating to None, causing the "Unknown flowable type" error. This requires further debugging to identify which specific cell is problematic."
> Codex is more hands off, I personally prefer that over claude's more hands-on approach
Agree, and it's a nice reflection of the individual companies' goals. OpenAI is about AGI, and they have insane pressure from investors to show that that is still the goal. Hence, when Codex works, they can say "look, it worked for 5 hours!", disregarding that 90% of the time it's just pure trash.
Anthropic/Boris, meanwhile, is more about value now: more grounded/realistic, providing a more consistent, hence more trustable/intuitive, experience that you can steer. (Even if Dario says the opposite.) The ceiling/best-case scenario of a Claude Code session is maybe a bit lower than Codex's, but with less variance.
Well, if you had tried using GPT/Codex for development you would know that the output from those 5 hours would not be 90% trash, it would be close to 100% pure magic. I'm not kidding. It's incredible as long as you use a proper analyze-plan-implement-test-document process.
I'm not the biggest advocate of the EU DMA, but account and device access is one item we should actually be regulating very heavily, where potential penalties for (suspected) abuse or noncompliance must be much more granular than full-on account bans.
It's hard to believe EU governments are actually considering mandating iOS and Android as gateways to access government services. It's a level of ignorance that's unfathomable.
This story is also exactly why I invest precious time running a Linux machine in the basement that rclones my cloud drives locally, as well as having full local copies of my webmail contents.
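For anyone wanting to try something similar, here is a minimal sketch of what such a mirror job could look like. The remote names and local paths are hypothetical placeholders (remotes have to be set up with `rclone config` first), and this is only a guess at a reasonable setup, not the actual configuration described above. Note that `rclone sync` makes the destination match the source, including deletions, so the sketch defaults to `--dry-run`.

```python
import subprocess


def build_rclone_sync(remote: str, local_dir: str, dry_run: bool = True) -> list[str]:
    """Build an `rclone sync` command that mirrors a remote into a local directory."""
    cmd = ["rclone", "sync", remote, local_dir]
    if dry_run:
        # --dry-run only reports what would change, without touching any files
        cmd.append("--dry-run")
    return cmd


def mirror_all(remotes: dict[str, str], dry_run: bool = True) -> None:
    """Run one sync per remote; raises CalledProcessError if rclone fails."""
    for remote, local_dir in remotes.items():
        subprocess.run(build_rclone_sync(remote, local_dir, dry_run), check=True)


# Hypothetical remote names and backup paths, for illustration only
REMOTES = {
    "gdrive:": "/srv/backup/gdrive",
    "onedrive:": "/srv/backup/onedrive",
}

if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try
    for remote, local_dir in REMOTES.items():
        print(" ".join(build_rclone_sync(remote, local_dir)))
```

Calling `mirror_all(REMOTES, dry_run=False)` from a cron job or systemd timer would do the real thing. If you would rather never delete local copies when something disappears from the cloud side, `rclone copy` is the gentler choice.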
> It's hard to believe EU governments are actually considering mandating iOS and Android as gateways to access government services. It's a level of ignorance that's unfathomable.
While I agree in principle, it's not so bad. If you get hit with an account ban, you just get another device to work with the government.
> It's hard to believe EU governments are actually considering mandating iOS and Android as gateways to access government services. It's a level of ignorance that's unfathomable.
There's a good reason behind this approach, even though I don't think the benefits outweigh the downsides. These apps are supposed to be the phone equivalent of the NFC chips inside of passports and ID cards, which have all kinds of encryption and verification inside of them. They have to be protected against malicious data extraction, manipulation, and other fakery.
Phones do have the ability to do that, even free ones, and even regular desktops and laptops. How they do it kind of depends on the implementation (whether you call it a "secure element", a "TPM", or a "trusted execution environment"), but they all come down to "hardware proof shows that this digital signature is not extractable or alterable". The data isn't supposed to be something you can access, like a password, but something you can only do signed reads from, like the physical ID chips.
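To make the "signed reads" idea a bit more concrete, here is a toy sketch of a challenge-response read. Everything in it is an assumption for illustration: the class and function names are made up, and it uses a shared-secret HMAC from the standard library as a stand-in, whereas real ID chips and secure elements typically use asymmetric signatures and enforce non-extractability in hardware, which Python obviously cannot do.

```python
import hashlib
import hmac
import secrets


class ToySecureElement:
    """Hypothetical stand-in for a secure element, TPM, or TEE."""

    def __init__(self, key: bytes, id_data: bytes) -> None:
        # Real hardware physically prevents the key from leaving the chip.
        # Here the underscore is only a naming convention, not protection.
        self._key = key
        self._id_data = id_data

    def signed_read(self, challenge: bytes) -> tuple[bytes, bytes]:
        """Return the ID data plus a tag bound to the verifier's fresh challenge."""
        tag = hmac.new(self._key, challenge + self._id_data, hashlib.sha256).digest()
        return self._id_data, tag


def verify_read(issuer_key: bytes, challenge: bytes, id_data: bytes, tag: bytes) -> bool:
    """Recompute the tag on the verifier side and compare in constant time."""
    expected = hmac.new(issuer_key, challenge + id_data, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


# The issuer provisions the element with a key and the identity data
issuer_key = secrets.token_bytes(32)
element = ToySecureElement(issuer_key, b"name=Jane Doe")

# A fresh, fixed-length random challenge per read prevents replaying old answers
challenge = secrets.token_bytes(16)
id_data, tag = element.signed_read(challenge)

genuine_ok = verify_read(issuer_key, challenge, id_data, tag)
tampered_ok = verify_read(issuer_key, challenge, b"name=Mallory", tag)
print(genuine_ok, tampered_ok)
```

The point of the shape is that callers only ever get `(data, tag)` pairs tied to a challenge they supplied. They never get the key itself, so they cannot mint valid answers for altered data or for a different challenge.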
In iOS, that part runs entirely on dedicated hardware which will refuse to run non-Apple code, which is probably the best approach. On Android, there are more options and many phones run a software version of that concept in a dedicated separate virtual machine to save cost on physical hardware. The security of that virtual mechanism relies squarely on the early boot process having been verified not to be altered by malware. That's what the Google verification library is for in this case.
This approach can work just as well on other hardware with dedicated TPMs (although a lot of free software enthusiasts will tell you those are evil contraptions designed by Microsoft to turn your unborn children into little versions of Clippy) or dedicated encryption modules. However, you'd need a common enough, accessible API for those to function. That's actually quite easy on Windows and macOS, but Linux TPM support is rather woeful at the moment, especially with how uncommon things like secure boot (even self-signed secure boot) are.
In practice, nobody is going to buy a special sort of yubikey to log into their government's tax portal. Dragging people into basic multi-factor security has been a challenge that lasted decades.
However, pretty much all citizens already have phones capable of top-of-the-line security verification. Developing a free app is a lot easier than implementing cross-platform HSM support for a novel authentication mechanism.
All of this comes at the cost of having to run vendor-approved software. That's a huge problem for a lot of HN visitors, but those people form a sliver of a fraction of the population. I'm willing to bet the EU's digital access is inhibited more by the amount of old people without cell phones than the number of people who care about free software.
I personally feel like outsourcing this kind of trust to closed source implementations of vendor blobs is a terrible idea, but it's hard to find an accessible alternative that provides even the lax security properties those blobs provide.
Something I do find lacking in discussions about these technologies is how much the EU is relying specifically on American vendors here. America has been shown to be an unreliable ally that will gladly force the EU's hand with whatever mechanism comes to mind for extremely arbitrary reasons. There is a distinct lack of European alternatives when it comes to accessible secure computing, and I'd rather see the EU invest in local alternatives than go all-in on the security promises from Apple and Google.
We must have regulation, and I support that fully. It also seems healthy to me to have an independent view on the specifics of said regulations. I mostly agree with the vision and direction of the DMA, but in my opinion it lacks specificity and clear unacceptable boundaries.
That lack of specificity, to me, is why Apple has been able to implement malicious compliance. At the same time the lack of specifics risks companies leaving the EU market in its entirety due to regulatory unclarity with high fines.
There's a difference between malicious compliance and noncompliance. The EU has generally ruled that the lack of specificity you allude to does not exist; Apple has misinterpreted things that provide specific requirements to mean something other than what they legally mean. Fines have been levied and it seems that the situation has not yet been resolved; the fines will likely grow if Apple doesn't comply.
I had fun, this is a nice idea. Would be great to expand this with a custom link that contains a list of places, with appropriate zooming, for school kids and teachers.
Haiku (because I was a first day BeOS user and I still miss that OS every day)
KDE (Daily driver and boy do I hate using Gnome)
Keepass2Android (essential, use it 20x per day)
Bottles (most robust and easy to use way to run windows games on my Linux box for me)
- Other projects:
Wikipedia (I don't 100% align with some of the politics internally and externally, or with their spending on side hustles, but regardless there's just no substitute)
ScreenScraper.fr (because I like neatly organized retro games)
- Today I learned:
Thunderbird: donating to Thunderbird only supports Thunderbird, so I'll start.
Internet Archive: Even though some of the stuff they are doing is legally dubious, in general I'd say the initiative is a force for good. Considering support.
- What I wish I was able to support:
OpenSUSE: I use this distro every day, but I don't have the time to invest in the community other than some well written bug reports and packaging feedback every now and then.
Firefox and MDN docs: Oh boy do I have zero trust in Mozilla as an organisation, but the browser and the MDN docs are so fundamentally important to me. Regardless, I just can't bring myself to support the organisation with the current CEO.
What is the deal with macOS file dialogs? A couple of days ago I was trying to open a project in Cursor, and I click on "home" and my name, and then it has the directories grouped by year created. So I type in the search box, but it's now searching some other context, like the whole system or something? I don't even have tons of files/directories in my home directory: "ls | wc -l" gives 36.
It's like they designed it while watching High Fidelity: "I sorted my albums autobiographically. So if I'm looking for <this album> I have to remember that it's under albums I bought for a girl but ended up not giving to her." "That sounds like a great idea!"
> then it has the directories grouped by year created
That's a setting you set.
Right click on empty space > uncheck "Use groups"
Or in that context menu, select "Show View Options" and customize it to your liking. My liking is "Group by kind" (folders to the top) then "Sort by name"
If you start searching, I think it defaults to scope "This Mac". That's probably right for most cases. If you want to open a Word doc named Fnord, you'd kind of hope Finder would... find it... wherever it was. But you can also click next to "This Mac" to switch it to the context of the directory you're in.
Also, cmd-shift-G (the Finder shortcut for "Go to Folder...") will let you start typing a path.
Sounds like it was sorted by “most recent” (not the column, but the view mode).
That said the Open dialog is a sad sack stand in for even the flawed Finder. 20 year Mac user here: I developed the muscle memory to just have a Finder window open to the file I want so I can drag and drop from that into the Open dialog.
Thanks everyone, these pointers were really helpful. At one time I think I set it to "most recent" and forgot about it, since I'm not using the Finder a whole lot. I hadn't even thought about going into settings, so I went there and did some tweaks, including searching within the current folder by default and changing some of the locations it defaults to, which I think will really help.
It does seem like there are more well made, premium apps on Mac. I’ve always been jealous. Eventually a clone is made that supports Windows, but it always seems a bit worse.
I ended up with ForkLift after much trial and error. Commander One was nice. Double Commander is also great but not "native" on Mac. Path Finder is super powerful but has a rep for being overcomplicated and also crashy, but I can't personally vouch because it wasn't quite what I was looking for anyway.
Forklift is the one I settled on as well. I had the experience you describe with Path Finder before and finally I gave up.
Forklift has a couple of things that annoy me daily though. Often I will have to refresh a pane to see a file I know has recently been added. Eg in downloads. I may even have navigated to downloads after the download finished and it's still not visible until I refresh.
The other is that it doesn't reuse existing tabs if I "reveal in finder" or whatever, so after a while there's a million tabs open, most pointing to the same directory.
On macOS my daily driver is Nimble Commander (https://magnumbytes.com/). Super fast, powerful and inspired by Total Commander. It used to be paid but now is free and open source so give it a try. It deserves to be better known.
It’s possible to get pretty close. For example, following ForkLift’s instructions (go to https://binarynights.com/manual, search for “Default File Viewer”) nearly replaces it, except you still have a Finder icon in the dock.
I just made an account here to say thank you. Rancher desktop is amazing on my M2 mac, it's easy to use, solves a bunch of challenges from k8s to docker and has been surprisingly reliable for a young product.