How many times do you think ECC RAM has caught an error? Online anecdotes I've found indicate almost no one experiences regularly corrected errors that weren't due to imminently failing hardware.
I've managed a couple thousand servers with ECC. The vast majority had zero reported errors over their whole service life. Of those that reported errors, there were a few categories:
Some reported a couple of errors a day for months (maybe years?) but worked fine.
Some ramped up error counts over hours or days.
Some went from zero to lots in one step.
A few managed to hit uncorrectable errors; sometimes just once.
For a small number of correctable errors (< 10/day), or a single uncorrectable one, no action was needed, although that kind of one-off failure is exactly what drives people without ECC crazy; some of the machines that hit an uncorrectable error did it only once and were fine afterward. For the others, we'd replace the RAM. Machines with a few daily errors or a single uncorrectable were less common than the ones that got their RAM swapped. I don't know for sure whether uncorrectable errors correlated with many correctable ones, because correctable errors were only reported hourly... if it was a step change to bad RAM, the machine was likely to halt before the next reporting interval, so there was no report. Unless the correctable errors came several times a second, the impact of the corrections isn't obvious.
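A hypothetical sketch of the triage rule described above: fewer than 10 correctable errors a day, or a single uncorrectable error, means no action; anything more means swapping the RAM. The function name, the input shape (up to a day of hourly correctable counts, matching the hourly reporting mentioned above) and the exact comparisons are my assumptions, not any real monitoring tool's API.

```python
from typing import Sequence

# Rule-of-thumb threshold from the comment above, not a vendor recommendation.
MAX_CORRECTABLE_PER_DAY = 10


def triage(hourly_correctable: Sequence[int], uncorrectable_total: int) -> str:
    """Hypothetical triage for one machine, given up to 24 hourly correctable-error counts."""
    per_day = sum(hourly_correctable[-24:])  # only the most recent 24 hourly reports
    if uncorrectable_total > 1:
        return "replace"  # a repeat uncorrectable error is not a one-off
    if per_day >= MAX_CORRECTABLE_PER_DAY:
        return "replace"  # steady or ramping correctable errors
    return "keep"  # a few errors a day, or one uncorrectable: no action


print(triage([0] * 22 + [1, 2], uncorrectable_total=0))  # keep
print(triage([1] * 24, uncorrectable_total=0))           # replace
print(triage([0] * 24, uncorrectable_total=1))           # keep
```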
Why replace when the system is stable? I guess there may be an increased chance of multi-bit errors. But sometimes new RAM is flaky, or disturbing the rack causes other problems.
Is ECC a crutch? Sure. But it's hard to walk with a bum leg/bad RAM, so why not have it? ("Because it's expensive" is a fine reason, but if it cost closer to 25% more rather than 100% more, it'd be easier to say yes.)
Memtest86 is great, but systems change and most people aren't running memtest frequently. On my non-ECC systems, I run it during setup to make sure things are good, and only later if things get crashy... but if things get crashy because of bad RAM, my data may already be corrupted.
DDR5 has built-in (on-die) ECC too. Unfortunately, AFAIK there's no mechanism for reporting errors to the system, so while it should reduce error rates, it likely makes the errors that do get through more severe. Assuming no bit flips between the RAM module and the CPU, the on-die ECC corrects any single bit flip, but multiple flips are uncorrectable and pass through, so any incorrect value the CPU does get has multiple bit flips.
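To make the single-versus-multiple-flip argument concrete, here is a toy Hamming(7,4) single-error-correcting code. It is only an illustration: the real DDR5 on-die code is much wider (it is generally described as single-error-correcting over 128 data bits), and the function names are hypothetical. One flipped bit is repaired; two flipped bits produce a nonzero syndrome that points at a third, innocent bit, so the decoder "corrects" the wrong position, hands back data that is wrong in three places, and cannot tell this apart from a genuine single-bit fix.

```python
def encode(data):
    """Hamming(7,4): 4 data bits -> 7-bit codeword; list index == bit position (index 0 unused)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 5, 6, 7
    return [0, p1, p2, d1, p4, d2, d3, d4]


def syndrome(code):
    """XOR of the positions of all set bits; 0 for a valid codeword."""
    s = 0
    for pos in range(1, 8):
        if code[pos]:
            s ^= pos
    return s


def decode(code):
    """Correct at most one flipped bit; return (data bits, position the decoder flipped, or 0)."""
    fixed = list(code)
    s = syndrome(fixed)
    if s:
        fixed[s] ^= 1  # assumes exactly one bit flipped: the syndrome names its position
    return [fixed[3], fixed[5], fixed[6], fixed[7]], s


def flip(code, *positions):
    """Return a copy of the codeword with the given bit positions inverted."""
    out = list(code)
    for pos in positions:
        out[pos] ^= 1
    return out


data = [1, 0, 1, 1]
word = encode(data)

# One flip: corrected back to the original data.
print(decode(flip(word, 5)))     # ([1, 0, 1, 1], 5)

# Two flips: the decoder "fixes" position 3 ^ 5 = 6, so three data bits come back wrong.
print(decode(flip(word, 3, 5)))  # ([0, 1, 0, 1], 6)
```

Conventional ECC DIMMs typically add an extra overall parity bit (SECDED), which lets the controller detect a double flip and report it as uncorrectable instead of miscorrecting it.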
Are you under the impression that ECC is for catching software issues? This is precisely what I want ECC for: to let me know a stick of RAM is failing on me before I let it silently corrupt my fucking data for months on end until it completely dies.
I think userbinator expects that a failing stick will go from working to failing so hard that you'd notice, with or without ECC, so the corruption would be time-limited. My experience with ECC suggests that many, maybe most, failing sticks would fit that pattern, but some of the failing devices only threw a few errors a day for months, and we continued to use them until retirement because replacement is intrusive and a few corrected errors a day didn't hurt anything. Had a non-ECC stick failed the same way, chances are you wouldn't have noticed in a timely fashion.
That said, I don't run ECC at home. I'm not willing to pay the premium in dollars, performance, or time. My storage servers are all ex-desktops, and I try to chase performance on a budget; ECC RAM usually doesn't run at high speeds and often costs at least twice as much. That doesn't make sense for a desktop, so my servers suffer too.