
> We actually had a recent Cloudflare outage caused by a crash on unwrap() function

Oh boy, this is going to be the new thing for Rust haters isn't it?

Yes, unwrapping an `Err` value causes a panic and that isn't surprising. Cloudflare had specific limits to prevent unbounded memory consumption, then a bad query returned a much larger dataset than expected which couldn't be allocated.

There are two conclusions: 1) If Cloudflare hadn't decided on a proper failure mode for this (i.e. a hardcoded fallback config), the end result would've been the same: a bunch of 500s, and 2) most programs wouldn't have behaved much differently in the case of a failed allocation.



It was bad coding.

.unwrap() is not required to be used.

use:

    if let Some(value) = something_to_unwrap {
        // use value
    } else {
        // log an error and exit!!
    }


Fun fact: that's what an unwrap does. It panics, which causes the error to be logged and the thread ended.
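To make that concrete, here is a rough hand-written equivalent of what `unwrap()` does on an `Option`. The name `manual_unwrap` is made up for illustration, and the real standard library implementation differs in detail; this is only a sketch of the behaviour.

```rust
/// Hand-written stand-in for `Option::unwrap` (hypothetical helper, not the std source).
fn manual_unwrap<T>(opt: Option<T>) -> T {
    match opt {
        // The happy path: hand back the inner value.
        Some(value) => value,
        // The unhappy path: panic, which prints a message and unwinds the thread.
        None => panic!("called `Option::unwrap()` on a `None` value"),
    }
}
```

So the `if let ... else { log and exit }` version and the `unwrap()` version end up in roughly the same place.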


And one of the fun things about unwrap() doing that automatically is that, if you are working with an orchestrator with retry logic, you won't need to (re-re-re-re-re-)write your own for the entire program. The orchestrator will see the error, log its output, and either try again (in high-volume workloads) or move on to the next request. This is incredibly nice to use, especially when a failure in one request doesn't need to fail the entire application for all requests.
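A small in-process sketch of the same idea, assuming the default `panic = "unwind"` setting: each request runs on its own thread, and a panic in one shows up as an `Err` from `join` instead of taking down the others. `handle_request` and `run_all` are hypothetical names, and a real orchestrator would of course sit outside the process.

```rust
use std::thread;

/// Hypothetical request handler: panics on input it cannot parse.
fn handle_request(input: &str) -> u32 {
    input.parse::<u32>().unwrap()
}

/// Runs each request on its own thread and reports which ones panicked.
fn run_all(inputs: Vec<String>) -> Vec<Result<u32, String>> {
    inputs
        .into_iter()
        .map(|input| {
            let label = input.clone();
            // `join` returns Err if the spawned thread panicked.
            thread::spawn(move || handle_request(&input))
                .join()
                .map_err(|_| format!("request {label:?} panicked"))
        })
        .collect()
}
```

The caller can then log the failed entries and retry or skip them, while the successful requests are unaffected.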

I shy away from unwrap() in almost all cases (as should anyone!) but if you are running a modular system, then unwrap when placed strategically can be incredibly useful.


I think the commenter meant "return a Result<_>" instead of exit. Your snark is perhaps amusing, but not particularly charitable.
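For what it's worth, the "return a Result" reading of the snippet above might look something like this. `ConfigError` and `load_limit` are invented for the example; nothing here is taken from Cloudflare's code.

```rust
/// Hypothetical error type for the example.
#[derive(Debug, PartialEq)]
enum ConfigError {
    Missing,
}

/// Reports a missing value to the caller instead of panicking or exiting.
fn load_limit(raw: Option<u32>) -> Result<u32, ConfigError> {
    if let Some(value) = raw {
        Ok(value)
    } else {
        // The caller decides what to do: log, fall back, retry, etc.
        Err(ConfigError::Missing)
    }
}
```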


Maybe rust should put a function in the core library which does this! Send a PR!



It has potential to be the new thing, since several details synergize to make this incident more powerful:

1. Previous claims that Rust code often just works after compiling.

2. Previous claims that low-level error-handling idioms like matching, using Result, etc improve code reliability.

3. Previous claims that using unwrap in example code is ok for brevity. Also, Rust developers would know not to use it in production code.

4. The fact that significant portions of the internet were taken down because of a production unwrap from a big, mature player and one of the Rust early adopters.

Sure, Rust is not the problem here, but rather Clownflare being too big and not having their SRE processes fully up to par for their size. Perhaps they are simply too big to operate at the needed level of reliability. However, Rust anti-fans can easily ignore the above and simply press the issue and debate the minutiae of error handling, human reliability, etc. It’s surprisingly effective and might even catch the ear of management.

However, this article is overall not at the level expected of Rust anti-fans in 2025. I commend the author for trying, but they need to improve in several areas like providing iron-clad real-world examples, proving the required level of experience, focusing more on pain points like dependencies and the potential for supply-chain attacks, addressing reskilling issues and internal corporate politics, etc. There was a blog by a veteran Rust game developer a while back which single-handedly destroyed the enthusiasm for Rust in gaming. That is the gold standard of Rust criticism for me.


1. But the code did just work after compiling. The code said "This can never be an Err, and if I'm wrong, you are allowed to panic." And it did just that.

2. They do. If you use them, which they didn't.

3. It is. Let's not discard personal responsibility.

4. The error would've happened in any language, Rust debatably made it easier to find though.

I don't write Rust code myself because I simply don't write any code that requires this kind of reliability, and thus I haven't expended the effort to learn it properly. But if I were to start such a project, I would still go for Rust and learn it properly. I also don't have a "favorite" language. I just pick whichever seems most appropriate for the project; any decent programmer should be able to pick up any non-esoteric language to the point of adequacy in a few weeks anyway.


.unwrap() was a huge mistake. It should be banned from release builds outright. It's the equivalent of dereferencing a null pointer in C or the NullPointerException in Java. All .unwrap() calls should be replaced by .expect("uniquely identifying message") immediately, and preferably the compiler would check that the expect messages are unique within a crate and issue a warning if they are not. Debug builds should by default give .unwrap() and .expect() a tiny chance, like 0.1%, to trigger anyway, even when the Option is Some (opt out via configuration).


To my knowledge, the only difference between `unwrap()` and `expect()` is that the latter takes a custom error message, whereas the former generates a generic error message. In both cases the resulting error message includes the filename and line number of the panic. Both can also generate stack traces if you set `RUST_BACKTRACE=1`, unless this was explicitly disabled at compile time.

So if you want to ban `unwrap()`, then you should probably also ban `expect()`, and make sure to handle all possible cases of Err/None instead.
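A quick way to see this for yourself is to catch both panics and compare the messages. This is only a sketch: `panic_message` is a helper invented here, and it assumes the default unwinding panic strategy.

```rust
use std::panic;

/// Runs `f` and, if it panics, returns the panic message as a String.
fn panic_message<F: FnOnce() + panic::UnwindSafe>(f: F) -> Option<String> {
    let payload = panic::catch_unwind(f).err()?;
    // Panic payloads are usually either `&'static str` or `String`.
    if let Some(s) = payload.downcast_ref::<&str>() {
        Some(s.to_string())
    } else {
        payload.downcast_ref::<String>().cloned()
    }
}
```

Calling it with `|| { None::<u32>.unwrap(); }` gives back the generic message, while `|| { None::<u32>.expect("limit must be set"); }` gives back the custom one. Either way the thread panics.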


> Debug builds should by default give .unwrap() and .expect() a tiny chance, like 0.1%, to trigger anyway, even when the Option is Some (opt out via configuration).

I'm trying to understand what you're proposing. Are you saying that normal debug builds should have artificial failures in them, or that there should be a special mode that tests these artificial failures?

Because some of these failures could cause errors to be shown to the user, that could be really confusing when testing a debug build.


I guess they are advocating for exhaustive branch testing.


Yet they spent ages trying to determine the actual cause of the failure, according to their postmortem, so I'm not sure what the advantage you're positing is?


Doing a naked unwrap() in a function that returns a Result is probably a crime against humanity. Like, this is the dumbest thing you could possibly do.
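In a function that already returns a `Result`, the `?` operator is usually the shorter option anyway. A minimal sketch, with `parse_limit` as a made-up example function:

```rust
use std::num::ParseIntError;

/// Hypothetical parser: `?` forwards the error to the caller instead of panicking.
fn parse_limit(raw: &str) -> Result<u32, ParseIntError> {
    // With `.unwrap()` here, bad input would panic; with `?` it becomes an Err.
    let limit = raw.trim().parse::<u32>()?;
    Ok(limit)
}
```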


>There are two conclusions: 1) If Cloudflare hadn't decided on a proper failure mode for this (i.e. a hardcoded fallback config), the end result would've been the same: a bunch of 500s, and 2) most programs wouldn't have behaved much differently in the case of a failed allocation.

So why do they need Rust then? What advantages does it provide? That was the main point of the article — we all wanted a better language, but got another crappy one instead.


> So why do they need Rust then? What advantages does it provide?

That Rust didn't prevent one error in one specific instance does not mean that Rust didn't prevent any errors across any instances. Nor does it mean that Cloudflare didn't benefit from Rust in some way(s) elsewhere.

For example, from one of Cloudflare's previous blogposts [0] (emphasis added):

> Oxy gives us a powerful combination of performance, safety, and flexibility. Built in Rust, it eliminates entire classes of bugs that plagued our Nginx/LuaJIT-based FL1, like memory safety issues and data races, while delivering C-level performance.

[0]: https://blog.cloudflare.com/20-percent-internet-upgrade/


To prevent all the other potential memory safety bugs that didn't crash prior to this one?


If any language would've had this bug, why is Rust being singled out? If the software was written in Go, would the bug get the same attention?

No programming language should get flak for a bug that is not the fault of the programming language. This is Cloudflare's problem.


In Go, if one would ignore the error, it could result in a panic or the program would continue with some unexpected state. Go projects work on an honour system and either return Nils with errors or empty structs.


The same case in Go would’ve probably been a nil panic.


You should familiarize yourself with the prevention paradox.

If you're going to criticize something, at least do it properly.


Before the outage, Cloudflare had Cloudbleed: https://blog.cloudflare.com/incident-report-on-memory-leak-c...

The move to Rust was partly motivated by the fact that it prevents that entire class of errors. No more out-of-bounds reads, or data races. The compiler audits these missed spots.

Now, you could say a managed memory language would suffice as well. Perhaps it could. But we needed performance, and no memory-managed language met those performance needs then or today.

I get you're making the case that Rust isn't perfect for all use cases, but Cloudflare's scenario is the exact corner case your argument falls apart in: we needed fast and safe in an environment where not being fast or safe had real business consequences, and nothing else except Rust gave both.


That Rust produced a predictable and deterministic way of failing, while in C++ the equivalent code of accessing an uninitialized value without verifying it beforehand would have resulted in entirely unpredictable behavior whose reach is entirely unbounded.


Moreover, now that they realize this is an issue for them, they can just do "Ctrl+F unwrap" and fix each instance. Then they can put a hook on their commits that automatically flags any code with "unwrap". In some languages where you're allowed to just ignore errors, you could fix the proximal bug, but you'd never be sure you weren't causing or ignoring more of the same in the future -- how do you search for what isn't there?


Due to the unfortunate naming of unwrap_or and friends it's a little (but only a little) more complicated than ctrl-f.


You can also forbid unwraps as part of clippy.
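For reference, the Clippy lint is `clippy::unwrap_used` (there is a matching `clippy::expect_used`). It is off by default, so you have to opt in. A sketch of both the crate-wide and the scoped form, with `parse_port` as a made-up function:

```rust
// Crate-wide form, placed at the top of lib.rs or main.rs:
// #![deny(clippy::unwrap_used)]

// Scoped form: `cargo clippy` rejects any `.unwrap()` inside this function.
#[deny(clippy::unwrap_used)]
fn parse_port(raw: &str) -> Result<u16, std::num::ParseIntError> {
    raw.parse::<u16>()
}
```

The lint only fires under `cargo clippy`, not a plain `cargo build`, so it needs to be part of CI or a commit hook to be useful. It also doesn't trip on `unwrap_or` and friends.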


Not really. It's just "unwrap(" instead.



