It is time to standardize principles and practices for software memory safety (acm.org)
76 points by mepian 10 months ago | 98 comments


Makes sense, good luck! I know that sounds snarky; I'm looking forward to rational progress and cooperation on the evolution and adoption of the standard. I just haven't seen that play out in such a planned, orderly fashion yet (ipv6?).


ipv6, unicode, usb...

Why am I more worried than excited about a new standard?

By the way, bounds checking was introduced in Turbo Pascal in 1987. IIRC people ended up disabling it in release builds, but it was always on in debug.

But ... it's Pascal, right? Toy language.


Bounds checking has existed at the very least since JOVIAL in 1958; or, if you count that FORTRAN compilers have had an option for bounds checking for quite some time, since 1957.

Here is my favourite quote, every time we discuss bounds checking.

"A consequence of this principle is that every occurrence of every subscript of every subscripted variable was on every occasion checked at run time against both the upper and the lower declared bounds of the array. Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs. Unanimously, they urged us not to--they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980 language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law."

-- C.A.R. Hoare's "The 1980 ACM Turing Award Lecture"

Guess what programming language he is referring to by "1980 language designers and users have not learned this lesson".


> But ... it's Pascal, right? Toy language.

Not really.

It's just out of fashion. But there are really high-quality present-day implementations, like the one from Embarcadero (I think they acquired Borland a while ago?): https://www.embarcadero.com/products/delphi/features/design


I think nottorp was being a bit sarcastic. I think the point was, if Pascal, which some in the C/C++ world regard as a "toy" language, had this in 1987, maybe we can actually think about having it in "real" languages in 2025.


My bad, I might have missed the sarcasm then :)


I heard Algol had bounds checking somewhere in the 60s as an implementation feature. Reportedly customers very much liked that their programs didn't just produce wrong results faster.


Pascal being derived from Algol, it makes a lot of sense.


Yeah, I’m wondering what this even means. I’m assuming they’ll have to define “memory safety” which is already quite the task. Memory safe in what context? On what sort of machine? What sort of OS?


> On what sort of machine? What sort of OS?

Just sharing an anecdote: recently, I had to create Linux images for x86 on an ARM machine using QEMU. During this process, I discovered that, for example, creation of the initrd fails because of the memory page size (some code makes assumptions about page size and calculates the memory location to access instead of using the system interface to discover that location). There's a similar problem when using the "locate" utility. Probably a bunch more programs that have been used successfully millions, well, probably trillions of times. This manifests itself in QEMU segfaulting when trying to perform these operations.
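To illustrate the failure mode, here's a minimal Rust sketch (my own illustration, assuming the libc crate; this is not code from the affected programs):

```rust
use libc::{_SC_PAGESIZE, sysconf};

fn page_size() -> usize {
    // Correct: discover the page size through the system interface.
    // x86 defaults to 4 KiB pages, but aarch64 kernels commonly use 16 or 64 KiB.
    unsafe { sysconf(_SC_PAGESIZE) as usize }
}

fn main() {
    const ASSUMED_PAGE_SIZE: usize = 4096; // the buggy hardcoded assumption
    let actual = page_size();
    if actual != ASSUMED_PAGE_SIZE {
        // Any address arithmetic derived from the constant is now wrong.
        eprintln!("assumed {}, got {}: computed offsets would be bogus", ASSUMED_PAGE_SIZE, actual);
    }
}
```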

But, to answer the question: I think one way to define memory safety is to ensure that the language doesn't have the ability to do I/O to a memory address not obtained through a system interface. Not sure if this is too much to ask. Feels like for application development purposes this should be OK; for system development this obviously will not work (someone has to create the system interface that supplies valid memory addresses).


I think the usual context just requires language soundness; it doesn't depend on having an MMU or anything like that. In particular, protection against:

- out-of-bounds on array read/write

- stack corruption such as overwriting the return address

It doesn't directly say "you can't use C", but achieving this level of soundness in C is quite hard (see seL4 and its machine-checked proof).
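As a concrete illustration of the first of those protections, a tiny Rust example (purely illustrative; any memory-safe language would do):

```rust
fn main() {
    let a = [10, 20, 30];
    let i = 7;
    // In a memory-safe language an out-of-bounds index has defined behaviour:
    // `a[i]` would panic deterministically instead of reading past the array,
    // and `get` turns the same check into a recoverable Option.
    match a.get(i) {
        Some(v) => println!("a[{}] = {}", i, v),
        None => println!("index {} is out of bounds", i),
    }
}
```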


Everyone picks on C, but we have a standard for this. We've been following it for decades in regulated industries. If people take the time, it can be perfectly safe. It requires thinking of a computer as a precision machine, rather than a semantic "do what I'm thinking" box.


The problem is that people are really bad at that kind of precision.


Maybe I lack vision in such matters, but: how would you corrupt the stack without an out-of-bounds write?

But there's another aspect that I think you missed: use after free.

As you say, achieving this level of soundness with C is hard. Proving it is much harder. (Except, how do you know you've achieved it if you don't prove it?)


I suspect seL4 could be proven correct only because it uses simple lifetime patterns.


Yet that is not what memory safety means. A program being memory safe or not depends on its actual behaviour, not what you can prove about that behaviour. There are plenty of safe C programs and plenty of unsafe ones. Proving something is safe doesn't make it safe.

Also these properties are a very small subset of general correctness. Who cares if you write a "safe" program if it computes the wrong answer?


> Proving something is safe doesn't make it safe.

Err .. that is actually the point of the proof. Can you give an example of something with a Coq-type safety proof that has a memory safety bug in it?


Not OP but you can in theory add cosmic rays, rowhammer attacks and brownout/undervolt glitching into the mix. Kinda stretching it but sometimes you have to think about these.


Reread my comment. You are confusing proof and fact.


One of the most underappreciated things about the JVM is its well-defined memory model.


Unfortunately immediately undermined by the decision to not address nullability out of the gate. It's fantastic that null pointer exceptions are well-defined on the JVM - but they still tend to bring your program to a screaming halt.


I believe that the parent is talking about how Java, and in turn the JVM, defines its memory model, which describes formally how memory reads and writes occur in the presence of multiple threads of execution. For example, data races are well-defined in Java: Either you read the old or the new value, you'll never read a mix of two values.

Having a well-defined memory model is important when running on infra with very different models, such as x86 and aarch64.
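A rough analogy in Rust (the JVM guarantee is its own mechanism; this just illustrates the "old value or new value, never a torn mix" property, using atomics):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// A racing read of an atomic observes either the old value or the new one,
// never a mix of bytes from both.
static X: AtomicU64 = AtomicU64::new(0);

fn main() {
    let writer = thread::spawn(|| X.store(u64::MAX, Ordering::Relaxed));
    let seen = X.load(Ordering::Relaxed); // 0 or u64::MAX, nothing in between
    writer.join().unwrap();
    println!("seen = {:#x}", seen);
}
```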


[flagged]


No reason to be rude, it was really not clear that you knew that as null pointers have nothing to do with memory models.


Is it a big difference if you get a ValueNotPresentException from a nullable unwrap instead of a NullPointerException?

Oh, wait, it exists https://docs.oracle.com/javase/8/docs/api/java/util/Optional...


The problem is basically every type is implicitly Optional and basically every operation implicitly unwraps, instead of only the cases where nullability is actually desired.


The big difference is that Optional has ergonomic features like .map(), .ifPresent(), .orElse(), that reduce the verbosity of repeated if blocks checking if values are present or not.
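For comparison, Rust's Option offers the same shape of combinators; a small sketch (the lookup function is made up for illustration):

```rust
// map / if let / unwrap_or play the roles of Optional.map() /
// Optional.ifPresent() / Optional.orElse().
fn lookup(key: &str) -> Option<u32> {
    if key == "answer" { Some(42) } else { None }
}

fn main() {
    let doubled = lookup("answer").map(|v| v * 2); // like .map()
    if let Some(v) = lookup("answer") {            // like .ifPresent()
        println!("present: {}", v);
    }
    let fallback = lookup("missing").unwrap_or(0); // like .orElse(0)
    println!("{:?} {}", doubled, fallback);
}
```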


After the log4j vulnerability, I'd say that despite its good memory safety, the JVM's powerful serialization primitives make me pretty leery of it as far as security goes.


The log4j vulnerability happened because the log4j programmers made overly generalized "do anything and everything" software. The kind of architecture that is made so generic that the software accidentally gains emergent properties (= execution paths and interactions that were never thought of, realized, or considered). Although the desire to write software like that might have arisen under the influence of the object-oriented mindset, I'm sure it could have happened in any other language.


> despite its good memory safety, the JVM's powerful serialization primitives make me pretty leery of it as far as security goes.

That's a sound deduction. The JVM had such a vulnerability because it's full of ambient authority and rights amplification patterns. More such vulnerabilities probably exist, they're just hard to see.


You can do the same thing in Rust as Log4J.

I haven't tested this code, and definitely don't do this.

This code is intended to let attackers run any shell command by sending JSON with a "debug_command" field. Similar to Log4J, it's a "feature" being misused rather than a memory bug that Rust would catch.

```rust
use serde_json::Value;
use std::process::Command;

fn process_log_entry(log: &str) {
    // UNSAFE: allows command injection via attacker-controlled JSON
    if let Ok(json) = serde_json::from_str::<Value>(log) {
        if let Some(cmd) = json.get("debug_command") {
            Command::new("sh")
                .arg("-c")
                .arg(cmd.as_str().unwrap_or(""))
                .output()
                .unwrap();
        }
    }
}
```


I feel like Rust has a different development culture than Java and Java devs are more likely to want to build some abstraction that does everything and loads classes from the network into the runtime.


That's not the same thing. You can call a shell command from any language. The log4j problem was that you could load arbitrary classes from the internet into the memory of the current process, which is a much more severe problem.


I am aware, but I wanted to illustrate the higher-level idea of an architecture issue vs a memory issue.

To keep it concise, I had to take some liberties.

If you have more time than me, please feel free to reproduce Log4J more accurately in Rust.


If you can run a shell command, it can do basically anything you want.


Sure, in a general-purpose language like Java, or Rust, or C++, you can indeed do "basically anything you want"; that's why it's called general purpose. Your purpose might be to run arbitrary code you found on the Internet, so that's a thing you can do. If you can't, it's not general purpose.

In a number of applications this means you do not actually want a general-purpose language, which is why WUFFS makes sense.

But even when you don't have that constraint, it's reasonable to ask: how easy was it to make a thing you didn't intend, by accident?


The log4j vulnerability was due to Java code, not the JVM. But I get that people mostly conflate the two.


I know this article is more a business-case presentation than a full demonstration of the field, but the TR also misses some points.

Why remove the references in the TR to Frama-C, CBMC, etc. from the opinion report? They are easier to adopt than the heavier tooling of Coq, etc. I'm always surprised to see those tools ignored or downplayed when it comes to these discussions. I do agree with the TR's sentiment that we need to improve the accessibility and understanding of these tools, but this is not a great showing for that sentiment.

Additionally, both articles miss that compiler-modified languages/builds are a path, such as -fbounds-safety. They will be part of the solution, and frankly, likely the biggest part at the rate we are going. E.g. current stack defenses for C/C++/Rust, unaddressed in safe language design, are compiler-based. The compiler-extension path is not particularly different from CHERI, which requires a recompile with some modifications, and the goal of both approaches is to allow maintainers to band-aid dependencies with minimal effort.

The TR handwaves away the question of the complexity of developing formal-method tools for Rust/unsafe Rust and C++. I.e. Rust really only has two tools at the moment: Miri and Kani (which is a CBMC wrapper). Heavier tools are in various states of atrophying/development. And C++ support from the C family of formal tools, such as Frama-C, is mostly experimental. It's not assured, given the continued rate of language development of both languages and the complexity of the analysis, that the tools for these languages will come forth to cover this gap anytime soon.

I do not think the claim in the TR that the current unsafe/safe separation will result in only requiring formal analysis of the unsafe sections is true, as logical errors, which are normal in safe Rust, can cross the boundaries to unsafe and cause errors, thus necessitating whole-program analysis to resolve whether an unsafe section could result in errors. Perhaps it will decrease the constants, but not the complexity. If Rust restricts further, perhaps more of the space could be covered to help create that scenario, but the costs might be high in usability and so on.


Is it a hot take to believe that no humans are infallible and that only languages with memory-safety guarantees can offer the kind of safety the author seeks? With the advent of Rust, C and C++ programmers can no longer argue that the performance tradeoff is worth giving up safety.

There are, of course, other good reasons to choose C and C++ over Rust. And of course Rust has its own warts. Just pointing out that performance and memory safety are not necessarily mutually exclusive.


I'm not sure what your definition of performance parity is. Are you claiming that the existence of Rust proves that there is no performance penalty for memory safety? The penalty may be relatively small, but I am not aware of any proof that the penalty is non-existent. I am not even sure how you could prove such a thing. I could imagine that C and C++ implementations of exactly the same algorithms and data structures as are implemented in safe Rust might perform similarly, but what about all of the C and C++ implementations that are both correct and not implementable in safe Rust? Do they all perform only as well as or worse than Rust?


1. Those fast algorithms that can't be implemented in safe Rust are rare.

2. Even when they exist, Rust lets you use unsafe code but only where needed. It's still much better than having your entire program be unsafe.

3. In practice Rust versions of programs are as fast as, if not faster than, C/C++ ones.


As an example, a really fast sort can't be expressed in safe Rust.

However, the two sort algorithms in Rust are safe to use, as well as being faster than their equivalents in C++. In fact, even the previous sorts, the ones which were replaced by the current implementations, were both faster and safer than what you're getting in C++.


Do you happen to have a link to a benchmark? I would like to learn what I'm missing about what's happening in Rust. Last time I read about sort implementations here on HN [0], Rust panic safety had some measurable costs.

[0] https://news.ycombinator.com/item?id=34646199


Take a look at these results:

https://youtu.be/rZ7QQWKP8Rk?t=2054

Watch until the next slide - it shows a comparison of a port of a very fast C++ sorting algorithm to Rust. Rust is faster due to algorithmic changes. Ignoring those they are very similar speeds; certainly not an issue.


The non-video version is here [0]. My take on it is that there are no 'language comparisons' there. The difference is between older and newer algorithms, and the benchmark favorite is written in C. It's cool that new algorithms are implemented in a new-ish language first.

[0] https://github.com/Voultapher/sort-research-rs/blob/main/wri...


Ah, now I think I see the misunderstanding. When I said "equivalents" I mean that these are the standard library sorts, and so I was comparing against the standard library sorts in the three popular C++ implementations.

You're correct that if you implemented these algorithms carefully in C++ you can expect very similar results. I don't believe that anybody has done that and certainly there is no sign the three major implementations would attempt to switch to these algorithms for their standard library sorts.

In Rust today the standard library sorts provided are further refinements of the "ipnsort" and "glidesort" algorithms described in the paper you linked. As the papers arguing for these algorithms point out, the downside is that although they've been tested extensively with the available tools, we can't actually prove they're even safe; the upside is, of course, performance.
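For anyone following along, these are just the two standard-library entry points; a trivial usage sketch:

```rust
fn main() {
    let mut stable = vec![3, 1, 2, 1];
    stable.sort(); // stable sort: equal elements keep their relative order

    let mut unstable = vec![3, 1, 2, 1];
    unstable.sort_unstable(); // unstable sort: usually faster, no order guarantee

    println!("{:?} {:?}", stable, unstable);
}
```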


That version is 2 years old. Rust has had a new faster sort implementation since then.

> My take on it is that there are no 'language comparisons' there.

Watch the video a few minutes forwards from when I linked, there are "language comparison" slides. Basically C++ and Rust are on par.


Thanks! Updated results are interesting.


That assumes that people know what they're doing in C/C++. I've seen just as many bloated codebases in C++, if not more, because the defaults for most compilers are not great and it's very easy for things to get out of hand with templates, excessive use of dynamic libraries (which inhibit LTO), or using shared_ptr for everything.

My experience is that Rust guides you towards defaults that tend not to hit those things, and for the cases where you really do need that fine-grained control, unsafe blocks with direct pointer access are available (and I've used them when needed).


Is there a name for a fallacy like "appeal to stupidity" or something where the argument against using a tool that's fit for the job boils down to "All developers are too dumb to use this/you need to read a manual/it's hard" etc etc?


I think there is something to be said for having good defaults and tools that don't force you to be on top of every last detail 100% of the time lest they get out of control.

It also depends on the team. Some teams have a high density of seasoned experts who've made the mistakes and know what to avoid, but I think the history of memory vulns shows that it's very hard to keep that bar consistent across large codebases or dispersed teams.


This is ultimately the crux of the issue. If Google, Microsoft, Apple, whatever, cannot manage to hire engineers that can write safe C/C++ all the time (as has been demonstrated repeatedly), it's time to question whether the model itself makes sense for most use cases.

Grandparent can't argue that these top-tier engineers aren't reading the manual here. Of course they are. Even after reading the manual they still cannot manage to write perfectly safe code, because it is extremely hard to do.


Personally, my argument would be that the problems at the low level are just hard problems, and doing them in Rust you'll trade one set of problems, memory safety, for another set: probably unexpected behaviour with memory layouts and lifetimes at the very low level.


It's not that all developers are dumb/stupid. It's that even the smartest developers make mistakes and thus having a safety net that can catch damaging mistakes is helpful.


Yes. Even the most seasoned programmers write CVE-worthy C++. The foremost engineers still fail.


I've read several posts here where people say things like "this is badly designed because it assumes people read the documentation".

???????

Yes you need to read the docs. That is programming 101. If you have vim set up properly then you can open the man page for the identifier under your cursor in a single keypress. There is ZERO excuse not to read the manual. There is no excuse not to check error messages. etc.

Yet we consistently see people that want everything babyproofed.


_When_ there is a manual.

On the other hand, there's no excuse for designers & developers (or their product manager, if that's the one in authority) not to work their ass off on the ergonomics/affordance of the tools they release to any public (be it end users or developers, which are the end users of the tool makers, etc.).

It benefits literally everyone: the users, the product reputation & value, the builders reputation, the support team, etc.


Implying documentation exists. You're supposed to read the code, not man pages.


Yes, you need to read the docs. Yes.

And yet...

Do people read the docs? Often, no, they don't. So, are you creating tools for the people we have, or for the people you think we should have? If the latter, you are likely to find that your tool makes less impact than you think it should.

Computer languages are not tools for illiterates. You need to learn what you're doing. And yet, programmers do so less than we think they should. If we don't license programmers (to weed out the under-trained), then we're going to have to deal with languages being used by people who didn't read the docs. We should give at least some thought to having them degrade gracefully in that situation.


Nah, Rust also guides you to "death from a million paper cuts", aka RAII (aka everything is individually allocated and freed all over the place).

You need memory management to be painful like in C so that it forces people to go for better options like linear/static group allocations.
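A minimal sketch of the "group allocation" style being advocated (illustrative names; one Vec is the single backing allocation, and "pointers" are indices into it):

```rust
struct Node {
    value: u32,
    next: Option<usize>, // index into the arena, not a heap pointer
}

struct Arena {
    nodes: Vec<Node>, // one backing allocation for every node
}

impl Arena {
    fn push(&mut self, value: u32, next: Option<usize>) -> usize {
        self.nodes.push(Node { value, next });
        self.nodes.len() - 1
    }
}

fn main() {
    let mut arena = Arena { nodes: Vec::new() };
    let a = arena.push(1, None);
    let b = arena.push(2, Some(a));
    println!("node {} (next = {:?}) has value {}", b, arena.nodes[b].next, arena.nodes[b].value);
    // No per-node drop glue: the whole structure is freed in one shot when `arena` drops.
}
```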


I assure you that people do not go for better options


Why is RAII bad?


RAII is fine when it is the right tool for the job. Is it the right tool for every job? Certainly there are other more or less widely practiced approaches. In some situations you can come up with something that is provably correct and performs better (in space and/or time). Then there are just trade-offs.


Because it's micromanagement.


Micromanagement how?


Once you know how Rust works, it is likely your Rust code will be faster than C/C++ with less effort. I can say this because I had been using C++ for a long time, since Visual C++ 6.0, and moved to Rust about 3 years ago.

One of the reasons is that you get whole-program optimization automatically in Rust, while in C/C++ you need to put the functions that need to be inlined in the header or enable LTO at link time. Bounds checking in Rust, which people keep using as an example of a performance problem, is not actually a problem. For example, if you need to access the same index multiple times, Rust will perform bounds checking only on the first access (e.g. https://play.rust-lang.org/?version=stable&mode=release&edit...).
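A small self-contained sketch of that pattern (my example, not the one from the playground link): one explicit check up front lets the optimizer elide the implicit checks that follow:

```rust
// After the assert, the optimizer can drop the bounds checks on all three accesses.
fn sum_three(v: &[u64], i: usize) -> u64 {
    assert!(i + 2 < v.len());
    v[i] + v[i + 1] + v[i + 2]
}

fn main() {
    println!("{}", sum_three(&[1, 2, 3, 4], 0));
}
```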

The borrow checker is your friend, not an enemy, once you know how to work with it.


This kind of assumes old and naive C++. There was a lot of that 20 years ago but a lot of that was replaced by languages with garbage collectors. New C++ applications today tend to be geared toward extreme performance/scale. The idioms are far beyond thinking much about anything you mention.

People seriously underestimate how capable and expressive modern C++ metaprogramming facilities are. Most don’t bother to learn it but it is one of the most powerful features of the language when it comes to both performance and safety. The absence of it is very noticeable when I use other systems languages. I’m not a huge fan of C++ but that is a killer feature.


OK, but what is this wonderful subset of C++ that is geared towards extreme performance without sacrificing safety, and has expressive metaprogramming facilities that do not tank compilation and run times?

Not a rhetorical question, I'd love to see a book or notes that carves out precisely that subset so those of us who want to learn can avoid the tons upon tons of outdated or misleading documentation!



> not implementable in safe Rust

This is moving the goalposts. "Safe rust" isn't a distinct language. The unsafe escape hatch is there to make sure that all programs can be implemented safely.


It is not moving the goalposts. The parent that I replied to said "c and c++ programmers can no longer argue that the performance tradeoff is worth giving up safety." If you don't limit to safe rust you are giving up safety.


> If you don't limit to safe rust you are giving up safety

This is at best a misunderstanding of the way rust works. Unsafe is a tool for producing safe abstractions.
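A minimal sketch of that idea (illustrative code, not from any particular library): the unsafe block is buried inside a function whose public contract is safe, so callers can't misuse it to violate memory safety:

```rust
fn first_and_rest(v: &mut [u8]) -> Option<(&mut u8, &mut [u8])> {
    if v.is_empty() {
        return None;
    }
    let ptr = v.as_mut_ptr();
    let len = v.len();
    // SAFETY: the slice is non-empty, so `ptr` is valid for one element,
    // and the two reborrows cover disjoint memory.
    unsafe { Some((&mut *ptr, std::slice::from_raw_parts_mut(ptr.add(1), len - 1))) }
}

fn main() {
    let mut data = [1, 2, 3];
    if let Some((head, tail)) = first_and_rest(&mut data) {
        *head = 9;
        tail[0] = 8;
    }
    println!("{:?}", data); // [9, 8, 3]
}
```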


> Unsafe is a tool for producing safe abstractions.

I think we disagree on what "giving up safety" means, or perhaps you thought I meant "you are giving up all safety." (And honestly, I'm just trying to clarify what I meant when I read/wrote it. I'm not going for a No True Scotsman, or trying to move the goalposts here.)

Manually convincing yourself (proving) that an implementation is correct is how you write correct code in any language. In this sense you never "give up safety" in any language, but that's clearly not the sense that is being discussed in this thread. In this thread "giving up safety" appears to me to mean giving up automated safety guarantees provided by the language and compiler.

I acknowledge that it is possible to write just the bare minimum in unsafe rust to realise an abstraction, and that these "unsafe rust" fragments may be provably safe, thus rendering an entire abstraction safe. This may be best practice, or "the way rust works" as you say. Nonetheless, the unsafe fragments are not proved safe by construction/use of safe rust and/or automatically safe by virtue of the type system/borrow checker.

My point was that if you use unsafe rust you have reduced the number of automated safety guarantees. It is on the developer to prove safety of the unsafe rust, and of the abstraction as a whole. Needless to say, human proof is a fallible process. You may convince yourself that you have not given up safety, but I argue that you have merely contained and reduced risk. You have still "given up safety."


Safe Rust often performs significantly worse than C++ for many kinds of code where you care a lot about performance. You can bring that performance closer together with unsafe Rust but at that point you might as well use C++ (which still seems to have better code gen with less code). Everyone has their anecdotes but, with the current state of languages and compilers, C++ still excels for performance engineering.

The performance tradeoff is not intrinsic. Rust’s weakness is that it struggles to express safety models sometimes used in high performance code that are outside its native safety model. C++ DGAF, for better and worse.

The hardcoded safety model combined with a somewhat broken async situation has led me to the conclusion that Rust is not a realistic C++ replacement for the kinds of code where C++ excels. I am hopeful something else will come along but there isn’t much on the horizon other than Zig, which I like in many regards but may turn out to be a bit too spartan (amazing C replacement though).


> a somewhat broken async situation

Isn't Rusts's async situation "somewhat broken" in the exact same way the C++'s async situation is?


C++ often perform significantly worse than assembly for many kinds of code where you care a lot about performance. You can bring that performance closer together with bits of ASM in your C++ but at that point you might as well use ASM.


The word is that C++'s performance comes from ASM-like SIMD integration that may not be as mature in other languages.



You are very unlikely to hit this bug in a real-world Rust project, while in C/C++ you can easily be hit by a memory-safety bug.


Exactly, and also Miri catches all of these, so with a tiny little extra effort the world is in order again.

Moreover, if I remember correctly, they are all made possible by a single (long-standing) compiler bug that will eventually be fixed.

Previously discussed: https://news.ycombinator.com/item?id=39440808

I think this mindset is the big difference. We're not perfect, but we're working on it.


The bug used by that repository [1] isn't the only one that can be used to escape the Safe Rust type system. There are a couple others I've tried [2] [3], and the Rust issue tracker currently lists 92 unsoundness bugs (though only some of them are general-purpose escapes), and that's only the ones we know about.

These bugs are not really a problem in practice though as long as the developer is not malicious. However, they are a problem for supply chain security or any case where the Rust source is fully untrusted.

[1]: https://github.com/rust-lang/rust/issues/25860

[2]: https://github.com/rust-lang/rust/issues/57893

[3]: https://github.com/rust-lang/rust/issues/133361
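For reference, a condensed version of the exploit from [1], following the shape popularized by the repository discussed upthread. It compiles on stable precisely because of that open bug; actually running it is of course undefined behaviour:

```rust
// The implied bound `'b: 'a` on `translate` is lost when the function is
// coerced to a function pointer, letting safe code forge an arbitrary lifetime.
const STATIC_UNIT: &&() = &&();

fn translate<'a, 'b, T: ?Sized>(_guard: &'a &'b (), v: &'b T) -> &'a T {
    v
}

fn expand<'a, 'b, T: ?Sized>(x: &'a T) -> &'b T {
    let f: fn(_, &'a T) -> &'b T = translate;
    f(STATIC_UNIT, x)
}

fn main() {
    let dangling: &str = {
        let s = String::from("freed before use");
        expand(s.as_str()) // forged lifetime outlives `s`
    };
    // `s` is gone, yet this safe-looking read was accepted by the compiler.
    println!("{}", dangling);
}
```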


It's time to stop jailbreaking/rooting, fully take control away from the user and enforce DRM more strongly?


The ability to jailbreak should be a legal right. We shouldn't be relying on vulnerabilities just to own the devices we bought.


Good luck getting those in power to agree to that.

Meanwhile, everything else passes under the guise of "safety and security".


It’s time to stop governments from hacking people’s phones, taking away their privacy?

Users should have control and trust in their devices. If they can be remotely compromised, they cannot get that.


The governments will always have a way in if you're a target.

Meanwhile the rest of the population gets pushed towards authoritarian dystopia.

> Users should have control and trust in their devices.

I think that already went away once they started adding spyware ("telemetry" being the usual euphemism) and forced automatic updates ("remotely compromised", as you put it.)


I fundamentally don't buy the argument that our products need to be shitty so that we can break them.


Nothing to do with the article and flame bait?


It's either one or the other, right? In the end people will be forced to either fix their governance or embrace the chaos. At least in the idea of "verifiable computing for thee but not for me" it's certainly not the "verifiable computing" part that I find to be the problem.

We're living the inbetween and people are beating the drums that drive us towards either endpoint. Not for no reason either. Turbulent times.


Just bang out a bunch of C code, feed it to an AI: "Make this memory safe". Profit.

No need for rust, Ada, CHERI, SPARK, etc.


You could also pray, that's about as likely to be effective.


A rewrite isn't strictly necessary. It should be enough if AI can find errors; it doesn't even need to be very precise.


Profit from your AI-powered security company, sure. But the exploit authors are profiting too.


Now the billion-dollar question: how to make that work for the entire Linux kernel.


If it is too big just zip it and feed chunks of the resulting zipfile to the AI. AI can do anything, right?


It will work if you convert zip chunks to base64 first, and use a large enough training set.


Should work fine if the training set is a memory-safe, full-featured, monolithic, Unix-like operating system kernel originally written by a certain Linus Torvalds.


Can't wait to read the first incendiary linus style rant being generated by the AI as a result.

"As a large languge model, I can't answer your patch merge request you absolute fucking moron"


Easy: triple the hardware requirements and don't talk to any hardware, because if you do you'll have to mess with buffers in a non-approved way.



