Null References: The Billion Dollar Mistake (infoq.com)
93 points by wheresvic3 on Jan 11, 2020 | 150 comments


Out of all possible gotchas in programming languages, I still find null pointers the easiest one to discover and fix. You directly see when and where it happens, and the fix is usually straightforward.

Compared to that invalid pointers (stale references) are a lot more painful, since programs might continue to work for a while. Managed languages do at least prevent those.

Multithreading issues are imho the biggest pain points, since they are introduced so easily and often go unnoticed for a long time. The number of languages that prevent those is unfortunately not that big (Rust plus pure single-threaded languages like JS plus pure functional languages).


You directly see where the null dereference happens. But that's not necessarily where the problem actually is, because a null pointer can flow through a lot of code before it actually gets dereferenced. So "program continues to work for a while" is also a thing with them.

In a language like C, a null pointer can also become an invalid non-null pointer pretty easily through pointer arithmetic.


Easy, just add "if (thing == null) return;" to the top of the function that crapped out on a null reference and close the ticket! /s


Sarcasm aside, this is often how it gets fixed in poor quality codebases! And if it returns a reference itself, it ends up being:

   if (thing == null) return null; 
more often than not. Which leads to an even more entertaining debugging trip next time...


Oh for sure. I wish I had thought of that sarcastic quip by being creative, but in reality it's because I've seen it so many times.


> a null pointer can also become an invalid non-null pointer pretty easily through pointer arithmetic.

Yeah but even then it's still easy enough to see what happened when you have a pointer to address 0x0000002F or some similar small pointer.


> You directly see when and where it happens, and the fix is usually straightforward.

This is not true in most dynamic languages, especially ones where I/O is not typed. You have to be extremely diligent about verifying input. JavaScript comes to mind.


> You have to be extremely diligent about verifying input.

That's true of all languages. Null references are a problem of low effort development. Calling it a billion dollar mistake is sensationalist hand-wringing. It accidentally highlighted how carelessly most programs are written, implying that without it developers wouldn't be checking inputs as strictly, because they wouldn't need to. Yes it's another type, but lots of languages have a nil/null and there hasn't been a demonstrative reason to pull it.


"There hasn't been a demonstrative reason to pull it" is not the reason it hasn't "been pulled".

Languages are hard to change and backwards compatibility is paramount. Hell, some languages support null just for interoperability (e.g. Scala) when they would otherwise not have allowed it when they were created.

Null isn't expressive and is historical baggage. At this point "billion dollar" is probably an understatement.

I wonder how many people that have spent significant time writing in languages that allow null and those that don't prefer having null?

I, for one, wouldn't willingly go back to a language that allows null.


Lots of languages have several ways to indicate an invalid value.

With SQL, I generally prefer dialects where an empty string and null are considered the same, although that may be just due to what I learned first. Various Microsoft technologies seem to tend towards multiple ways of expressing null/missing/empty.

An interesting thing about null in SQL is that the rule that any operation on a null returns null only applies in some contexts. For instance in Oracle SQL:

select max(case when x = 'ABC' then 'ABC' end) as y from ...

...is taking the maximum of an expression which is null whenever x <> 'ABC', yet it doesn't return null if there are rows where it does = 'ABC'.

(sorry for any errors)


Well to be clear, most modern languages with reasonable type systems will force you to explicitly verify the type of your input. C#, for example, forces you to cast your JSON before using it. If the cast fails because you got the class definition wrong, you get an error (like a constructor error).


It's always easy to find where they occur. But it's not easy to find why. It's better to never allow the model to be invalid, and to error immediately when it becomes invalid rather than when it causes some surprising undefined behavior later and you have to spend the effort to reason why.


Count me amongst those who do not think they're a mistake. You need to indicate no-data-here in some fashion. If you try to use that no-data in some fashion, having your program blow up from a null reference is a feature to me--in the vast majority of cases it's better to go boom than silently continue doing something wrong. In the few where that's not the case you can trap the exception and go on.

The real solution is what has been done with C# in recent years--have the compiler track whether a field can contain a null or not and squawk if you try to dereference something you haven't checked. That causes it to blow up in the best place--compile time rather than runtime.


I think it's perfectly fine to have option types, and that's exactly what languages without null references end up doing. What null references end up doing is accidentally turning all types into option types and making it impossible to have non-option types.


> You need to indicate no-data-here in some fashion.

Well, languages that have non-nullable types still allow you to do that; you just have to be explicit about it. In TypeScript (using --strict mode, which enables strictNullChecks so that types are non-nullable by default), you'd need to define the type as a union type and make use of the "null" type: `let a: string | null`. So you tell the compiler that either "null" or a string is a valid value for a. The compiler will assist you and catch possible bugs (such as dereferencing a without checking it against null first).


Yes, which is where types like `Optional` come in. If you make a language where null doesn't exist by default, but still provide a standard way of indicating non-presence, you get the advantage of compile-time correctness checking.

Also, the compiler can still optimize the `(hasValue, value)` tuple into a possibly-0 pointer when the type of the value is a pointer. (which by the way, is exactly what Rust does, among others)
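To make the representation claim concrete, here is a minimal Rust sketch that compares sizes. The function names are made up for illustration; only `std::mem::size_of` comes from the standard library.

```rust
use std::mem::size_of;

/// True when `Option<&u64>` needs no extra tag: `None` is stored as the
/// null address, which a valid `&u64` can never hold.
fn option_ref_is_pointer_sized() -> bool {
    size_of::<Option<&u64>>() == size_of::<&u64>()
}

/// The same layout guarantee applies to the owning pointer `Box<u64>`.
fn option_box_is_pointer_sized() -> bool {
    size_of::<Option<Box<u64>>>() == size_of::<Box<u64>>()
}

/// A plain `u64` has no spare bit pattern, so `Option<u64>` must be larger
/// in order to record whether a value is present.
fn option_int_needs_tag() -> bool {
    size_of::<Option<u64>>() > size_of::<u64>()
}
```

All three functions return true, which is the `(hasValue, value)` tuple collapsing into a possibly-0 pointer for pointer types only.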


They are called Nullable types in C# and must be declared with `?` after the type.

But, Nullable<T>.HasValue check is not forced and Nullable<T>.Value will throw a different exception instead if it is null (InvalidOperationException).


Well, it depends on which type you are referring to (which is part of what pains me with nullable ref types, as nice as they are to have).

If it's a value type (T), ? will make it Nullable<T> and provide the behavior described.

Reference types, however, can always be null and do not have a .HasValue as in the example above. Newer versions of C# let you declare nullable references at the compiler level, but rather than using HasValue/Value you still do the null check yourself, and you can bypass the compiler's warning with the new null-forgiving operator (!).


Which is what I was talking about. I haven't had the opportunity to put it to use yet (converting an existing project is a big headache, it's something to do from the start) so I didn't remember the terms, only the ability.


No one thinks that non-existence can't be represented, and all but the most extreme proof languages have the possibility of runtime error.

Null is a mistake because it is (1) ubiquitously permitted in types and (2) non-composable.

(Point #2: This caused JavaScript to have "undefined" which is a second level of nonexistence)

The Maybe/Option pattern solves both these problems.

Nullable at least solves the first one.

When people criticize "null, the billion dollar mistake", they criticize the ubiquitous, non-composable form of null.

https://www.lucidchart.com/techblog/2015/08/31/the-worst-mis...


This comes up again and again in one form or the other, yet new languages still seem to be making the same mistake. Of all languages I've touched, Rust seems to be the only one that mostly circumvents this problem. Are there other good examples?


> Rust seems to be the only one that mostly circumvents this problem.

The Rust hype is getting ridiculous here. There are plenty of languages with non-nullable references as first-class, and optionals for the nullable case.

(...And I say this as a Rust fan myself, for what it's worth.)


Imho, Rust is an awkward language because it positions itself as a systems language but it makes low-level stuff more difficult (there's even a book teaching how to implement doubly linked lists in Rust [1]), hence prone to mistakes. At the same time, people are using Rust to build non-systems programs, where other languages would be more appropriate (e.g. those with garbage collectors). I don't think it is a good idea that Rust is promoted as the language that will rule them all; in my opinion, it is still a research language.

Linus Torvalds said the following about Rust [2]:

[What do you think of the projects currently underway to develop OS kernels in languages like Rust (touted for having built-in safeties that C does not)?]

> That's not a new phenomenon at all. We've had the system people who used Modula-2 or Ada, and I have to say Rust looks a lot better than either of those two disasters.

> I'm not convinced about Rust for an OS kernel (there's a lot more to system programming than the kernel, though), but at the same time there is no question that C has a lot of limitations.

[1] https://rust-unofficial.github.io/too-many-lists/

[2] https://www.infoworld.com/article/3109150/linux-at-25-linus-...


> there's even a book teaching how to implement doubly linked lists in Rust [1]

Doubly-linked lists are an awkward example because the "safety" of a doubly-linked list as a data structure involves fairly complex invariants that Rust can't even keep track of at this point, much less check independently. These things are exactly why the unsafe{} escape-hatch exists and is actively supported. But just looking at the amount of unsafe code in common Rust projects should suffice to figure out that this is not the common case, at all.

> At the same time, people are using Rust to build non-systems programs, where other languages would be more appropriate (e.g. those with garbage collectors).

Garbage collectors are good for one thing, and one thing only: keeping track of complex, spaghetti-like reference graphs where cycles, etc. can arise, perhaps even as a side effect of, say, implementing some concurrency-related pattern. Everything else is most likely better dealt with by a Rust-like system with optional support for reference counted data.

That's without even mentioning the other advantages that a Rust-like ownership system provides over a GC-only language. See e.g. https://llogiq.github.io/2020/01/10/rustvsgc.html this recent post for some nice examples.


> Doubly-linked lists are an awkward example because the "safety" of a doubly-linked list as a data structure involves fairly complex invariants that Rust can't even keep track of at this point.

Perhaps it's just me, but I'd like to assume that my language does not treat any algorithm found in a basic algorithms course (e.g. Sedgewick) as awkward.


Eh, I have no problem with the idea that data structures that work in C are awkward in different paradigms. Many data structures are awkward in functional programming. Lots of things are awkward in C that are easier in other languages.


How many times have you used doubly linked lists outside a CS101 programming exercise? Even if you did, it’d usually be trivial to implement an array index version or just use unsafe. Basically it seems you give up almost nothing for memory safety.


For sure, you can avoid awkwardness by not statically verifying memory usage & invariants (c++, etc) or using a GC'd language. Rust's ownership and borrowing rules are limited, but simple enough for someone to internalize them quickly.

There's a pretty vast difference between human simple and computer simple. Rust requires that you prove memory safety, or use unsafe. That's a different problem than just informally ensuring invariants are met.

You could probably pull in more advanced type theory research for more nuanced ownership, but I'd bet the language would be harder to understand overall (Haskell disease).


I’m a rust fan, but garbage collection is about removing having to think about memory management from the developer almost entirely, not about performance.


I don't do the sort of programming recently where it matters, but when I read the debates on GC on HN, I think, why not a language where there is a GC, but it is "cooperatively scheduled" - you explicitly invoke it for a fixed amount of time. Wouldn't that be the best of both worlds?


Value and move semantics do the same thing and work in 90% of the cases a GC does. For the remainder there's ARC.


If you think rust eliminates the need to think about it in the way that python does, I don’t know what to tell you.


You might be reading too much into that comment.

Python allows you to do something with memory that Rust has made it a priority to be more concerned with, and that’s sharing.

C also has easily shared memory, much like Python. Point being that Rust wants to make sure that your references are safe to share, whereas Python wants you to share as much as possible and makes it safe by not allowing multiple threads to interact with it.

These are different trade-offs: Rust does allow you to forget about memory management in the same way Python does, but it forces you to think about how memory is being shared.

That’s the added cost over Python and the extra thought that goes into using the language.


No, having to think about value and move semantics is extra overhead you take on. It's better when the compiler can help you catch this, like in Rust, but it still forces you to structure your program a certain way and to constantly think about incidental details like ownership.


GCs are also good at compacting heaps.


Perhaps, but at pretty severe cost. Your heap must be structured in a way that the tracing routine can make sense of (and the consequences of this involve considerable waste and inefficiency in practice - lots and lots of gratuitous pointer chasing), and the compacting step itself involves a lot of traffic on the memory bus that trashes your caches and hogs precious memory bandwidth.

Forget it. Obligate GC is a terrible idea unless you really, really, really know what you're doing.


Rust is a force for good but I think Andrei Alexandrescu was right when he said Rust feels like it "skipped leg day" (in the sense that it has its party piece and not much else) - from the perspective of the arch metaprogrammer himself at least.

Rust is obviously good for safety, but for everything else (to me at least) it seems unidiomatic and ugly. Admittedly I've never really sunk my teeth into it (I've read a fair amount into the theory behind the safety features but never done a proper project).


> ...Andrei Alexandrescu was right when he said Rust feels like it "skipped leg day" (in the sense that it has its party piece and not much else) - from the perspective of the arch metaprogrammer himself at least.

Mind if I ask what that means? It seems like an interesting observation, but there are a couple of bits of terminology I don’t understand, like "leg day" and "party piece".

Any clarification would be appreciated, thanks!


Perhaps this link would explain it more clearly: [1].

Leg-day is bodybuilding terminology, and refers to the day of the week when the bodybuilder is supposed to be training the leg muscles. According to the meme, nobody wants to train the legs because they show the least.

[1] https://www.jonathanturner.org/2016/01/rust-and-blub-paradox...


Thank you, that is very informative!

> nobody wants to train the legs because they show the least.

It reminds me of the story about how Google drops products because no one wants to maintain an existing product. That would not show "impact", not like launching a new product would, and impact is how you get raises and promotions.


Rust has procedural macros. What else do you need for metaprogramming?


>> unidiomatic and ugly

I am not sure I would take anyone seriously who thinks this is a valid point to make about a programming language.


That was my personal opinion, unrelated to the first paragraph.


This comment would be much improved with a list of those languages.

Kotlin and Swift come to mind, what are others?


If we're limiting ourselves only to new languages, then nulls are statically excluded not only by Kotlin and Apple’s imitation of it, Swift, but also by F#, Agda, Idris, Elm, and (sort of) Scala. But the zozbot didn't seem to be talking only about new languages, so Haskell, Miranda, Clean, ML, SML, Caml, Caml-Light, and OCaml are also fair game. (It wouldn't be hard to list another dozen in that vein.) Moreover I think you could sort of make a case for languages like Prolog and Aardappel where you don't have a static type system at all, much less one that could potentially rule out nils, but in which the consequences of an unexpected nil can be much less severe than in traditional imperative and functional languages like Java, Lua, Python, Clojure, Smalltalk, or Erlang, which more or less need to crash or viralize the nil in those cases.


Good list.

I've found the consequences of a nil type are less severe in dynamic languages, where all variables have the Any type, since nil is just one of the options one needs to account for.

Static languages where everything is nullable are reneging on the promise; you say something is a String but that just means Option<String>, and it saps a lot of the reasoning power which static typing should give.


Even in dynamic languages, the consequences can be pretty bad. For example, I've seen lots of Ruby bugs where things end up being unexpectedly `nil`, but I haven't seen as many Python bugs where things end up being unexpectedly `None`.

How does this happen? Well, in Ruby it's a lot more normal to just return nil. For example, consider the following code snippet:

  [][0]
In both languages you are trying to examine the zeroth element of an empty array (or list, as Python calls it). In Ruby this evaluates to nil. Python throws an IndexError. So in Python, if you have a bug where you address an array with an invalid index, it manifests as an error in how you're indexing the list. Ruby silently returns nil, and the only actual error backtraces you see are when you actually try to call a method on this nil later on, which might not be anywhere near where your program messed up the array indexing.
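For comparison, Rust offers both behaviours and makes the caller choose. A rough sketch, with hypothetical function names: `get` hands back an `Option` (absence like Ruby, but visible in the type), while plain indexing panics at the indexing site (like Python's IndexError).

```rust
/// Checked access: an out-of-range index yields `None`, and the caller
/// cannot reach the element without handling that case right here.
fn describe_first(items: &[i32]) -> String {
    match items.get(0) {
        Some(value) => format!("first element is {}", value),
        None => String::from("list is empty"),
    }
}

/// Plain indexing: an out-of-range index panics on this line, so the
/// backtrace points at the bad index rather than at a later use.
fn first_unchecked(items: &[i32]) -> i32 {
    items[0]
}
```

Calling `describe_first(&[])` returns "list is empty", while `first_unchecked(&[])` panics immediately instead of handing a nil-like value to the rest of the program.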


That seems (although I don't have experience with either language) straightforwardly correct treatment in Python and wrong in Ruby, and the problem seemingly should be attributed to [lack of] range checking, not nil/null.


Sure; in this case, range checking provides an alternative behavior to generating a null reference. But Ruby has other places where it generates null references more promiscuously than Python. Java does too. If you take every use case that could generate a null reference and instead behave differently in that situation you’ve eliminated null references, and Python has largely done so despite having a None type.


`let s:String` is not the same as `let s:String?` in Swift, at least. (Nor in TypeScript)


Right, that's why Swift is in the lists above. TypeScript would've been a good addition, I just didn't think of it.


Concur.


TypeScript/Flow


PHP 7


> Rust seems to be the only one that mostly circumvents this problem. Are there other good examples?

Swift, Kotlin, and of course older languages of a functional bent like MLs, Haskell, Idris, Scala, …

Some are also attempting to move away from nullable references (e.g. C#), though that is obviously a difficult task to perform without extremely severe disruptions.


Scala happily accepts null, since Null is a subtype of every AnyRef type and is needed for JVM compatibility. Kotlin has a compiler check that enforces non-nullability; Scala does not.



I really love(d) Scala for introducing me to the whole idea of Optionals.

I wish for the life of me I felt like I could approach Scala at a time when it wasn't going through huge flux (I have shitty luck). I spent a good amount of time pre-version 2.10 :( and then recently went to have a look but saw Dotty (version 3.0?) coming by the end of 2020 and I was like "well, FML, time to wait a few more years and try again."

Anyone have any tips for using the Scala ecosystem effectively these days? Should I just wait for 3.0? Is it going to be a long winding road of breaking changes until a "3.11" version?

Is there a good resource for what folks are using it for these days? It seems like all the projects I used to know are ghostly on Github (but that could also be the fact it has been quite a few years, heh). Or do most folks just pony-up and use plain ol' Java libraries while writing their application/business logic in Scala?


> Rust seems to be the only one that mostly circumvents this problem. Are there other good examples?

Rust is not the first one to have an Option type; it's a common feature of functional languages because they have ADTs ( https://en.wikipedia.org/wiki/Algebraic_data_type )


Functional programming languages have been doing it for ages. Most "newer" statically typed languages also have it (Swift, Kotlin, Rust) by default. And old languages had it bolted on (C# 8, Java 8, C++ 17).

I think at this point basically everyone has realized null by default is a terrible idea.


> And old languages had it bolted on (C# 8, Java 8, C++ 17).

C#: actually true, you can switch over to non-nullable reference types

Java 8: meeeh, it provides an Optional but all references are still nullable, including references to Optional. There are also @Nullable and @NotNull annotations but they're also meh, plus some checkers handle them oddly[0]

C++17: you can deref' an std::optional, it's completely legal, and it's UB if the optional is empty. Despite its name, std::optional is not a type-safety feature, its goal is not to provide for "nullable references" (that's a pointer), it's to provide a stack-allocated smart pointer (rather than have to allocate with unique_ptr for instance).

[0] https://checkerframework.org/manual/#findbugs-nullable


Swift with optional and optional chaining. [1]

[1] https://docs.swift.org/swift-book/LanguageGuide/OptionalChai...


Haskell, notoriously. I believe it pioneered the ergonomics of the alternatives used elsewhere.


AFAIK Standard ML predates Haskell and it has an option type.


ML is even older than SML and has algebraic data types.


The reason they come up again and again is that it's hard to design an imperative language without them (try, assuming you want to provide generic user-defined data structures that allow for cycles).

As a result, calling them a "mistake" is reasonably dishonest, as it implies there was an obvious, better alternative.


Can you give some more details on what the design problem is here?

It seems to me that nullable references are isomorphic to having an option type with non-nullable references, but prevent accidental unchecked dereference. What are some of the difficulties that you'd expect to come up if you took an imperative language with nullable references and replaced them with options of non-nullable references?


I don't consider 'option' types to have interesting semantic differences with nullable types. YMMV.

But beyond that, the absence of nullable references (really, a valid default value for every type) is a problem for record/object/struct initialisation - you either have to provide all values at allocation time, or attempt to statically check that the object is fully initialised before any use - Java has rules to that effect for 'final' fields, and they are both broken and annoying (less broken rules would likely just be more annoying).


The difference is that you can't accidentally use an option as a pointer without checking it first, and when your APIs specify a non-nullable pointer you can rely on the callers to have checked for null.

When you're reading or writing a function that accepts a non-nullable reference, you never have to worry about whether the argument is null or not. It's easier to get right, constrains the scope of certain types of errors.

If you get things wrong, and unwrap the option without checking, you get an assert failure at that location, rather than potentially much later on when the pointer is used.

The whole point is that Option<&Foo> replaces nullable &Foo, so your record/object/struct member is Option<&Foo> and the default value for it is None. Option<&Foo> even has the same runtime representation as a nullable &Foo, since Option<&Foo> uses NULL to represent None.

It's just a different way of representing nullable references, but with semantics that make it easier to track null-checked vs nullable references, impossible to accidentally get it wrong and dereference a nullable pointer you mistakenly assumed was already checked, and better errors when you do make mistakes.
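A small Rust sketch of that split between "maybe absent" and "definitely present". `Config`, `Session` and the function names are invented for the example, and it stores an owned `Option<Config>` rather than `Option<&Foo>` to keep lifetimes out of the picture.

```rust
#[derive(Debug)]
struct Config {
    name: String,
}

#[derive(Debug, Default)]
struct Session {
    // Absence is part of the field's type; the derived default is `None`.
    config: Option<Config>,
}

/// Takes a plain reference, so it never has to consider a missing config.
fn config_name(config: &Config) -> &str {
    &config.name
}

/// The only place that deals with absence: the check happens once, at the
/// boundary between the optional field and code that needs a real value.
fn session_label(session: &Session) -> String {
    match &session.config {
        Some(config) => config_name(config).to_string(),
        None => String::from("<unconfigured>"),
    }
}
```

Passing `session.config` straight to `config_name` does not type-check, and calling `unwrap()` on a `None` would panic at the unwrap itself rather than somewhere downstream.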


While I totally agree with everything you’re saying, I think they are right about it being annoying to initialize structs/records when all fields must be defined upfront. For one, it becomes harder to incrementally build a record in generic way. And if you decide to make a bunch of fields optional, then that optionality is carried with it forever, long after it’s obvious that the data exists for that field. Those are legitimately annoying things to deal with.

To avoid that annoyance, you almost have to rethink the problem. You can’t do it the imperative way, at least not without all that pain. Instead, if you don’t yet have the data, you should simply assign the field with a function call or an expression which gets that data for you. In other words, the record initialization should be pushed to a higher level of the call graph. If you do that, then every record initialization is complete.

Other solutions are more language-specific. TypeScript has implicit structural typing, so incremental construction is pretty easy. You just can’t try to tell the compiler that it belongs to the type you’re constructing, unless it actually does include all the necessary data.

In OCaml, you can define constructor functions which take all the data as named parameters. Since function currying is part of the language, you can just partially apply that function to each new piece of data, as you incrementally accumulate it. Then you finally initialize the record when the function is fully applied.
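Rust has no built-in currying, so this is not the OCaml technique itself, but a builder gets a similar effect: the optionality lives only in the intermediate value and is gone once the record exists. A hedged sketch with made-up `User` and `UserBuilder` types:

```rust
#[derive(Debug, PartialEq)]
struct User {
    name: String,
    email: String,
}

/// Intermediate state while data is still being accumulated.
#[derive(Default)]
struct UserBuilder {
    name: Option<String>,
    email: Option<String>,
}

impl UserBuilder {
    fn name(mut self, name: &str) -> Self {
        self.name = Some(name.to_string());
        self
    }

    fn email(mut self, email: &str) -> Self {
        self.email = Some(email.to_string());
        self
    }

    /// Yields a complete `User` only if every field was supplied; the
    /// resulting record carries no optional fields at all.
    fn build(self) -> Option<User> {
        Some(User {
            name: self.name?,
            email: self.email?,
        })
    }
}
```

`UserBuilder::default().name("ada").build()` is `None`, while supplying both fields yields a `Some(User { .. })` whose fields can be used without further checks.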

Suffice it to say that there are plenty of solutions to this problem.


rust, f#, ocaml, latest version of c# has an option to sort of get rid of nulls, zig


Elm is a great JS replacement on the frontend.


Kotlin


I assume two reasons, efficiency and because an efficient implementation of mutable state would have the same problem.

Right now, a single sentinel value makes a pointer null or not null (0x0 is null, everything else is not null). This is exactly how you'd implement a stricter type, like "Maybe". Encoded as a 64-bit integer, "Nothing" would be represented as 0x00000000 and "Just foo" would be represented as 0xfoo. No object may be stored at the sentinel value, 0x00000000. Exactly the same as what we have now, and provides no assurances that 0xfoo is actually a valid object.

Meanwhile, Haskell which "doesn't have null" crashes for exactly the same reason your non-Haskell program crashes with a null pointer exception:

    f :: Num a => Maybe a -> Maybe a
    f (Just x) = Just (x + 41)
This blows up at runtime when you call f Nothing, because f Nothing is defined as "bottom", which crashes the program when evaluated.

It's exactly the same as languages with null pointers:

    func f(x *int) *int {
        result := *x + 41
        return &result
    }
And the solution is the same, your linter or whatever has to tell you "hey maybe you should implement the Nothing case" or "hey maybe you should check the null pointer".

Where I'm going with this is that you need to develop entirely new datatypes and have an even stricter type system than Haskell. Maybe Rust is doing this, but it's hard. We all know null is a problem, but calling null something else doesn't make the problems go away.


> It's exactly the same as languages with null pointers:

Four huge differences:

1. You don’t need to pass around ‘Maybe a’ everywhere. If null isn’t expected as a possible value (which usually it isn’t), you just pass around ‘a’, and when you do use ‘Maybe’ it actually means something.

2. The Haskell compiler can, and does (with -Wall), tell you that your pattern match is non-exhaustive. You don’t need a separate “linter or whatever”. This is possible because the needed information is present in the type system, and doesn’t need to be recovered with a complicated and incomplete static analysis pass.

3. If you do this anyway, the error is thrown at exactly the point where ‘Maybe a’ is pattern-matched, not at some random point several function calls later where your null has already been coerced into an ‘a’.

4. This program is defined to throw an error; it’s not undefined behavior like in C that could result in something weird and unpredictable happening later (or earlier!).

Also, Rust optimizes away the tag bit of ‘Option’ under common circumstances; for example, ‘None: Option<&T>’ (an optional reference to ‘T’) is represented internally as just a null pointer, which is safe because ‘&T’ cannot be null.
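For what it's worth, here is roughly how points 1 and 2 look in Rust, as a sketch with hypothetical function names. Unlike the Haskell default, a non-exhaustive `match` is a hard compile error there, with no flag needed.

```rust
/// Counterpart of the Haskell `f`. Deleting the `None` arm makes the
/// compiler reject the function (non-exhaustive patterns), so the partial
/// version cannot be built in the first place.
fn add_41(x: Option<i32>) -> Option<i32> {
    match x {
        Some(value) => Some(value + 41),
        None => None,
    }
}

/// Point 1: where absence is not expected, the parameter is a plain `i32`
/// and no check is needed or possible.
fn add_41_required(x: i32) -> i32 {
    x + 41
}
```

`add_41(Some(1))` gives `Some(42)` and `add_41(None)` gives `None`; `add_41_required(None)` simply does not type-check.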


> You don’t need to pass around ‘Maybe a’ everywhere.

You don't need to pass pointers around everywhere. Languages with null still have value types that cannot be null.

> You don’t need a separate “linter or whatever”.

Optional compiler flags count as "whatever" to me.

> it’s not undefined behavior like in C that could result in something weird and unpredictable happening later (or earlier!)

C++ doesn't define this, but the OS does (and even has help from the CPU).

Anyway, my TL;DR is that it's easy to have a slow program that passes everything by value, or easy to have a fast program that uses pointers or references. Removing the special case of null is meaningless, because you can still have a pointer to 0x1 which is just as bad as 0x0, probably. This goes back to my original answer to the question "why don't more languages get rid of null" which was "it's harder than it looks." I think I'm right about that. If it were easy, everyone would be doing it.


> Languages with null still have value types that cannot be null.

Not all languages.

> C++ doesn't define this, but the OS does (and even has help from the CPU).

That's not how it works anymore, because C / C++ front-ends interacting with the optimizers are yielding too "optimized" results. See the classic https://t.co/mGmNEQidBT


Oh boy... `-Wall` is "whatever". Please let me never look at code you have written...


That's not the same thing as a null pointer because Nothing isn't allowed in place of e.g. integers, strings, etc. like in Java. What you're doing is defining a non-total function. Haskell, by default, doesn't perform exhaustivity checks when pattern matching, but you can enable that via a compiler flag - then it will warn about your example. OCaml, for example, does that by default.


This misses the point. The point is not that you can forget to check the null case. The point is that you can express that sometimes there's no null case.


The "no null" case in traditional languages is just "int" instead of "*int". All values inside an "int" are valid integers.

Certainly it's problematic to use the same language primitive to mean "a pointer" and "this might be empty", but it's what people use them for in every language that has pointers (that I've used anyway).


That conflates pass by value/by reference distinction with being optional.

This means you need magic values for "maybe int" with no help from the type system. And you can't express "there's definitely an int at that address".


"Making everything a reference: The Billion Dollar Mistake" is the talk I want to see


Can you elaborate? I can't remember the last time I thought "oh darn it why is this a reference", but I can think of a billion problems I've had with nulls in jvm languages


Cache incoherency, which will cost us more and more performance as CPUs will improve slower in the future.


You may be interested in http://canonical.org/~kragen/memory-models then. I don't think it's necessarily a mistake but it's definitely taken for granted far too much.


there are a few completely different ways to interpret this, can you explain?


in languages like c, rust or go, where you can put arbitrary data on the stack, it seems to me as if such issues are less common because you don't have to worry about initializing pointers and allocating memory unless you actually want to put something on the heap. Thus if you make everything a reference in your language it's no wonder you run into issues like null-pointers more often


With stack allocation you then encounter problems with object lifetime. Rust solves this problem by binding references to scope, and Go solves this by invisibly changing an allocation to the heap (and uses ref-counting? I think?).

I wish C had a feature that would let you allocate something on the stack and then return to the parent stack frame without popping the stack-pointer - that would be handy for self-contained object-constructors.


Regardless of whether it's on the stack or heap the point still stands. If all your objects are randomly allocated then an array is just references to those objects and will start out null. If you're using value types then your array of objects will never be null (empty instead) and you will benefit from CPU caching the data.
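
To make the reference-array vs. value-array point concrete, here is a rough Python sketch. Python's `list` stands in for an array of references and the standard-library `array` module for a packed array of values; the variable and function names are just for illustration.

```python
from array import array

# A list holds references, so every slot can start out as (or become) None.
refs = [None] * 3

# array("q") stores signed 64-bit integers inline in one contiguous buffer,
# so a slot always holds a real integer and can never hold None.
vals = array("q", [0] * 3)


def try_store_none(values):
    """Return True if None could be stored in slot 0, False if it was rejected."""
    try:
        values[0] = None
        return True
    except TypeError:
        return False
```

The list accepts `None` in any slot, while the packed array rejects it with a `TypeError`, which is the "never null, only empty" property described above.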


Fortunately, objects in modern GC are almost guaranteed to be laid out sequentially in memory if they are allocated sequentially or if they are referenced sequentially and a GC pass has run, because of the way copying GCs work. The much bigger problem is the memory and CPU overhead of storing pointers and following them, though that should be mitigated somewhat by the prefetcher.


Go uses fully-general GC, not reference counting. Obligate reference counting is used in other languages such as Swift, probably with worse throughput than obligate tracing GC.


Plus also doesn’t the stack need to be small to fit into the CPU cache?


There is absolutely no requirement from the hardware that the stack be any particular size


On the Windows desktop, the default stack size is 1MB. In IIS-hosted applications the default stack size is reduced to 250KB due to the popularity of the now-outdated programming trope of "one thread per request (per-connection)". On x86 Linux the default stack size is 2MB - which seems generous.


If you start allocating multi-kilobyte objects in your stack frames, they are not going to fit into L1.


I'm not thinking of a requirement, but rather of performance.


Isn't that just how return values work or did I woosh a joke?


Everything in Python is a reference, and there's no null pointer issues.


If you've never seen this message, you haven't been programming in Python for very long:

    AttributeError: 'NoneType' object has no attribute 'foo'

Not to mention UnboundLocalError and cases of AttributeError that stem from trying to use attributes before they've been initialized. Some of these have slightly better ergonomics than Java’s pernicious null initialization, for example by crashing your program earlier, but the upshot is that everything you do in Java that will crash with a NullPointerException will also still crash your program in Python.

Oh, I guess except for shadowing an outer-scope variable with a local that you never initialize. That just gives you the wrong answer in Python, because there exists no local without an initialization. But it's a pretty marginal case.
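
For anyone who hasn't hit these yet, a small sketch of both failure modes. `Config` and `load_config` are made-up names, and the exceptions are caught only so the snippet can report what happened.

```python
class Config:
    def __init__(self, foo):
        self.foo = foo


def load_config(found):
    # Hypothetical loader: returns None when nothing is found.
    return Config(1) if found else None


def attribute_on_none():
    try:
        return load_config(False).foo
    except AttributeError as exc:
        return str(exc)


def unbound_local(flag):
    try:
        if flag:
            value = 1
        return value  # 'value' was never bound when flag is False
    except UnboundLocalError as exc:
        return type(exc).__name__
```

Without the `try` blocks, both functions would crash at the point of use, much like a NullPointerException.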


I've certainly had some "None" errors in Python.

I think the difference comes from dynamic vs static typing. In Python, you sort of get into the habit of "defensive" programming: checking inputs to your function, catching Nones, etc.

In java, you tend to rely more on the type system. If it typechecks/compiles, there's a good chance it's OK. That is, until you get a null value that's not handled.

That's the root issue I think: If null is an acceptable value per the type, then the same type system should force you to handle it. As do the type systems in ML languages for option types, for example.


The first line is why I'm not 100% convinced of the severity of this mistake compared to the alternatives. The problem fundamentally is the use of magic values/numbers to represent the concept of "no value". You don't need explicit language support to have that concept and the bugs it causes. I guess having that as an intrinsic concept in the language makes it more likely that people use it badly. On the other hand, debuggers etc. also intrinsically understand this, and segfaults due to null pointers are usually very easy to localize once you see them. By contrast, if a "bad programmer" introduced their own magic non-value in a supposedly safe language, debugging that becomes way more confusing.


No, that's not the "fundamental problem". The fundamental problem is a type system that lies. A "pointer to string" is not actually a pointer to a string, it's a pointer to a string or to nothing. If your api returns a pointer of the latter type, it should signal this by making the return type "maybe-pointer to string" (although it has the same memory representation as "pointer to string"). Then, if the user tries to dereference a maybe-pointer (that is, to use a maybe-pointer as a pointer), the type system can statically catch this and make it a simple type check failure compilation error. The user must first check if it's null through a function that casts a maybe-pointer to a pointer.

Nothing about this precludes the usage of sentinel values.
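
A rough sketch of that idea using Python type hints. The names `find_name` and `require` are hypothetical, and the static part assumes a checker such as mypy is run over the code; Python itself does not enforce annotations at runtime.

```python
from typing import Optional

NAMES = {1: "ada"}


def find_name(user_id: int) -> Optional[str]:
    # The return type admits that there may be no string at all.
    return NAMES.get(user_id)


def require(value: Optional[str]) -> str:
    # The one place where a "maybe-string" is turned into a "string".
    if value is None:
        raise ValueError("expected a string, got None")
    return value


def shout(user_id: int) -> str:
    maybe = find_name(user_id)
    # Calling maybe.upper() directly is what a checker such as mypy
    # would reject, because 'maybe' might be None.
    return require(maybe).upper()
```

The in-memory representation is the same either way; only the declared type changes, and with it what the checker lets you do.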


Everything in Python is an object. In Python, containers are objects that reference other objects.

https://docs.python.org/3/reference/datamodel.html

https://docs.python.org/2.0/ref/objects.html


How are name bindings different than references?

    >>> a=[2, 3, 1]
    >>> b=a
    >>> id(a)
    139731931982216
    >>> id(b)
    139731931982216
    >>> b
    [2, 3, 1]
    >>> b.sort()
    >>> del(b)
    >>> a
    [1, 2, 3]
    >>>


In Python, lists are container objects. Container objects reference other objects. In the first line object a references objects 2, 3 ,and 1 (and any other objects in a's object heritage).

2, 3 and 1 have id's. That's what "everything is an object" fleshes out to in Python. But they don't reference other objects because they are literals. The value of 2 is also its name.


Depends what you mean by reference. In Python they are indeed synonymous. C++ references are different in that the binding / object identity can be changed through the reference, eg when you pass a reference to a function.


C.A.R Hoare couldn't foresee consequences 55 years ago. That's a small mistake. We should blame language designers who didn't bother to handle the problem after it's been obvious.


A lot of mainstream languages nowadays support non-nullable types, e.g. TypeScript and C# (taken from F#).


This old chestnut again.

There is an inherent problem in designing processes and writing code to capture them: The notion of not-a-value.

There are a great many ways to solve it. The most common ones are 'null' and 'Optional[T]'. Neither just makes the problem magically go away. If a process is designed (or a programmer writes it) thinking that 'ah, well, here, not-a-value cannot happen', but it can, then.. you have a bug.

Some language features might make it possible to help reduce how often it occurs, but eliminate it? I don't think so.

Imagine, for example, in an Optional based language, that you just map the optional to a lambda to execute on the optional, and the behaviour of the optional is to then simply silently do nothing if it's optional.none. That'd be a much harder to find bug than a nullpointer error. (errors with stack traces pointing at the problem are obviously vastly superior to mysterious do-nothing behaviour with no logs or traces of any sort!).

Some other creative solutions:

* [Pony](https://www.ponylang.io/) tries to be very careful about registering when an object is 'valid' and when it isn't, and when you write code, you have to say which state the objects you interact with can be in. This lets you avoid a lot of the issues... but pony is quite experimental.

* In java you can annotate any usage of a type with nullity info, and then compiler linter tools will simply tell you that you have failed to take into account a potential null value. You are then free to ignore these warnings if you're just writing test code, or know better. Avoids clogging up the works with optional, but as the java ecosystem shows, you can't just snap your fingers and make 30 years of massive community effort instantly be festooned with 'might-not-hold-a-value' style information. At least the annotation style gives the hope of being backwards compatible (to be clear, optional, for java? Really bad idea).

* in ObjC, if you send a message to a null pointer, it silently does nothing, in contrast to virtually all other languages with null types where attempting to message a null ref causes an error or even a core dump.

* Just write better APIs. Have objects that represent blank state (empty strings, empty collections, perhaps dummy streams which provide no bytes / elements, etc). For example, in java: Java's map (a dictionary implementation) has the `.get(key)` method which returns the value associated with that key, and returns `null` if there is no such value. About 6 years ago another method was added in a backwards compatible fashion (so, all java map implementations got this automatically): `getOrDefault(key, defaultValue)`. This one returns the provided default value if key isn't in the map. You'd think optionals provide a general mechanism for this, but, in scala, you have both: There's `someMap get(key)` which returns an optional, so to get the 'give me a default value' behaviour, that'd be `someMap.get(key).getOrElse(defaultValue)`, but maps in scala also have the java shortcut: `someMap.getOrElse(key, defaultValue)`. Sufficient thought in your APIs mostly obviates the issues.
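
The same API choice exists in Python's built-in `dict`, which makes for a short illustration (the `ages` data is made up):

```python
ages = {"alice": 30}

# Plain lookup with no default: a missing key yields None,
# much like Java's Map.get returning null.
missing = ages.get("bob")

# Lookup with a caller-supplied default, analogous to getOrDefault.
with_default = ages.get("bob", 0)

# A present key ignores the default.
present = ages.get("alice", 0)
```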

null is not a million dollar mistake. It is a solution to an intrinsic problem with advantages and disadvantages over other solutions.


The goal of `Optional[T]` is not to "make the problem magically go away", in fact that is almost the opposite of the goal.

Optional[T] exists to make it very obvious when a value is nullable. Having non-nullable types as default, with Optional[T], allows a developer to model a system more accurately. This is helpful both to the compiler as well as anyone else who reads/maintains that code.

> Imagine, for example, in an Optional based language, that you just map the optional to a lambda to execute on the optional, and the behaviour of the optional is to then simply silently do nothing if it's optional.none. That'd be a much harder to find bug than a nullpointer error. (errors with stack traces pointing at the problem are obviously vastly superior to mysterious do-nothing behaviour with no logs or traces of any sort!).

This is just one of the things a developer could decide to do when faced with an optional which is none. It is up to the language design to make it easy to express this behavior (or any other behavior they might choose) without hiding it.


Sure it's up to the language design, but in practice a `None` gets a similar treatment as an empty collection, usually effectively short-circuiting remaining calculations. As the parent poster pointed out, this might either be the behavior you want, or actually mask the error, depending on the situation. By this logic, optionals aren't better than null refs, just different. The same argumentation holds for exceptions vs optionals.


In my experience, languages with strict non-null guarantees (and optional types), do the exact opposite of "mask the error". If anything, they are sometimes faulted for being too verbose.

The idea is, by explicitly marking things which can be null (wrapping them in an Option[T], for example), you can be sure that everything else is not null. This alone relieves the developer of a large cognitive load.

Further, the language can provide syntax to make handling optional types obvious without being painful. Rust match statements are one example of this.

Can you provide a specific example of how using an optional type makes a potential "missing-thing" type of bug harder to see?


In practice, usually any operation on optionals with such short-circuiting behavior must be explicit. For example, for member access, instead of foo.bar, you get something like foo?.bar - and that ? right there tells you all you need to know.

Same thing with exceptions/error types. With exceptions, propagation is implicit, but with error types, you usually have to use some explicit proceed-or-propagate operator.
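
Python has no `?.` operator, so as a sketch the explicit version has to be a helper; `map_optional` and `User` are hypothetical names. The point is only that the short-circuit is visible at the call site.

```python
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def map_optional(value: Optional[T], fn: Callable[[T], U]) -> Optional[U]:
    # Apply fn only when there is a value; otherwise propagate None.
    return None if value is None else fn(value)


@dataclass
class User:
    name: str


def display_name(user: Optional[User]) -> Optional[str]:
    # Reads roughly like user?.name.upper() in languages that have '?.'.
    return map_optional(user, lambda u: u.name.upper())
```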


Not contradicting, just noting that there's a slight benefit in reifying an issue, as it opens the door for notation and operators that simplify it. The Maybe monad or optional chaining do more than make the thing obvious: they make it half disappear.


I remember tracking down the null silent message failure issue in the early 1990s on NextStep. Then again almost 2 decades later on the iPhone. Personally, I’m not a fan of silent failures.

IMHO, allowing for non-nullable variables is a huge improvement in language design. Adding boilerplate annotations is an ugly way to handle it. Optimize for the common case and make variables non-nullable by default.


In my experience, forbidding null refs usually only results in getting null objects instead, such as empty strings or empty collections. That's not bad, but it might not solve such a silent failure: you'll still end up with an empty message in the end. To solve it properly, I think you might want to go further and have more expressive type constraints, like Ada subranges.


The mistake is being nullable/optional by "default", that is with the least amount of effort for programmers using such a language. Or worse only ever nullable (like Java is except for its built-in scalars I think?).

There is obviously a need for optional things, but this is not the common case, so it should not be the default and even less the only solution. And it should enforce handling the absent case.

"null" is a shortcut for talking about a solution which does none of that (and is even UB in case of mistake in some languages). Billion Dollar Mistake is generously low; probably the cost is already Multi-Billion Dollar, and counting.


> There are a great many ways to solve it. The most common ones are 'null' and 'Optional[T]'. Neither just makes the problem magically go away. If a process is designed (or a programmer writes it) thinking that 'ah, well, here, not-a-value cannot happen', but it can, then.. you have a bug.

> Some language features might make it possible to help reduce how often it occurs, but eliminate it? I don't think so.

On the contrary, you can 100% eliminate it by forcing null handling at compile time with your `Optional` type. Haskell and some other strongly-typed languages do this (but they call it Maybe).

The way to do this in a C-syntax-ish language would look something like this:

    Optional<int> increment(Optional<int> i) {
        // return i + 1; would throw an error at compile time,
        // because Optional doesn't implement the + operator

        // i.applyToValue would throw an error at compile time
        // if you didn't handle both possible cases
        return i.applyToValue(
            ifNull: (void) => { return new Optional<int>(null); },
            // wrap the result so both branches return Optional<int>
            ifValue: (int v) => { return new Optional<int>(v + 1); }
        );
    }

This is syntactically a bit heavy, partly because I was a bit more verbose than a real implementation would need to be, for clarity, and partly because C-style syntax doesn't do this well. Languages that support this generally have some syntactic sugar to make it a bit more terse.

I've argued before on HN that the benefits of strong static typing are overstated, but this is a case where strong static types really do completely eliminate an entire category of errors. Given how common these errors are, not using stronger types in this situation for popular languages has absolutely been a billion dollar mistake.


The problem GP was raising is on the other half of Optional: when you do finally need an int.

Say, you want to do

    arr[*increment(maybeInt)] //error: can't dereference Optional<int>

Now, if you as a programmer don't think increment(maybeInt) can actually return None in your particular case, you will probably do the minimal work to convince the compiler to let it go, say

    matchOptional(
        increment(maybeInt),
        someInt => { return arr[someInt]; },
        () => { /* never happens */ return 0; })

(using a slightly simpler notation than your version, since I'm on mobile)

Now, if you were wrong and you do get a Nothing, instead of seeing a nice stack trace, you have an absurd 0 running forward. You could improve this using an assert() but this is what managed languages already do with the NullPointerException & friends.
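
The two escape hatches look roughly like this in Python; `unwrap_or` and `expect` are names borrowed for illustration, not anything built in.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def unwrap_or(value: Optional[T], default: T) -> T:
    # Swallows the missing case: a wrong assumption turns into a
    # plausible-looking default that keeps flowing through the program.
    return default if value is None else value


def expect(value: Optional[T], message: str) -> T:
    # Fails loudly at the point where the assumption was made.
    if value is None:
        raise AssertionError(message)
    return value


arr = [10, 20, 30]
maybe_index: Optional[int] = None

# The "never happens" branch did happen, and we quietly read arr[0].
silently_wrong = arr[unwrap_or(maybe_index, 0)]
```

Calling `expect(maybe_index, "index must be present")` instead would raise right there, which is about what a NullPointerException gives you for free.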

The more interesting thing to show is all the code that does not deal with optional and that is now magically free of any possibility of null errors. But the programmer is still responsible for correctly treating the moment they need to go from optional values to non-optional, and here Optional is essentially just more in-your-face than null (which is valuable, don't get me wrong).

The only example I know of where a language feature truly completely eliminates a category of errors is managed memory, which does not replace memory errors with any other more-or-less equivalent error case.

I personally very much doubt NULL is a significant source of errors in managed memory languages. It's not nothing, but they are some of the easiest bugs to track down.


If it really can't ever be null, then it should just be an int, not an Optional<int>. The entire reason that it is an Optional<int> is that it CAN be null.

In this hypothetical language, not initializing an int is a compiler error, assigning null to an int is a compiler error, etc. If it's an int it literally cannot be null.

What ends up happening in practice is that the null is handled close to where it's created, and the rest of the code passes around an int you know isn't null, because people don't feel like passing around an Optional<int> and being forced to check it everywhere.

Sure, you can intentionally write code that does the wrong thing in any language, but the "return 0;" would be a very obvious error in even cursory code review.


There are normal cases where this can happen. For example, a map should normally return an Optional<ValueType> when you try to retrieve a key's association. However, there may be special cases where you know that a key is present (maybe it is a constant map, maybe you just set the value of that key etc).

I do agree that these cases are much rarer than the cases where a value is either always there, or the cases where a value really can be missing. I was only pointing out that Optional doesn't eliminate 100% of null errors, just 99.9% of them.


> For example, a map should normally return an Optional<ValueType> when you try to retrieve a key's association. However, there may be special cases where you know that a key is present (maybe it is a constant map, maybe you just set the value of that key etc).

Both of these cases are still pretty big code smells.

1. Just don't use constant maps. Instead of doing this:

    const myConfig = new Map<string, int> {
        "height": 72,
        "weight": 160,
    };

    [...]

    var height = myConfig["height"].matchOptional(
        ifNull: () => 0,
        ifValue: i => i,
    );

...use a constant structure (with an anonymous type):

    const myConfig = struct {
        height: 72,
        weight: 160
    };

    var weight = myConfig.weight;

You can verify whether height or weight are null at compile time this way[1].

2. If you just set the value of the key, instead of doing this:

    dictionary[word] = getDefinition();
    let definition = dictionary[word].matchOptional(
        ifNull: () => "",
        ifValue: d => d,
    );

...do this:

    let definition = getDefinition();
    dictionary[word] = definition;

I'm aware that I'm playing fast and loose with the syntax of our pseudo-language, but note that avoiding the optional is terser and simpler than using the optional and eating the null case--this is true in most cases in most strongly/statically-typed languages. Not only do you learn to lean on the type system in a strongly/statically-typed language, but if the syntax is well-designed, it makes it easier to lean on the type system than to not lean on the type system.

[1] You may say, but what if I'm loading from a file? The common pattern is to load a config from a file as a map, and then load it into a struct, setting defaults, like so:

    const defaultHeight = 72;
    const defaultWeight = 160;

    JsonObject configJson = json.loadFile("config.json");

    const config = struct {
        height: configJson["height"].matchOptional(
            ifNull: () => defaultHeight,
            ifValue: v => v.asInt(notInt: v => throw Exception("Invalid height \"{}\" in config.".format(v)))
        ),
        weight: configJson["weight"].matchOptional(
            ifNull: () => defaultWeight,
            ifValue: v => v.asInt(notInt: v => throw Exception("Invalid weight \"{}\" in config.".format(v)))
        ),
    };

You eventually hit cases with user data where you can't handle it (hence the throwing exceptions) but this pattern allows you to fail early, and with descriptive error messages.
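
For comparison, a minimal sketch of the same load-then-validate pattern in real Python. `Config`, `_int_field` and `load_config` are hypothetical names, and the JSON is passed in as a string to keep the example self-contained.

```python
import json
from dataclasses import dataclass

DEFAULT_HEIGHT = 72
DEFAULT_WEIGHT = 160


@dataclass(frozen=True)
class Config:
    height: int
    weight: int


def _int_field(raw: dict, key: str, default: int) -> int:
    value = raw.get(key)
    if value is None:
        # Missing key (or an explicit JSON null): fall back to the default.
        return default
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"Invalid {key} {value!r} in config.")
    return value


def load_config(text: str) -> Config:
    raw = json.loads(text)
    return Config(
        height=_int_field(raw, "height", DEFAULT_HEIGHT),
        weight=_int_field(raw, "weight", DEFAULT_WEIGHT),
    )
```

After `load_config` returns, the rest of the program only ever sees plain `int` fields, so the "might be missing" question is settled once at the boundary.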


This is a corner case, though, and one that is itself a code smell (i.e. in well-written code, it should be very rare). Having implicit null references, and implicit null checks on dereference, optimized for a rare corner case to the detriment of safety in typical code patterns, is a bad thing.

And it definitely is a significant source of errors in managed memory languages, from my experience in C# and Python. It can also be pretty tricky to track down, when the code producing the null happens to run long before the code dereferencing it.


> Having implicit null references, and implicit null checks on dereference, optimized for a rare corner case to the detriment of safety in typical code patterns, is a bad thing.

Yes, I completely agree. I was just trying to point out that Optional is not a 100% air-tight solution, I think the problem of handling missing values is just too general to actually have a 100% solution.

Still, the perfect shouldn't be the enemy of the good, and default non-nullable definitely helps in most cases.

> It can also be pretty tricky to track down, when the code producing the null happens to run long before the code dereferencing it.

Here I don't agree. If the code producing the null is the problem, then you would have the same problem with Optional. Optional helps when code consuming the null forgot to handle the null case. If you're writing C# or Python and you get an NPE, and null was actually a valid value, you don't need to track down the source of the null, you just need to handle the null case in the exact case where the exception occurred (and possibly further downstream).


> Here I don't agree. If the code producing the null is the problem, then you would have the same problem with Optional.

The crucial difference is that most reference-typed variables wouldn't be optionals in a language where references can't be null. So in practice you get rid of a lot of problems, because the type checker catches the use of null where it's simply not a valid input. In C# and Python, because every reference is potentially nullable, you have to aggressively check at every boundary where your contract is that it's not actually null. If you ever forget, and your caller passes null, then you end up with this "how did this get there?" problem.

Conversely, with optionals, you also have to handle the null case if you're at the boundary, because past the boundary you'd just use a non-optional type to propagate that value further. With implicit nulls, the boundary is entirely in your head - the language won't do anything to help you enforce it.


> Haskell and some other strongly-typed languages do this (but they call it Maybe).

Most call it either Option or Optional FWIW. `Maybe` is the term used by Haskell and its derivatives (like Idris or Elm).


> Imagine, for example, in an Optional based language, that you just map the optional to a lambda to execute on the optional, and the behaviour of the optional is to then simply silently do nothing if it's optional.none. That'd be a much harder to find bug than a nullpointer error.

It's worth noting that this is only possible if the operation you're mapping over the optional can have side-effects. Without side-effects, mapping over an optional always does nothing, in a way - all the difference is in the value returned.

Adding that constraint does make programming pretty painful, though.


I was expecting someone to mention the Crystal programming language.

In Crystal, types are non-nilable and null references will be caught at compile time.

https://crystal-lang.org/2013/07/13/null-pointer-exception.h...

I certainly recognize that many bugs in Ruby programs announce themselves as `NoMethodError: undefined method '' for nil:NilClass`. So to be able to catch that before releasing code is a very welcome addition in my opinion.


InfoQ has some gems, but their video content presentation is terrible (tiny box, or full screen):

https://www.youtube.com/watch?v=YYkOWzrO3xg


No comment on the Null References, but I will say I love the time-index provided for the video. I wish every video had these!


Null termination is still easily much worse. At least the general case of null dereferences today (less so earlier) is a page fault.


Should every domain have a Nil element instead ?


What is required is an optional "No" element. Then you can say, I have a "No" Problem, and people will think you're joking and remain on their happy path.


No, obviously not. Every domain having a Nil element is exactly the problem null references have introduced (at least for the call by reference parts of the affected languages).


Null is a single nil for all, I meant having a null per domain would force people to think of what it means to have nothing in that field and handle it. Maybe I'm too naive.


> I meant having a null per domain would force people to think of what it means to have nothing in that field and handle it.

In what way would it change anything? The "billion dollars mistake" is that because "nothing" is part of every type, any value you get could really be missing, and you have to either hope for the best (and die like the rest) or program ridiculously defensively.

Having a magical sentinel per type would have the exact same issue, namely that "nothing" is part of the type itself, and so you can never be sure at compile time that you do have "something".

That's what opt-in nullability (whether through option types or language builtins or a hybrid) changes: by default, if you're told you have an A it can only be a valid A, and if you're told you might have an A you must either check for it or use "missing-safe" operations.


This is one of Go's weirdest features. When you cast nil to an interface, you pay for extra storage so the runtime can do method dispatch on what type of object you don't have, even though every implementation is likely to panic immediately.


Sometimes you don't want to allow a value to be null at all, but with null references you can't represent that at the language level.


But for numbers, a zero is not considered null, because it was handled in the operators rules.


Numerical zero has nothing to do with the issue discussed here. What you are proposing is to add another “number” to types like int and float that results in the program crashing whenever you try to add it to another number.


It does! It's the numerical "null object", just like the empty string and empty collection.


> What you are proposing is to add another “number” to types like int and float that results in the program crashing whenever you try to add it to another number.

There's already division by zero and NaN to trip you up in IEEE754.


Or there's NaN which just results in NaN and doesn't equal itself. No need to crash.
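
For what it's worth, the NaN behaviour is easy to see from Python, whose floats are IEEE 754 doubles on the usual platforms. One Python-specific wrinkle: float division by zero raises `ZeroDivisionError` rather than producing infinity or NaN. The `divide` helper is just for illustration.

```python
import math

nan = float("nan")

# NaN compares unequal to everything, including itself.
nan_equals_itself = (nan == nan)

# Arithmetic on NaN quietly yields NaN instead of raising.
propagated = nan + 1.0


def divide(a: float, b: float):
    # Python raises on float division by zero rather than returning infinity.
    try:
        return a / b
    except ZeroDivisionError:
        return None
```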


I think it does


Null references are not a mistake, they make perfect sense. Letting nullable types be dereferenced directly is the mistake.

Null references are at the core of a great number of sensible datastructures, and they're a natural fit for conventional computers.


There are two separate concepts here that often get conflated.

There's null reference in a sense of a special pointer value (usually all bits set to 0) that means "this doesn't point to anything". That's a useful low-level tool that allows for compact representation of many important data structures.

And then there's null reference in a sense of type systems. To be more specific, "null reference" here is really a shortening of "every reference in the type system is implicitly nullable". And that is the billion dollar mistake.

An explicitly nullable reference type that requires explicit check on dereference, or option types, that use null pointers under the hood, are obviously not the problem.


To put it a bit more compactly, why is Boolean logic "True, False, Null"?


> "null reference" here is really a shortening of "every reference in the type system is implicitly nullable"

I don't think this is quite accurate, there are definitely cases where non-null pointers are required (e.g. dereferencing). It's more correct to say that the type system does not explicitly indicate whether a pointer might be null or not.


Just consider all pointers null until proven otherwise, shouldn't be that hard to do something like this in static analysis. Even if a reference is non-null, you still have to wonder if it's valid.



