Imagine There’s No Null

A couple of weeks ago I spent a considerable amount of time chasing down bugs involving null in a large code base: null checks after a variable had already been dereferenced, nulls passed to methods that would immediately dereference them, equals() methods that didn’t check for null, and more. Using FindBugs, I identified literally hundreds of bugs involving null handling; and that got me thinking: Could we just eliminate null completely? Should we?

What follows is a thought experiment, not a serious proposal. Still, it might be informative to think about; and perhaps it will catch the eye of the designer of the next great language.

To explore this, let’s reverse perspective and think about primitive types for a bit. I’ve long advocated a completely object-based type system. The distinction between primitive and object types is a relic of days when 40 MHz was considered a fast CPU; and even then it reflected more the lack of experience with OO optimization than any real issue. Languages such as Eiffel perform quite well without separating out primitive types. But if we were to make int and double and boolean full object types, would it then be necessary to allow null to be assigned to these types as well:

int i = null;
double x = null;
boolean b = null;

I suppose we could just rule out assigning null to these types, but that would reintroduce an unnatural distinction between primitive and object types, which is precisely what we’re trying to avoid. Still, I’d hate to give up the default values for unassigned primitives: 0 for numbers, false for booleans. After all, it would be really annoying to have to write a method to add two numbers like this:

public static double sum(double x, double y) {
  if (x == null) throw new NullPointerException();
  if (y == null) throw new NullPointerException();
  return x + y;
}

One of the nicest characteristics of primitive data types is precisely that you don’t have to worry about this. You know they’re always initialized.
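
In today’s Java, that guarantee comes from the language’s well-defined default values for fields:

public class Defaults {
  int i;       // initialized to 0
  double x;    // initialized to 0.0
  boolean b;   // initialized to false
}

(Local variables are stricter still: the compiler refuses to let you read one before it has definitely been assigned.)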

So suppose we go the other way instead. Let’s allow or even require each class to define a default value object that will be used whenever a null variable of that type is dereferenced. For instance, for the String class the empty string is the obvious choice. Perhaps it could be defined by overloading the existing default keyword:

public final class String {
  default = "";

  // rest of String class here... 
}

Or for a ComplexNumber class, the default might be 0+0i:

public class ComplexNumber {

  private double realPart;
  private double imaginaryPart;

  default = new ComplexNumber(0, 0);

  // rest of class here...
}

Then any dereferences would simply use the default value instead of throwing a NullPointerException.

public class ComplexValue {

  private ComplexNumber z;

  public void increment(ComplexNumber delta) {
    z = z.add(delta);
  }
}

In the current regime, this throws a NullPointerException unless some other code has initialized z and delta is not null. But in this scheme it would simply add 0+0i for whichever operand is null.
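
You can emulate the proposed semantics in today’s Java with explicit fallbacks; here’s a rough sketch of what the compiler would effectively generate for ComplexValue:

public class ComplexValue {

  private static final ComplexNumber DEFAULT = new ComplexNumber(0, 0);

  private ComplexNumber z;

  public void increment(ComplexNumber delta) {
    // Each dereference implicitly substitutes the class's default for null:
    ComplexNumber base = (z == null) ? DEFAULT : z;
    ComplexNumber d = (delta == null) ? DEFAULT : delta;
    z = base.add(d);
  }
}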

Sometimes, though, you might really want to forbid uninitialized values. Or perhaps there is no sensible default value. For instance, what would be a sensible default value for a java.net.Socket? To indicate that no default is available, simply omit the default declaration from the class. There are two incompatible ways we could interpret this: either this means a null dereference throws a NullPointerException (current behavior); or it means the compiler forbids any declaration of such a variable without an immediate assignment. Pick one. I’m not sure which of these makes more sense, though I think I prefer the latter. No accidental NullPointerExceptions is the goal.
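
Under that second interpretation, the rule would look something like this:

Socket s1;                // compile-time error: Socket declares no default
Socket s2 = new Socket(); // fine: initialized at the point of declaration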

Of course, sometimes three-valued logic is sensible. After all, very few things are really true or false. True, false, and unknown is a much more accurate division. Integers and floating point values can also benefit from an otherwise impossible value that can represent unknown, unset, or uncertain. To enable this, we can allow any value to be tested for equality with default like so:

if (x == default) {
  // Special handling for the default case. 
  // We could even throw an exception here, 
  // but that would only be by deliberate choice, 
  // not something that happens unexpectedly. 
}

This comparison would return true if and only if the variable x were set to the default object for its type. It would not return true if x merely had the same value. For example,

String s1; // assumes the default value
String s2 = ""; // explicitly set to the default value
if (s1 == default) {
  System.out.println("This branch is executed.");
}
if (s2 == default) {
  System.out.println("This branch is not executed.");
}

This proposal could even be added to the existing Java language. However, backwards compatibility would only make it feasible for new types. Unless we can apply this to existing types like String, Integer, and Document, I don’t think it’s a strong enough idea to carry its own weight. In a new language without any such constraints, though, it could dramatically increase program robustness by eliminating an entire class of common errors.

Cue John Lennon.

37 Responses to “Imagine There’s No Null”

  1. Richard Says:

    I know of the Cobra language that has implemented this idea. I also seem to remember that in an interview with Anders Hejlsberg, after the introduction of nullable types in C#, he said that if he could do the language over, he would have baked this kind of support into the language.

  2. Erik Says:

    See Scala’s option class for a nice solution to this problem.
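
    The rough idea, hand-rolled as a Java sketch (illustrative only, not Scala’s actual API):

    public abstract class Option<T> {
        public abstract boolean isDefined();
        public abstract T get(); // the escape hatch: throws if nothing is there

        public static <T> Option<T> some(final T value) {
            return new Option<T>() {
                public boolean isDefined() { return true; }
                public T get() { return value; }
            };
        }

        public static <T> Option<T> none() {
            return new Option<T>() {
                public boolean isDefined() { return false; }
                public T get() { throw new IllegalStateException("none"); }
            };
        }
    }

    The type Option<Foo> then warns every caller that there may be nothing inside, instead of letting the absence surprise them at runtime.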

  3. Martin Probst Says:

    Erik: but will this really make it better? I can very well imagine people being annoyed with doing the matching and simply casting to Some[Foo]. I think these are things we cannot prove in a programming language’s type system – the programmer may know that e.g. map.get("foo") will always return a value at this point. Now we have the option to either allow programming errors (a la NullPointerException), or to force the programmer to always explicitly declare and check.

    The first approach might create some problems, but the second approach sounds a lot like checked exceptions in Java. try { foo(); } catch (IOException e) { } is not far from (Some[Foo])map.get("foo");

  4. Sakuraba Says:

    Isn’t null a sensible default? I’d rather see my code throw an exception and show that something does not work/is not used as expected instead of continuing with a default value. Introducing an additional semantic by expanding “true, false, null” to “true, false, null, default” could make it harder in the long run, because one would not only check for null but also for default.

    I wish I would not have to write all of the if (myVariable != null) {…} statements, but the proper way to avoid them would be to introduce assertions that are enabled by default and that allow modeling interfaces as contracts.
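
    For example (a sketch; today’s assertions are disabled unless you run the JVM with -ea):

    private String owner;

    public void setOwner(String owner) {
        assert owner != null : "owner must not be null"; // the contract, checked when assertions are on
        this.owner = owner;
    }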

  5. yachris Says:

    C.A.R. Hoare was there before you… but he failed 🙂

    See this summary of his talk, Null References: The Billion Dollar Mistake (ah rats, that’s just a summary) or the Slashdot discussion thereof.

    So it looks like you’re right, but as the comments above indicate, kicking out “null” would be difficult, in the programmer psychology department (much like making primitive types objects… that gets some very strong reactions whenever I’ve brought it up).

  6. Leo Horie Says:

    One way of abstracting it is to say that an object reference is really a collection with either zero or one value. I like how jQuery’s API considers everything a collection – e.g. $("#el").hide() only runs if $("#el") matches something (and incidentally the same API call will run n times for any number of elements n returned by the $() function).

    It would be nice if a language had functions that always assumed that operands are collections, and if it had a compiler that could automatically figure out when there can only be one element and it’s safe to remove the internal loop overhead.
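
    In Java terms, the zero-or-one view looks something like this (Element and its hide() method are made-up names; List and Collections come from java.util):

    void hideIfPresent(Element element) {
        List<Element> el = (element == null)
                ? Collections.<Element>emptyList()
                : Collections.singletonList(element);
        for (Element e : el) {
            e.hide(); // runs once if present, zero times otherwise -- no null check needed
        }
    }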

  7. Porter Says:

    I too have often wondered what would happen if we didn’t have null. I suppose one of the things that bothers me most is that I can do this:

    SomeObject someObject;

    And then when I attempt to access ‘someObject’, I of course get a null pointer exception. Heck – I can’t even determine the _type_, which I find a little bogus, as I have declared its type even if I have not yet initialized it. I understand the earlier reasons for allowing null values, and they largely revolved around memory management, which is fine as far as it goes.

    However, you might recall that NULL in database nomenclature is something of a nod to “reality”, and largely breaks the “true” relational model. In the same way I suspect the use of null in most object oriented languages is a nod to the book-keeping required to manage objects. The flip side would be to throw some form of UninitializedException when methods or members of an uninitialized object are accessed.

    In any case, it still represents the least bad compromise.

  8. Richy Says:

    I wrote about something very similar a while back.

    I didn’t really come to much of a conclusion, but I do like the idea of defaults. I also very much like the way scala handles this 🙂

  9. tehwalrus Says:

    peh, a bool with no null option? what decade is this?

    we need null values in order to store extra information. Two-valued bools can store a bit. Three-valued bools (null, false, true) store about 1.585 bits; in particular they let you know whether this property of your class has ever even been touched, allowing you to fork off down a different execution path if it hasn’t. (You would otherwise have to store this in 2 bits by using two bools, param and param_is_initialised.)

    I am all for letting the primitives become fully fledged objects and letting the compiler deal with optimising the algebra. After all, if you want to bit-fiddle you should be using C, and if you want very specific bounds on numerical accuracy you should be using FORTRAN; either is only really of interest to very low-level programmers and physicists, i.e. not to anyone using objects!

  10. Satyadeep Says:

    Take a look at the Maybe monad in Haskell. Maybe is a monad that, given a type a, represents either Just a or Nothing. So, for example, if we want to check whether a value exists, we can say:


    exists :: Maybe a -> Bool
    exists a =
        case a of
          Just a  -> True
          Nothing -> False

    Here, if the value exists, then we return True, False otherwise. This is a neat way to handle the case where we want to say that the value might exist or not (which handles a lot of the cases Null is used for).

    Some references:
    http://en.wikibooks.org/wiki/Haskell/Hierarchical_libraries/Maybe
    http://lukeplant.me.uk/blog.php?id=1107301659

    Google for more 🙂

  11. fogus Says:

    The problem with null is that it doesn’t have any inherent meaning. That is, it can be used to mean any of:

    – Uninitialized variable
    – Error code
    – Existential
    – No applicable value

    Perhaps what we need is a finer grain facility for dealing with each of the above?

    -m

  12. Jake McArthur Says:

    You’re on the right track, but I’ll let you in on a secret: this has already been done in many languages (mostly functional)! And it’s excellent.

    Consider the Haskell data type, Maybe:

    data Maybe a = Nothing | Just a

    This is Haskell’s emulation of a nullable value. A value of Maybe a will not type check for a function that wants an a. You have to either pattern match it (something like a C switch statement) or use a special combinator, like <$>. For example, these two snippets do the same thing, given that foo is a function taking a “pure” value, and x is a “nullable” value:

    case x of
        Nothing -> Nothing
        Just a  -> foo a

    foo <$> x

    And there is also a shortcut for the case that you want to do something different for Nothing:

    case x of
        Nothing -> y
        Just a  -> foo a

    maybe y foo x

    And many more. This is but one of many very nice abstractions and compiler-checked properties that you can have in Haskell (and many other functional languages).

  13. James Says:
    public final class String {
      default = "";
    
      // rest of String class here...
    }
    

    Is this really the obvious option though? What about when you want to differentiate between the empty string and no string whatsoever? That’s why null exists.

    Likewise for integers — 0 is a horrible choice, because 0 is perfectly meaningful in its own right.

  14. Jake McArthur Says:

    The HTML killed some stuff. There is supposed to be an operator <$> in there.

    “or use a special combinator, like <$>.”

    and:

    case x of
    Nothing -> Nothing
    Just a -> foo a

    foo <$> x

    Hopefully it worked right, this time.

  15. Erik Says:

    Martin – I think it does help. It works for me, anyhow.

    You could be right to compare it with checked exceptions, but it is almost as easy to write the code to deal with a None as it is to simply call get, so I think it encourages good practice. Whereas it is hard to deal with an exception. (I wish that more Java programmers declared their methods to throw exceptions, since I would rather have an IOException bubble up to application code that knows how to deal with a problem than be caught & “dealt with” down below in a method that really has no idea what to do with it.)

  16. Porter Says:

    +1 for Fogus

    Typically when writing methods which have a return type, I do everything in my power to return an initialized object of that type… Say a method which will return a list of objects mapped from the rows of a database query.

    public List getAll(DataSource ds) {
        List results = new ArrayList();
        // ... run the query, map each row to an object, add it to results ...
        return results; // the empty list, never null, when no rows match
    }

    I avoid returning a null, and instead return the empty list.

    I do this because semantically it’s more complete, and it retains meaning… Callers don’t need to do an “if (result != null)” check before working with a List object – they can instead check (if needed) “if (result.size() > 0)” – or better still just iterate, since a for-each loop over an empty list simply does nothing.

    Returning null, unless there is a compelling reason, makes you litter your code with if (foo != null) guard conditions _everywhere_. There may be solid performance reasons to do so – but I prefer to let the code execution time tell me that; make it work right first – then make it work fast.

  17. Chris Burdess Says:

    I’m not sure if this is what Sakuraba means, but saying that the value of dereferencing null is null (rather than raising an NPE) seems like a pretty elegant solution to NPE misery. Automatically converting to some default non-null value will most probably change the semantics of the operation considerably.

    While we’re at it, could we maybe have a nice rule about the truth values of non-boolean expressions? (i.e. anything that is not 0, an empty list, an empty string, etc. is true; 0 is false, etc.) Could save some several reams of unnecessary comparisons…
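
    Concretely, dereference-yields-null would let you delete chains of guards like this (user, getAddress, and getCity are made-up names):

    String city = null;
    if (user != null && user.getAddress() != null) {
        city = user.getAddress().getCity();
    }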

  18. Aristotle Pagaltzis Says:

    This is what Perl does: any scalar can be undef; if you try to do math with undef it acts like 0, if you try to concatenate it to something it acts like the empty string.

  19. Helltime for May 29 « I Built His Cage Says:

    […] Rusty Harold has a post proposing an overloaded default keyword as built-in language support for the Null Object Pattern. I don’t think you have to go that far. Simply remove the null […]

  20. Antonio Says:

    I do love nulls.

    Null values are a simple way for the JVM to tell you that you haven’t initialized a variable. That you have a bug somewhere.

    Having default values does not tell you that you have a bug somewhere, and *that* is bad.

  21. Dumbo Says:

    @Aristotle:

    UNDEF scalars don’t exactly act like 0/the empty string.

    With “use warnings” they will print a warning to your screen (and I hope that really no one is writing Perl scripts without this statement).

    Anyway, as all my scripts die on warnings, there is a major difference between UNDEF and defined scalar values. But maybe that’s just me, as I want to know what my scripts are doing.

  22. C. Doley Says:

    I view nulls as a colossal mistake that have become so second nature that they seem like a requirement. And they’re inconsistent with the real world.

    For example, I have not once in my life seen a situation where the business logic distinguishes between a null string and an empty string. I’ve never seen a situation where business distinguishes between null collections and empty collections. The fact that they exist in the language allows programmers to conveniently use them in place of completely specifying their objects, but I’d argue that this is a crutch and removing them would force better design.

    I like your idea. I’d go further than you and ban exposing these default values altogether. If you really need to distinguish between set and unset values, use a separate flag. Slightly more work for the programmer in exchange for clarity. People should not be passing uninitialized values around except in fairly limited contexts anyway.

    In a relational database, nulls are an even bigger problem. There’s so much literature on this that I won’t go into it but I think a major reason they’re used so much is that languages allow you to map the logically inconsistent notions of NULL references to the database notion of ternary logic NULLs. Ban them in both places.

  23. Bobby Woolf Says:

    This is the Null Object pattern: http://en.wikipedia.org/wiki/Null_Object_pattern.

  24. dstibbe Says:

    Using a default value for things such as int is a terrible idea. Which value are you going to pick? You will have to pick a number that you are certain you will never use. And if you have done so, how would identifying that the ‘default value case’ is true be better than identifying that the ‘null value case’ is true?

    null can mean a lot (as was pointed out earlier), but it always means at least one thing: that the object/variable isn’t set.

    If a NPE is thrown, this is commonly due to a programming error. Be glad that it shows up when it does. Imagine if this programming error were ‘obscured’ by the use of a default value, thus leaving the code working with incorrect data.

  25. John Says:

    For fun, if we were considering the idea for Java:
    1. Going with immediate-assigns only, what would one do about class members where there is no sensible default? One way that I could see is to force each class member to be assigned via a constructor/initializer block, even if it will be set again elsewhere (e.g., via a setter).

    2. Garbage collection: Things would tend to linger around longer. I couldn’t free up something for GC until the method terminates or until I assign a new instance to the same variable.

    3. If we adhere to interfaces, one could indeed define a do-nothing default for things like sockets. If there is an enforced “Defaultable” interface (and skipping over a host of other details), then we can even use instanceof to see if the instance is the “default” value. It would save on some code, but I could see it introducing a host of more subtle errors.
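
    A sketch of that last idea (all names hypothetical):

    public interface Defaultable {
        boolean isDefault();
    }

    public class NullSocketChannel implements Defaultable {
        public boolean isDefault() { return true; }
        // do-nothing versions of the real operations would go here
    }

    // Callers could then test:
    // if (s instanceof Defaultable && ((Defaultable) s).isDefault()) { ... }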

  26. Daniel Serodio Says:

    The Nice programming language handles nulls perfectly, IMHO. Everything is not-null by default, and if you want a “nullable” reference, you postfix the type with a question mark.

    String foo = "bar"; // can't be null, so needs to be initialized
    String? nah; // may be null
    nah.hashcode(); // won't compile, nah may be null

    if (nah != null) { nah.hashcode(); } // ok, now the compiler knows it's not null

  27. Asbjørn Ulsberg Says:

    While I agree NullPointerExceptions are annoying (in any language), I don’t think replacing one default value (null) with another (e.g. the empty string) is the answer. It’s not that another default wouldn’t make it simpler to program (it would!), but it would also lead to another set of (perhaps much harder to discover) problems.

    The underlying problem with NullPointerException isn’t that the variable ought to have a different default value that we can use, but that the variable probably isn’t initialized due to a bug. Our code expects the variable to have a value; that’s why NullPointerException is thrown in the first place. If null is replaced by “” or whatever sensible default fits the type we’re operating on, it might open up a whole can of worms.

    As you write yourself, you need to check for equality to “default” — how’s that really different from checking for equality to “null”? When that “if” statement is required either way, I think it’s better for the code to help you and throw a NullPointerException when you forget than to silently execute code that doesn’t do what it should.

  28. Hamlet D'Arcy Says:

    As a user of a language with an Option type, I can say that it is a wonderful way to eliminate null from the language. The fact that Scala allows you to cast an Option type into an instance of a Some type should not be held against Option types in general. This is a problem with Scala’s implementation and not the Option type idea. AFAIK, F# does not allow you to do this so I assume OCaml doesn’t either.

    Having played with the JSR308 prototype (which includes the @NotNull/@Nullable annotations), I can say that I am deeply impressed with the work done by that group and excited to start using the annotations. It makes this discussion moot: If you don’t want nulls then don’t use them. And the 308 type checkers will make sure nobody does!
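
    For instance (a sketch; the exact annotation package and spelling vary between checkers):

    public String greet(@NonNull String name) {
        // No null check needed: the checker rejects any caller
        // that cannot prove name is non-null.
        return "Hello, " + name;
    }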

  29. C. Doley Says:

    @James: “What about when you want to differentiate between the empty string and no string whatsoever? That’s why null exists.”

    The point is, in my 15+ years of programming, I have not once seen a case where such logic was necessary. Dozens of companies, hundreds of programs, and literally never in my life have I seen an application that differentiates between null and the empty string. (More precisely, I’ve seen hundreds of places where this is done, but in every case it has been a bug.)

    As programmers we get too used to thinking about things in technical terms that have no meaning in the real world. Ask any non-programmer when they want a piece of data to be empty and when they want it to be null and you’ll get a blank stare. There is just no need for this distinction.

    We tend to get bogged down in false notions of flexibility. For example, “since having null strings allows me more flexibility than not having it, we should keep it.” But this is misleading and pointless. Having a non-nullable string + a boolean is even more flexible than a nullable string, and MUCH less error-prone.

  30. dstibbe Says:

    “Having a non-nullable string + a boolean is even more flexible than a nullable string, and MUCH less error-prone.”

    how is “non-nullable string + boolean” more flexible than a “nullable string + null!=string”?

  31. Toby Says:

    “it could dramatically increase program robustness by eliminating an entire class of common errors.”

    Or rather, “could dramatically DECREASE program robustness by HIDING an entire class of common errors.”

  32. C. Doley Says:

    “how is “non-nullable string + boolean” more flexible than a “nullable string + null!=string”?”

    Simply because it has more possible values (i.e. num_possible_strings * 2 instead of num_possible_strings + 1). Granted that’s not necessarily super useful, but my point is that it’s at least as flexible.

  33. dstibbe Says:

    I do not agree. It does not give you any increase in possible values. There is still just an equal number of strings possible. The only difference is that in your case there is no NULL pointer, but instead a boolean that tracks whether there should be a string or not (which is what the NULL pointer basically is).

    Regarding:
    “Dozens of companies, hundreds of programs, and literally never in my life have I seen an application that differentiates between null and the empty string. ”
    Simple case: determine whether a string was given. If it is null, then it was not given; otherwise it was, whether it is empty or not.
    Can’t see what’s buggy about that.

  34. C. Doley Says:

    That’s not a business case at all. That’s a hypothetical example that illustrates my point that programmers tend to make distinctions that have no bearing on the real world. What you describe might not be a “bug” in the technical sense, but it sure sounds like a buggy business process to me. How many of your users distinguish between “not filling out a field” and “filling out a field with a blank value”? Answer: 0%.

    I concede that it may be possible to construct an example where there are multiple different states of a field not being filled in, for example if a user never saw it versus saw it and left it blank. But I’ll argue that trying to distinguish among these states by overloading a single field with values such as "", " ", and null is a bad implementation choice. Better to have a separate flag indicating the difference.

  35. dstibbe Says:

    First of all, I wasn’t talking about users.
    Second of all, there was nothing hypothetical about my example.

    Perhaps you misunderstood because I used a String object as example? Take then, for example, a method that will create and return an object only under certain circumstances and otherwise null. If you still think that would be ‘hypothetical’ I’d have to wonder…

    We agree that overloading is terrible. However, you have not given one reason to believe that your ‘separate flag’ is better than (or any different from) a null value for an object.

    How would a flag indicating that an object was instantiated or not be any better or different than the null value that says the same thing?

    – both need to be set
    – both need to be checked

  36. Adelle DeWitt Says:

    Elliotte: you really blew this one. Nulls are great in safe languages like Java PRECISELY because they cause NullPointerExceptions. (They are only bad in barbaric languages like C/C++ because they can corrupt memory.)

    To see this, you need to realize that you are confusing symptoms with causes. A NullPointerException is a symptom. Failing to provide proper variable initialization is the cause. Your proposal eliminates the symptom and allows the cause to go undetected.

    But silent errors are the most insidious errors of all! Failing to provide the right initialization of a variable, but letting the code proceed without crashing, is one such example. It is much better to crash, since then you know that there is a problem. You get that wonderful crashing behavior with Java objects, but not with primitives, which is unfortunate.

    I see a contradiction between your calls for draconian error handling (which I heartily agree with) and your proposal here.

    The only thing that Java needs reform of in default behavior is that there should be a convenient way to initialize primitive arrays.
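
    Today that takes an extra line with java.util.Arrays.fill:

    double[] samples = new double[100];         // every slot starts at 0.0
    java.util.Arrays.fill(samples, Double.NaN); // mark every slot as "not yet set"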

  37. Jon Says:

    I like where you’re going with this. Null checks are annoying, and often unnecessary if you look at the actual path of the code. For example, one object could go through 7 layers of code and have the same null-check every step of the way.

    Like many of the other posters, however, I do still see the need in having nulls. And I feel that the default route is a bad way to go since it makes a valid value indistinguishable from an invalid value.

    I have my own idea, actually, on how to deal with this:
    1) Don’t allow uninitialized primitives. I personally don’t see how having and using a default value for these is good coding practice anyway, since it makes it unclear what the actual value is.
    2) What if each Object had a “primitive” version? By convention, all Objects start with a capital, so what if a “primitive” version of an object were the same name, just starting with a lowercase letter? (It could technically be a child class of every Object.)

    The whole purpose would be to tell Java that this object is definitely NOT null. A method that wants to use an object and doesn’t want to have to check for nulls, could just take the “primitive” version as the input type. Then the compiler could enforce not passing in an object that could be null into a method that takes its “primitive” version.

    For example, this:
    public BigDecimal sum(BigDecimal a, BigDecimal b) {
        if (a == null || b == null) {
            return null; // or do whatever you think is best
        }
        return a.add(b);
    }

    Could turn into this:
    public bigDecimal sum(bigDecimal a, bigDecimal b) {
        return a.add(b);
    }