Two Criteria for Closures

I’m slowly coming around to the idea that first class functions in Java are probably a good idea. I’m still not convinced we need full-blown closures though. Possibly we do. Doug Lea has some interesting use cases I need to explore further.

Several things concern me though. One is syntax. I really, really want the syntax to be simple and have no surprises or significant learning curve. I want existing Java programmers to be able to read Java code that uses closures (or first class functions) and immediately be able to see what the code is doing without any prior training in the new syntax. I want the syntax to be that clear. If we can’t have that, I don’t want to do it. I want Java to still be a lingua franca.

Secondly I want closures to be really, really easy to write. No gotchas. A little training may be allowed here, but there shouldn’t be any special cases or surprises. Programmers’ first instincts of how to write something should be correct. In particular, I do not want to see any issues like “the closure works if the external variables are declared final, but fails if they’re not.” Or, “the closure works if the external variables are not modified, but stops compiling if a line is added that modifies one.” That is, implicit final is not a workaround. The only things that break the closure should be:

  1. Syntax error inside the closure itself
  2. Breakage in the closure syntax (i.e. leaving off a trailing curly brace or some such in the closure definition)
  3. Removing or renaming an external variable the closure depends on.

In other words, the only things that should cause a closure to fail to compile or run in the expected fashion should be exactly those things that cause today’s code to fail to compile or run. If you rename a local variable from x to y and forget to rewrite the statement foo(x);, then the compiler complains. This is a common bug we all know how to handle when it happens to us. I want to make sure that adding closures does not add any new failure types or patterns.
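A minimal sketch of the failure mode I want to avoid, using today's anonymous inner classes with an explicit final as a stand-in for the proposed implicit-final rule. The names CaptureGotcha, Greeter, and makeGreeter are invented for illustration and come from none of the proposals.

```java
class CaptureGotcha {
    /** Hypothetical single-method callback type, defined only for this sketch. */
    interface Greeter {
        String greet();
    }

    static Greeter makeGreeter(String name) {
        final String greeting = "Hello, " + name;
        // Uncommenting the next line is a compile error, because a final
        // local cannot be reassigned:
        // greeting = greeting + "!";
        return new Greeter() {
            public String greet() {
                return greeting; // reads the captured final local
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(makeGreeter("Ann").greet());
    }
}
```

The point is that an innocent-looking edit outside the inner class, a second assignment to greeting, is what stops the code from compiling. Under an implicit-final rule the same edit would break the closure with no final keyword on the page to explain why.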

Those are my requirements for any proposal to introduce closures or first class functions into the Java language. I am prepared to reject any proposal that does not meet these two criteria. If no proposal can satisfy these two criteria, then we shouldn’t add closures or first class functions.

So what are the proposals on the table, and how well do they meet these criteria?

CICE

CICE has a number of extra rules:

“The construct is legal only if ClassOrInterfaceType represents a class or interface type with a single abstract method (a SAM type).” In other words, add a new method to an interface and the closure breaks. That’s not a problem because it’s already the case that adding a method to an interface breaks existing implementations. However, it’s a problem that we can only use closures with single-method interfaces. That’s exactly the sort of gotcha I want to avoid that will trip up people writing closures.
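To make the SAM restriction concrete, here is a rough sketch in plain Java rather than CICE syntax. Transformer and Lifecycle are hypothetical interfaces invented for this example. Only the first has a single abstract method, so only it would qualify under the quoted rule; the second would still need a full anonymous class.

```java
class SamDemo {
    /** One abstract method: a SAM type, eligible under the quoted rule. */
    interface Transformer {
        String apply(String s);
    }

    /** Two abstract methods: not a SAM type, so the shorthand would not apply. */
    interface Lifecycle {
        void start();
        void stop();
    }

    static String shout(String s) {
        Transformer upper = new Transformer() {
            public String apply(String in) {
                return in.toUpperCase();
            }
        };
        return upper.apply(s);
    }

    static String runLifecycle() {
        final StringBuilder log = new StringBuilder();
        Lifecycle lifecycle = new Lifecycle() {
            public void start() {
                log.append("start;");
            }

            public void stop() {
                log.append("stop;");
            }
        };
        lifecycle.start();
        lifecycle.stop();
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(shout("hi"));
        System.out.println(runLifecycle());
    }
}
```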

“Any visible local variable that is initialized or assigned exactly once in the enclosing scope, as well as any visible parameter to an enclosing method that is never otherwise assigned, is accessible but not assignable within the body of the CICE, whether or not it is explicitly qualified as final.” In other words, adding a line that reassigns a variable, either in the closure or outside of it, can unexpectedly break things. This is a big problem. This is exactly the sort of gotcha I want to avoid.

“Any local variable that is explicitly qualified as public is accessible, and also assignable (unless also qualified as final), within the body of the CICE. Formal parameters and for-loop variables may not be qualified as public.” OK. We have to explicitly mark the variables we want to use in the closure. This seems like a slightly weaker version of only being able to use final variables, though why everything isn’t public by default I’m not sure. Furthermore the prohibition on public formal parameters (I think this means method arguments) and for loop indices seems unnecessarily confusing and gotcha prone.

There’s a second gotcha with this. The word public is being overridden to mean something quite different from its usual meaning. These variables are not visible from outside the class. I’m sure this is being done to avoid introducing a new keyword, but it’s still added complexity for the language.

“Access to any other local variable or parameter from an enclosing scope is illegal within the body of a CICE.” It’s just not going to be obvious to people why they can use some variables in closures and not others. I think I’m coming to the conclusion that implicit final is a very bad thing, and a strong negative for any closure proposal. (Actually I’m coming around to the idea that implicit anything is a very bad idea, or at the least a code smell that warrants a second look.)

“It is, in many cases, technically feasible to infer types from the formal parameter list and method body of a CICE. It is worth exploring the pros and cons of doing so.” I’m not sure exactly what’s being proposed here, but it sounds like more dangerous implicit code that will surprise and confuse. Java’s a strongly, statically typed language. Let’s keep it that way. Making one small part of it implicitly typed is just inconsistent and confusing.

BGGA

How does the BGGA proposal satisfy or fail these criteria? First I think it fails the syntax test. It is uglier than the CICE proposal, and I have the same reaction to it I first had to generics and Ruby: I don’t understand it. Despite ten+ years of Java experience I cannot just look at the code and figure out what it is doing. I’m sure I’d get used to it over time but what about people who are not full-time Java programmers? It is on the one hand too concise, preferring large amounts of punctuation over keywords. On the other hand it in effect introduces an infinite number of new keywords to the language. Reading code that uses this seems likely to be too complex.

What about gotchas? Here I think BGGA comes out on top. Once you decipher the syntax, there appear to be fewer special cases to worry about than with CICE. It is a more complete, cleaner proposal, though likely many gotchas won’t become apparent until I actually start to try writing code with this. A few things do bother me though.

First of all, I’m not sure I like that return returns from the enclosing scope rather than from the closure itself. That’s a surprise.

I also don’t like the implicit return at the end of the closure to get the value out, or that there’s no way to yield that value early. That’s a surprise too. It also makes the code harder to read. Either return should return from the closure, not the enclosing scope, or there needs to be a new yield keyword that returns from the closure.

“At runtime, if a break statement is executed that would transfer control out of a statement that is no longer executing, or is executing in another thread, the VM throws a new unchecked exception, UnmatchedNonlocalTransfer.” This is unnecessary. I think this is more like returning a value from a method when that value is ignored and not stored. This should not cause an exception. The same is true for continue and return. The closure should silently exit when it completes and the enclosing scope no longer exists. Anything else violates the principle of least surprise, and exceptions are always surprising. It’s not as if the programmer actually needs to be told that the enclosing scope no longer exists, any more than they need to be told that the variable they’re returning from a method will be ignored. That’s an issue for the calling code, not the called code.
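The scenario in the quoted rule can be roughly simulated in plain Java. This is only a sketch under loud assumptions: it does not use BGGA syntax, and BreakSignal is a hypothetical exception standing in for a non-local break. It is not UnmatchedNonlocalTransfer or any real JDK type. A loop body is stored while the loop runs and then invoked again after the loop has finished.

```java
import java.util.ArrayList;
import java.util.List;

class NonLocalTransferSketch {
    /** Hypothetical stand-in for a non-local break; not a BGGA or JDK type. */
    static class BreakSignal extends RuntimeException {
    }

    /**
     * Runs a loop whose body "breaks" at index 2 and stores every body
     * in escaped. Returns the number of iterations that were started.
     */
    static int runLoop(List<Runnable> escaped) {
        int iterations = 0;
        try {
            for (int i = 0; i < 10; i++) {
                iterations++;
                final int current = i;
                Runnable body = new Runnable() {
                    public void run() {
                        if (current == 2) {
                            throw new BreakSignal(); // simulated break
                        }
                    }
                };
                escaped.add(body);
                body.run();
            }
        } catch (BreakSignal signal) {
            // The loop was still executing, so the simulated break lands here.
        }
        return iterations;
    }

    /** Re-runs the last stored body after the loop has already finished. */
    static boolean breakAfterLoopThrows() {
        List<Runnable> escaped = new ArrayList<Runnable>();
        runLoop(escaped);
        Runnable last = escaped.get(escaped.size() - 1);
        try {
            last.run(); // no loop is left to break out of
            return false;
        } catch (BreakSignal signal) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(runLoop(new ArrayList<Runnable>()));
        System.out.println(breakAfterLoopThrows());
    }
}
```

In this simulation the second invocation has nowhere to transfer control, so the signal surfaces in the caller. Whether the real thing should surface like that or exit silently is exactly the question at issue.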

Like CICE, BGGA closure conversions only work with single method interfaces. That’s a gotcha.

Bottom Line

Both major current proposals flunk my criteria. If I’m forced to choose between them today, I choose neither: do not add closures to the Java language.

However, some of the problems may be fixable. In particular removing implicit final from CICE eliminates a lot of my objections there. Removing public local variables eliminates a few more. Is what’s left still useful and worth doing? I’m not sure.

The BGGA proposal is trickier to judge. I’d really like to see a fully written out version of Neal Gafter’s talks on the subject. I’ve watched his video twice now, but it’s just not as easy to dig into video as text. Rather than writing the spec first, I think you need to write some code that uses the proposal. That’s the only way to see if the syntax makes sense.

7 Responses to “Two Criteria for Closures”

  1. John Cowan Says:

    I’m addressing several unrelated points in this comment, so I’ll number them.

    #1: When I first read about Java inner classes, the restriction that a (non-static) inner class can only refer to final variables in the enclosing scope outraged me. “Either do lexical scoping right or leave it out!” I thought, coming as I do from a Lisp background where lexical scoping works for both mutable and immutable variables, and the compiler figures out which are which and how to manage the implementation.

    Now I shrug and live with the restriction; the compiler complains when I break it, and I work around the problem. (The general workaround is to have a trivial “box class” that contains just one public instance variable, which is more or less what Lisp compilers do when they decide they have to.)
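The box-class workaround mentioned above might look roughly like this. It is a minimal sketch; IntBox and countRuns are names invented for illustration.

```java
class BoxDemo {
    /** Trivial mutable holder with a single public field (hypothetical name). */
    static class IntBox {
        public int value;
    }

    static int countRuns(int times) {
        // The reference is final, so the inner class may capture it,
        // but the field it points to can still be modified.
        final IntBox counter = new IntBox();
        Runnable increment = new Runnable() {
            public void run() {
                counter.value++;
            }
        };
        for (int i = 0; i < times; i++) {
            increment.run();
        }
        return counter.value;
    }

    public static void main(String[] args) {
        System.out.println(countRuns(3));
    }
}
```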

    #2: public means “accessible out of the obvious scope”. It’s meaningless to attach the normal out-of-class semantics to local variables anyhow, so attaching them here is not really a Big Deal. We already have multiple meanings of final without too much problem.

    #3: Strong static typing has nothing to do with whether your language has type inference. Java versions up to 1.4 didn’t do type inference; you had to write out every type name explicitly. Generics require the compiler to do a modest amount of type inference, and there are strongly statically typed languages where the compiler does all the type inference; in ML, you never need to mention an explicit type within a structure (the analogue of a class) except where you are doing ad-hoc overloading of operators, as with floating-point + vs. integer +. And ML is even more statically strongly typed than Java is, as it doesn’t have the null hole.
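As a small illustration of the modest inference generics already require, here is a sketch using a generic method. InferenceDemo and firstOrDefault are hypothetical names. In both calls the compiler works out the type argument T without it being written, and everything remains statically checked.

```java
import java.util.Collections;
import java.util.List;

class InferenceDemo {
    /** Returns the first element, or the fallback if the list is empty. */
    static <T> T firstOrDefault(List<T> items, T fallback) {
        return items.isEmpty() ? fallback : items.get(0);
    }

    static String demo() {
        // T is inferred as String from the assignment target.
        List<String> none = Collections.emptyList();
        // T is inferred as String from the argument types.
        return firstOrDefault(none, "fallback");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```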

    #4: The non-local-transfer exception is really quite unavoidable. Remember that besides being functions, closures are first-class objects, so they can be preserved in global variables or inside other objects (indeed, this is one of their common uses). When a closure created within a loop does a break or continue, and the loop is already complete, there really is no non-surprising thing to do, but silently returning (with what value, anyhow?) is much more surprising than throwing an error.

  2. Slava Pestov Says:

    Closures were first invented and various implications understood in the 70’s. Statically typed languages have had closures since the 80’s. Many use-cases have been demonstrated. Pretty much any new high level language being designed today has closures. Is there anything left to discuss?

  3. lordpixel Says:

    Firstly, John Cowan is right about #4. I want an error message if I continue or break a loop that no longer exists. Silently ignoring a return value isn’t always an error. Code trying to break or continue when there’s no longer a loop in existence is an error, and I want to be informed.

    I believe the current proposal requires the compiler to refuse to compile code where it can tell that this will happen. Specifically, if a closure’s type extends RestrictedClosure (e.g. because it is intended to handle asynchronous closures like the Runnable interface) I think you will not be able to compile any code featuring a break or continue that would cause a loop *outside* the closure itself to terminate.

    I’d be interested in seeing some cases where the compiler can’t tell this, then we’ll know how common this exception will be.

    Rusty, you say:
    “First of all, I’m not sure I like that return returns from the enclosing scope rather than from the closure itself. That’s a surprise.”

    Well, that’s a consequence of one of Neal’s initial design goals. The idea was specifically to avoid a different kind of surprise 🙂 The idea is that if you take code that is not inside a closure and move it into a closure, it should mean the same thing. He blogged about it here, under Tennent’s Correspondence Principle, which explains the reasoning better than I can:

    http://gafter.blogspot.com/2006/08/tennents-correspondence-principle-and.html

    That page also contains a proposal for a yield syntax ( ^value ) but they have since dropped that – I am not sure why.

    If you want to understand the proposal in more depth, I recommend reading Neal’s blog in chronological order. I’ve been following along and it has been enlightening.

  4. Hallvard Trætteberg Says:

    I’d like to comment on your syntax criterion: “I really, really want the syntax to be simple and have no surprises or significant learning curve.” Syntax goes hand in hand with semantics; it’s not that easy to say which is surprising or complex. Anonymous interface implementations (new ActionListener() { .., }) are often criticized for being ugly. However, my students have no problem reading them when they’re simple, e.g. just call a method of the enclosing class, although they really don’t understand the mechanism. When I begin passing them around they’re confused, not because the syntax is surprising, but because their usage is (at least to them). I think this will be true for closures too: simple use cases will be easy to read, complex ones will be complex. People who are already comfortable using them will have no problems, while people who are learning them for the first time will have problems anyhow. Of course the syntax can do harm, but I think that once you understand why something needs to be expressed (like explicit typing), you accept the syntax, too.

    As to what is surprising I agree with John. I was surprised when I read about how break and continue could be used within closures (BGGA), as this is something I’m not used to from Lisp. After having understood their usage, I think I would be even more surprised if I were allowed to use break and continue after the loop context had exited. It doesn’t have a meaning; it’s a control-flow analogue of following a null pointer: it makes no sense and we should be made aware of the error.

  5. Stephen Colebourne Says:

    And hot off the press is the third closure proposal – First-class methods: Java-style closures.

    http://jroller.com/page/scolebourne?entry=first_class_methods_java_style

    FCM is located between CICE and BGGA, and we believe that it addresses most of your concerns listed above. We’d love to receive your feedback.

  6. Howard Lovatt Says:

    I have posted a comparison of C3S, FCM, CICE, and BGGA – see URL – that splits out the different concerns addressed by these proposals

  7. Alvaro Says:

    I totally agree with you. Don’t add closures to Java!!