Experimental Programming

One advantage of test-driven development I’ve rarely seen emphasized is that it enables experimental programming. This was recently brought home to me while I was working on XOM. Steve Loughran had requested the ability to use a NodeFactory when converting from a DOM document. It was a reasonable request, so I set off to implement it.

My first iteration satisfied Steve’s use case. It was actually a fairly major rework of the code, but the test suite assured me that it all worked and I hadn’t broken anything that used to work. However, Wolfgang Hoschek quickly found a real bug in the new functionality I had added; and here things got a little sticky.

To satisfy John Cowan, the DOMConverter uses some fancy non-recursive algorithms. This enables it to process arbitrarily deep documents without stack overflows. However, the algorithm is far from obvious; and even though the code is well written, it’s hard to come back to it a couple of years later and figure out just how it works.
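
To give a feel for the shape of such an algorithm, here is a rough sketch (emphatically not XOM’s actual code) of a non-recursive, depth-first walk: it descends to first children and climbs back up via getParent() to find the next sibling, so document depth is limited by the heap rather than the call stack. The Node API, process(), and the nextSibling() helper are hypothetical stand-ins:

  // Illustrative sketch only; not XOM's actual code.
  Node current = root;
  while (current != null) {
      process(current);                  // hypothetical per-node work
      if (current.getChildCount() > 0) {
          current = current.getChild(0); // descend to the first child
      }
      else {
          // climb until some ancestor has an unvisited sibling
          while (current != root && nextSibling(current) == null) {
              current = current.getParent();
          }
          current = (current == root) ? null : nextSibling(current);
      }
  }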

But this is where test-driven development really shines: I don’t have to understand the code to fix it. All I need is a test case for the new functionality and the bug. Once that test is passing, I’m done. Rather than trying to understand what the code is doing and exactly how the bug is triggered, I just start making changes and running the tests. The changes aren’t random; but they’re based more on intuition and guesswork than on detailed analysis of the code paths. In this case, my first attempt to fix the problem hit a NullPointerException. I could tell from the stack trace that the exception was thrown by this line:

  parent.appendChild(children.get(i));

I was pretty sure the parent variable was the offending null object since children had already been used in the immediately preceding line. The question then became: why was parent null?

The most likely candidate seemed to be this line:

  parent = parent.getParent();

I wasn’t absolutely sure of that, but it seemed likely. I didn’t check it with a debugger or a System.err.println() statement; but given what else was going on in the code, that was the obvious place to look. Now how to fix it? Here I got really confused. There was no obvious fix. I was thinking deeply about the problem, and I thought I was going to have to rewrite the entire method from scratch with a totally new algorithm.

But then I stopped thinking for a minute. “What’s the simplest thing that could possibly work?” I asked. This was the simplest thing I could come up with:

  if (parent.getParent() != null) parent = parent.getParent();

The traditional way of approaching this problem would have been to think carefully about the algorithm and consider exactly how this change would affect it. What would the parent variable be when its parent was null, and so forth? I could have done that, and if I had, I would have rapidly concluded that this fix wouldn’t work. Just looking at it, I really thought this would fail. But instead I went ahead and ran the test anyway.

Damned if the test didn’t pass!

I’m done. I saved myself hours of hard mental effort trying to understand this code. In fact, I don’t need to understand the code. I only need to understand what the code is supposed to do, and have test cases that prove it does it. This is a radical rethinking of how we program, but I think it’s essential for modern programs. XOM is small enough that one person could understand it, but many programs aren’t. Does anyone really know all the inner workings of Apache? or Mozilla? or MySQL? Maybe, but I doubt it; and I know no one really understands everything that’s going on inside Linux or Windows XP.

The only way we can have real confidence in our programs is by testing them. Practical programmers long ago gave up on the fantasy of proving programs correct mathematically. Increasingly I think we’re going to need to give up on the fantasy of even understanding many programs. We’ll understand how they work at the low scale of individual lines, and we’ll understand what they’re supposed to do, but the working of the whole program? Forget it. It’s not possible.

This sounds scary. This sounds like a radical idea; and to a computer scientist, it is. But that’s only because until relatively recently computer scientists have dealt only with very simple systems. In the rest of the sciences (physics, chemistry, economics, and so forth) this is how all real problems are handled. We identify the low-level basic principles like Schrödinger’s equation or Newton’s Laws that define how the world works; but when we do actual engineering we use approximations to those laws, and we experiment to find out which approximations work in which domains. Chemists don’t start with Schrödinger’s equation when trying to understand the properties of a complex molecule. They experiment with it. They poke at it and they prod it, and hit it with electricity, light, heat, and a thousand other things, until they’re confident they know how it behaves. In many cases, chaos theory tells us we can’t even theoretically hope to solve the underlying equations. Experiment is all we’ve got.

Computer science has largely avoided experiment of this nature. Programmers do experiments, but they don’t trust the results unless they can understand and rationalize them. I think that’s going to have to change. The systems are getting too big. While there will always be simple programs and simple systems that can be understood in toto, the more interesting systems are too big. The only way to manage and understand them is empirically, by experiment and by test. The good news is that experiments do work. Test-driven development works. It produces demonstrably more reliable, more robust, less buggy code. You don’t need to understand why or how the program works as long as the tests prove that it does.

26 Responses to “Experimental Programming”

  1. Oliver Mason Says:

    This is a very good point, but I fear that many people will not accept it, as it reduces the highly-skilled programmer to a random ‘hacker’ (in the original sense, not the ‘cracker’ one).

    If that’s how to fix bugs or even add functionality, why not put a monkey at the keyboard? Once you’ve got your tests written you just get the monkey to do things until all of them pass.

    I’m devil’s advocate here. And I would not even limit the procedure to simple systems: if you work on many different projects, even simple systems take effort to maintain. Something I wrote a year ago might be in a different style of programming; after all, we continuously learn new tricks and idioms. So understanding your own code, even if it is simple, does require some effort.

    I just wish I could summon the energy to write a full set of tests for all my legacy code…

  2. Jason von Nieda Says:

    While I agree that this idea is fine in theory, in practice I’d say “Not on my team!”

    If I’ve spent 12 hours writing an algorithm that is fast and efficient I don’t want someone who doesn’t understand the code to come along and make a change that causes it to be 500% slower and to use 10x the memory because they can’t be bothered to understand the code they are working on.

    This is the kind of thing that is often not tested for. Test cases normally check for a specific result, but that result is seldom (in my experience) based on “less tangibles” such as speed and efficiency.

    Systems these days are indeed complex, which is why we now have automated testing, but that doesn’t excuse a professional from fully understanding what they are doing. Proper documentation and comments go a long way towards helping the next guy.

    If the algorithm is “far from obvious” in the first place, place comments at the top of the method describing the general theory and give the details as you go along. This has many benefits, not the least of which is keeping future programmers from relying on a crutch that could break.

  3. awwaiid Says:

    “…why not put a monkey at the keyboard?” (Oliver Mason)

    Or better yet, a Genetic Programming System! This actually might be a _good_ idea 🙂

  4. Dan Lynn Says:

    Test driven development is great. However, making quick tweaks to a complex algorithm could introduce new edge cases where the algorithm could fail. Since you didn’t go to the effort to fully understand the consequences of your code changes, you are essentially leaving it up to your clients to expose these new edge cases in a production environment.

    Also, the profiling comment by Jason was a good point. Highly optimized algorithms like the one you described should have a unit test that includes profiling metrics. This way, a random tweak will be less likely to have a huge impact on performance.
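
    Something as crude as the following sketch would do; the convert() call is a hypothetical stand-in for the optimized routine, and the 500 ms budget is an arbitrary number that would need tuning for the machine running the tests:

    public void testConversionStaysFast() {
      long start = System.currentTimeMillis();
      convert(); // hypothetical call to the optimized algorithm
      long elapsed = System.currentTimeMillis() - start;
      // Fail the build if the routine blows its time budget.
      assertTrue("took " + elapsed + " ms", elapsed < 500);
    }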

  5. Michael Feathers Says:

    Tests minimize the chance that we will screw something up, but they don’t eliminate it. I find it better to see tests as an augmentor of understanding, not a substitute for it.

    Ideally, all code would be understandable, but, in reality, a lot of it isn’t. When we run our tests we learn something about our code. When we read and run our tests, we know a little bit more. When we read and run our tests *and* read our code, we not only know what we know, we know what we don’t know.

  6. dennis Says:

    Great plan, assuming your unit tests are perfect.

    Of course, unit tests are code, and if you can figure out how to make them perfect, then why not just make the other code perfect too and skip the tests….

    For us mere mortals, seems to me the main benefit of unit tests is that you have two independent code bases, which make bugs show up because you’ll seldom have the same error on both sides.

  7. Dusty Says:

    If your tests were sound and complete, in theory you could implement your program by testing random code until it passes all the tests. This would, of course, take a very long time, but my point is that if your tests aren’t good enough to verify that random code does exactly what you want it to, then it is dangerous to completely rely on them. Only programs that have been mathematically proven have sound and complete tests, so, chances are, your tests aren’t both sound and complete.

  8. MrPhil Says:

    Great post! I have experienced the same phenomenon and consider it an advantage for TDD. Using tests to experiment with hunches about a bug is definitely a time saver.

    As to Jason von Nieda’s point, it sounds to me like there should be some performance tests then, but you make good points about commenting etc.

  9. Mukund Says:

    Yeah, Test Driven Programming is a nice-to-have thing, but then I fear it’ll lack speed.

    Tradeoff.

  10. Doctor Says:

    Allow me to join the chorus praising rigorous testing but giving thumbs-down to the generic approach of coding without understanding. Or, more specifically, it is fine to rely on tests in your search for a fix, but at the end you MAY NOT check in your fix without proving to yourself that you fixed the algorithm, not the test case.

    Let’s take the line of code and engage in a hypothetical discussion.

    A simpler way to silence the exception is deleting the parent = parent.getParent() line altogether. Just keep using the old parent.

    It might also be the correct fix: maybe going up the tree at this spot was a lapse when the original algorithm was implemented. I don’t understand the code, so I don’t know.

    Maybe the problem is that somewhere else in the code there is an unnecessary identical call so you reach root too soon.

    It’s also hypothetically possible that the test cases would not really care who exactly is the parent. So, maybe both fixes would pass. Or (and I have ample experience with this situation) the test case itself is buggy and enforces incorrect behavior. I don’t understand the code, so how would I know? In other words, you must have extreme faith in your test suite to trust your approach. And there you run into the fact that there’s not enough time in the universe to test every possible input.

    In your case, you must have had a simple mental image such as “I’m going up a tree, and when I hit the root I might as well stay at the root.” This is a rudimentary level of understanding the code, thinking in invariants.

    The end of the article should have been “when I found the fix, I asked myself why we would be at the root in this particular case, found a justifiable reason and submitted the sample as a new test case along with my fix”. Instead, you engaged in justifying coding without understanding because systems are too complex.

    Well, I agree on some level – sometimes debugging your code takes you for a wild chase into code you’ve never seen. But, assuming someone owns that code, it becomes their responsibility to at least help you out, or perhaps take over the bug. Otherwise, you end up covering the bugs in their code so bugs stay in forever. Eventually, bugs become unfixable because myriads of tests and workarounds start failing over a genuine fix. IMHO, this is bad code stewardship.

    Testing rules though, and I give big thumbs up to anything that allows me to converge on the fix in a constant number of steps.

  11. Ed Davies Says:

    I’m having a serious rethink about using XOM.

  12. Ed Davies Says:

    Sorry, but I’m not making this up. I’ve just found a null pointer in some C++ code I’m debugging – the first I’ve had for quite a while. Looking at the specific point of the exception I see that dereferencing that pointer is not ideal – the information I’m looking for is actually available locally. However, it looks like when I wrote the code a couple of years ago I *thought* the pointer shouldn’t be null.

    Let’s have a straw poll – should I just not do the dereference and use the more directly available information, or should I go look to see what’s really going on?

    Additional information: this is software used in the design of instrument approach procedures for civil and military aircraft in a dozen or more countries around the world. This particular routine is very unlikely to cause an error if the user is paying any attention at all but…

  13. Elliotte Rusty Harold Says:

    There’s not enough information to judge. The most important question is whether the code base has a really good test suite or not. I think some of the people posting here don’t realize just how comprehensive XOM’s test suite is. It’s currently running at 98–99% code coverage, with a lot of redundancy. It incorporates several independent third-party test suites by reference. That’s why I’m relatively confident that any change that passes the suite is de facto acceptable. Without knowing how good your test suite is, I can’t judge how you should proceed.

    The second thing I don’t know is how complex that piece of code is. Can you actually understand what’s going on there? If so, great. But what do you do if you can’t figure it out? Are you going to recall the software and all data derived from it?

    Finally, I suggest that even if you think the fix is obvious and you do understand it, you write a test case for the problem anyway, just to make sure. It’s been my experience that many obvious problems that had obvious solutions were in fact fixed wrong when not tested. Furthermore, bugs often regressed later on if no test for the original bug was available and automated.
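
    For the DOMConverter case, such a test might look something like the sketch below. The convert(org.w3c.dom.Document, NodeFactory) overload is the one this article describes adding, but the three-level document and the assertions are hypothetical stand-ins; the real test would use whatever input actually triggered the bug. (DocumentBuilderFactory, InputSource, and StringReader come from the standard javax.xml.parsers, org.xml.sax, and java.io packages, and the method lives in a JUnit TestCase subclass.)

    public void testConvertWithFactoryKeepsNesting() throws Exception {
      // Parse a small but nested document into a DOM tree.
      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
      dbf.setNamespaceAware(true);
      org.w3c.dom.Document domDoc = dbf.newDocumentBuilder().parse(
        new InputSource(new StringReader(
          "<root><child><grandchild/></child></root>")));
      // Convert through a plain NodeFactory and verify the
      // nested structure survives the round trip.
      nu.xom.Document xomDoc = DOMConverter.convert(domDoc, new NodeFactory());
      nu.xom.Element root = xomDoc.getRootElement();
      assertEquals("root", root.getLocalName());
      assertEquals(1, root.getChildCount());
      nu.xom.Element child = (nu.xom.Element) root.getChild(0);
      assertEquals(1, child.getChildCount());
    }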

    Theory and experiment are not exclusive. You want both. However, all too often computer scientists and programmers have relied solely on theory to the exclusion of actual experiment. In practice, I find experiment to be more reliable than theory. If you have a test, and the test passes, you know the bug is fixed. If you have a good test suite and all the other tests still pass, you know to a very high degree of certainty that the fix hasn’t introduced other bugs. However, if you merely think you understand the problem, and you think you’ve fixed it, but you haven’t actually tested the fix, then maybe you’ve fixed it and maybe you haven’t. If programmers could eyeball what works and what doesn’t, they wouldn’t put bugs in the code in the first place.

  14. Rob Says:

    Aren’t you supposed to be writing new tests as part of Test Driven Development? How do you write new tests if you don’t understand the code that you’re writing?

    It doesn’t matter what your methodology is, the better you understand your project, the better code you will write. That means more than just passing automated tests, it means finding the best solution for a particular problem.

  15. Elliotte Rusty Harold Says:

    All that’s needed to write tests is understanding what the code is supposed to do. You absolutely do not need to know how the code does it. On some teams there are people whose job title is tester, not programmer. They are responsible for testing the program, but not for writing it.

    In fact, I think a test is better precisely if it doesn’t have any special dependence on how the model code operates. This allows more flexible programs because you can refactor and experiment and try different algorithms and approaches without changing the tests. This is the whole point of data encapsulation. The class is a black box which is accessed solely through its public interface. The internal details are deliberately opaque, and thus can be changed as necessary to improve performance or other desirable characteristics without breaking client code.
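
    For example, the following little test exercises only the public contract of java.util.Stack (a convenient stand-in here, not XOM code): last in, first out. The class’s internal storage could be rewritten completely and this test would neither notice nor care.

    public void testLastInFirstOut() {
      java.util.Stack<String> stack = new java.util.Stack<String>();
      stack.push("first");
      stack.push("second");
      // Only the public behavior is asserted, never the internals.
      assertEquals("second", stack.pop());
      assertEquals("first", stack.pop());
    }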

  16. Dave Brosius Says:

    >> If your tests were sound and complete, in theory you could implement your program by testing random code until it passes all the tests.

    Yup. I run code checkers on all kinds of open source code all the time. By far the lousiest code out there is the code found in junit tests.

  17. Elliotte Rusty Harold Says:

    You have to take what static code analysis tools report for JUnit tests with a grain of salt. JUnit tests are not typical code and don’t follow the same rules. For instance, they often deliberately trigger error conditions to see what happens. This pattern is common in JUnit tests:

    public void testFailingOp() {
      try {
        doSomethingBad();
        fail("No exception thrown");
      }
      catch (Exception ex) {
        // Cool. The code worked as expected.
      }
    }
    However, most static code analysis tools will flag that as an empty catch block. I sometimes test the exception message just to avoid that.
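
    That variant looks like this, with the same hypothetical doSomethingBad() as above:

    public void testFailingOp() {
      try {
        doSomethingBad();
        fail("No exception thrown");
      }
      catch (Exception ex) {
        // Asserting on the message documents the expected failure
        // and gives the analysis tool a non-empty catch block.
        assertNotNull(ex.getMessage());
      }
    }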

    Here’s another one: JUnit test method names can be extremely long. For instance, one of XOM’s is testConvertSingleElementDocumentFromXOMToDOM. Some static code analysis tools will flag that as an exceptionally long method name, and normally they’d be right; but not here.

    There are a lot of other areas where the usual rules are actively misleading when applied to JUnit tests.

    Finally keep in mind that test code, although public, is not meant to be reusable or to be invoked by anything other than a test runner. That also suggests it doesn’t follow the same rules as most code does and can get away with simpler rules because it’s simpler code. The unit nature of unit tests means the tests can be considered in isolation, and thus it’s not nearly as important to sand down all the rough edges as it is in code where each method will be just one cog in a well-oiled program.

  18. Adam Myatt Says:

    I think we can agree that there are definitely negatives to the paradigm of modifying code without understanding it fully. However, I agree with Elliotte that it is a benefit not to have to fully and completely understand the code to make a small, simple change.

    I work exclusively with offshore developers in India. The India-based organization we work with sees frequent turnover of developers. Developers also are responsible for understanding the internal functions of numerous applications. Often when a new developer is brought on board we are in the middle of several small, simple, yet critical tweaks to a system. Often we do not have the time to wait for the developer to gain a 100% complete understanding of some of the complex applications before making simple changes. TDD with very thorough suites of test cases, combined with code profiling applications (Eclipse TPTP, etc.) and code coverage applications (Cobertura), allows developers to make changes to our code, run full test suites, and send JUnit, code profiling, and code coverage reports to project managers here in the U.S.

    Through this process we have been able to keep code quality high and bugs low, and to provide flexibility on projects that might otherwise have had to be delayed.

    Granted, this is not necessarily an average case, but it does happen at least once or twice a year.

  19. Hannu Terava Says:

    I’ve been calling this method Attempt-Driven Development, aka ADD (as mentioned here http://radio.javaranch.com/lasse/2005/05/09/1115665561552.html). However, I consider ADD bad practice as such. Even if I can fix something without understanding the code fully, I make sure that someone else from our team reviews the change. This kind of combination of ADD and code review works great.

  20. Elliotte Rusty Harold Says:

    Pascal Van Cauwenberghe describes a real-world legacy system that he modified through experimental development; in his case, sufficiently that he eventually did come to understand it.

    I’m afraid this is the reality we all too often have to deal with. Certainly you’d like to perfectly understand every part of your system, but what do you do when you don’t? Throw up your hands in the air and refuse to fix anything because you might break something else?

    Pascal’s story is particularly interesting because he did not start out with a solid test suite. He had to develop the test suite as he went.

  21. Hannu Terava Says:

    Good example. I agree that there are uses for the programming style you described.

  22. Harry Says:

    Elliotte, you missed a chance to publish this on April Fools’ Day!

  23. The Cafes » Negative Experimental Programming Says:

    […] I’ve previously written about the benefits of experimental programming: fixing code until all tests pass without necessarily understanding at a theoretical level why they pass. This scares the bejeezus out of a lot of developers, but I’m convinced it’s going to be increasingly necessary as software grows more and more complex. The whole field of genetic algorithms is just an extreme example of this. Simulated annealing is another. However those techniques are primarily used by scientists who are accustomed to learning by experiment before they have a full and complete understanding of the underlying principles. Computer scientists are temperamentally closer to mathematicians, who almost never do experiments; and who certainly don’t trust the results of an experiment unless they can prove it theoretically. […]

  24. John Cowan Says:

    Apparently this is a known anti-pattern called programming by permutation.

  25. Elliotte Rusty Harold Says:

    Hmm, a random Wikipedia article with no sources. All I can say is that I disagree. The key is having a large, reliable test suite that covers the entire code base and all expected functionality. Source code control helps too. In experimental programming you prove empirically that code works and does what it’s supposed to do.

    There are even more extreme forms of programming by permutation than what I’ve described here, most notably genetic algorithms. Here I’m applying some human intelligence to the program. With a genetic algorithm I don’t even do that.

    You can also use permutation to see where the test suite is lacking and needs to be improved. That’s what Jester is all about.
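
    The idea, roughly: a mutation tool makes one small change to the code and reruns the suite. For example (a hypothetical mutation, not Jester’s actual output):

    if (count > 0) { flush(); }  // original
    if (count >= 0) { flush(); } // mutant the tool tries

    If every test still passes against the mutant, no test pins down that boundary condition, and the suite needs strengthening there.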

    Code permutation and testing is an effective and powerful technique. It works, in practice and in the real world, unlike mathematical proofs of program correctness.

  26. Jeremy Gardiner Says:

    Oh no! No!! How do you know the tests are complete and correct? It’s likely you introduced some unexpected behaviour the tests didn’t catch because they weren’t looking for it. Please tell me you will never work on any safety critical code!