Negative Experimental Programming
I’ve previously written about the benefits of experimental programming: fixing code until all tests pass, without necessarily understanding at a theoretical level why they pass. This scares the bejeezus out of a lot of developers, but I’m convinced it’s going to be increasingly necessary as software grows more and more complex. The whole field of genetic algorithms is just an extreme example of this. Simulated annealing is another. However, those techniques are primarily used by scientists, who are accustomed to learning by experiment before they have a complete understanding of the underlying principles. Computer scientists are temperamentally closer to mathematicians, who almost never do experiments, and who certainly don’t trust the results of an experiment unless they can prove them theoretically.
However, perhaps it’s possible to introduce experimental programming in a slightly less controversial form. While many programmers may be unwilling to accept it as positive evidence of program correctness, it can offer undeniable evidence of program failure. This was recently brought to mind by a bug I was trying to fix in Jaxen.
Dominic Krupp had reported the bug on the Jaxen user mailing list. I was skeptical at first, but he had an executable test case, so I couldn’t really deny it. As Joel Spolsky writes,
Programmers have very well-honed senses of justice. Code either works, or it doesn’t. There’s no sense in arguing whether a bug exists, since you can test the code and find out. The world of programming is very just and very strictly ordered and a heck of a lot of people go into programming in the first place because they prefer to spend their time in a just, orderly place, a strict meritocracy where you can win any debate simply by being right.
I rewrote Krupp’s test case as a JUnit test and added it to our test suite. Though the bug was now obvious, the fix was not, so the first step was to isolate the problem. I figured the problem was either in the Jaxen core or in the JDOM-specific code. To find out, I rewrote the test using DOM rather than JDOM. That test passed, which convinced me the bug was very likely in the JDOM-specific part of the code, and that’s where I should begin looking. Note that I still didn’t understand why the bug existed; but the experiment gave me useful information anyway.
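The pattern is easy to reproduce. Here’s a minimal sketch of the paired tests, in the JUnit 3 style Jaxen used at the time. The XPath expression, the document, and the class name are hypothetical stand-ins, not the actual ones from Krupp’s report. Only the object model differs between the two tests, so a failure in one but not the other points at the model-specific code.

```java
import java.io.StringReader;
import java.util.List;

import junit.framework.TestCase;

import org.jaxen.XPath;
import org.jaxen.dom.DOMXPath;
import org.jaxen.jdom.JDOMXPath;

public class ObjectModelComparisonTest extends TestCase {

    // Hypothetical query and document; the real ones came from the bug report.
    private static final String EXPRESSION = "//item[@id='a']";
    private static final String DOCUMENT
        = "<root><item id='a'/><item id='b'/></root>";

    // Evaluate the expression against a JDOM tree.
    public void testSelectNodesWithJDOM() throws Exception {
        org.jdom.Document doc = new org.jdom.input.SAXBuilder()
            .build(new StringReader(DOCUMENT));
        XPath xpath = new JDOMXPath(EXPRESSION);
        List results = xpath.selectNodes(doc);
        assertEquals(1, results.size());
    }

    // Evaluate the same expression against a DOM tree.
    public void testSelectNodesWithDOM() throws Exception {
        javax.xml.parsers.DocumentBuilderFactory factory
            = javax.xml.parsers.DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        org.w3c.dom.Document doc = factory.newDocumentBuilder()
            .parse(new org.xml.sax.InputSource(new StringReader(DOCUMENT)));
        XPath xpath = new DOMXPath(EXPRESSION);
        List results = xpath.selectNodes(doc);
        assertEquals(1, results.size());
    }
}
```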
I then began single-stepping through the test case with the Eclipse debugger. Jaxen’s code is pretty involved, and even a simple XPath expression can pass through many loops and quite deep method hierarchies. Furthermore, many methods are called multiple times from multiple places, so breakpoints aren’t quite as useful. I either missed the bug or neglected to step into the method where it resided. I still didn’t understand how this bug could arise, or have a good idea about where to look to fix it, so I put the problem aside for a while.
Next, the original reporter proposed a fix, and that put me in a quandary. How could I judge his patch when I didn’t understand the bug? Furthermore, looking at the patch, it wasn’t obvious to me how it would fix the problem. But this is where experimental programming came to the rescue again. I didn’t need to understand the patch! All I had to do was plug it in and see whether the tests passed. The first result was positive: it did indeed fix the bug he reported. The next results were not: it broke 21 other unit tests that had been passing without it. Consequently I could reject the patch, still without actually understanding it. I didn’t need to explain how it was broken. I just needed to prove by experiment and test that it was.
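The verdict comes from running the entire suite, not just the new test. Here’s a minimal sketch of that setup as a JUnit 3 aggregator; the AllTests class and the test classes it collects are illustrative, not Jaxen’s actual layout. Run it once before applying a patch and once after, and let the failure count judge the patch.

```java
import junit.framework.Test;
import junit.framework.TestSuite;

// Hypothetical aggregate suite: any test that flips from passing to
// failing after a patch is applied is grounds to reject the patch,
// no theory required.
public class AllTests {

    public static Test suite() {
        TestSuite suite = new TestSuite("Jaxen regression suite");
        suite.addTestSuite(ObjectModelComparisonTest.class);
        // ... every other test class in the project goes here
        return suite;
    }

    public static void main(String[] args) {
        junit.textui.TestRunner.run(suite());
    }
}
```

In this case the before-and-after comparison did all the work: one new pass, 21 new failures.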
You can learn something even from an incorrect patch. I hadn’t even been looking at the method Krupp patched to fix the bug. His patch may have failed, but it gave me new insight into the problem. Perhaps a slight modification of it might still fix the bug without breaking 21 other things. I’ll have to experiment and see.