No, this isn’t another rant about agile programming, test-driven development, or test-first programming. There’s a depressing phenomenon in some open source projects (including Jaxen and PHP) where a programmer goes off in a corner, gets a cool idea, writes it up, contributes it, has it checked in, and ships it to millions of users, one of whom has the distinct pleasure of being the first to ever actually use this code.

I am getting really tired of discovering code that is broken by design: not merely buggy but partially to completely non-functional and unable to be fixed. The worst case I ever saw was in Jaxen, where I once spent two days trying to write unit tests for a package that had been contributed years earlier (without any tests, of course), banging my head against the wall trying to figure out how to reach that code. Only after those two days did careful analysis of the code paths reveal that the code could never be reached, no matter what. It never could have been reached.

More recently I’ve been working with some libraries in PHP copied from libxml, which itself copied from .NET. Since the developers were only copying from other APIs, I guess they didn’t think it might make sense to stop, write a little sample code that used their libraries, and see if it worked. In any case, the libraries in both libxml and PHP, though more functional than Jaxen’s org.jaxen.dom.html package, are full of public but unreachable code. They contain constants for node types that will never be supplied by the parser. They are poorly documented because the designers just copied rather than creating from scratch. Half my questions end with me being referred back to the .NET API docs and hoping that the copy was a faithful one. Even the original programmers no longer remember exactly how it works.

Of course, like the Mariposan clones on Star Trek: TNG, it gets worse the further you get from the source material. The .NET APIs are a little broken. The libxml APIs that copied the .NET APIs are somewhat broken. The PHP APIs on top of libxml are a lot broken. They miss basic things that libxml can do, like detecting the end of the document or recognizing errors. (The specific mistake was converting a three-valued int return type into a two-valued boolean type in the translation.)
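To make that last mistake concrete, here is a minimal sketch in Python. Nothing in it is the real libxml or PHP code: `LowLevelReader`, `BooleanWrapper`, `drain`, and the `READ_*` constants are hypothetical stand-ins, and the only assumption is that the underlying `read()` distinguishes three outcomes (advanced, end of input, error) while the binding reports just true or false.

```python
# Hypothetical three-valued result codes, in the style of a C pull parser.
READ_OK, READ_END, READ_ERROR = 1, 0, -1


class LowLevelReader:
    """Toy reader: hands out tokens, then reports end of input or an error."""

    def __init__(self, tokens, fail_at=None):
        self._tokens = list(tokens)
        self._pos = 0
        self._fail_at = fail_at  # index at which to simulate malformed input
        self.current = None

    def read(self):
        if self._fail_at is not None and self._pos == self._fail_at:
            return READ_ERROR
        if self._pos >= len(self._tokens):
            return READ_END
        self.current = self._tokens[self._pos]
        self._pos += 1
        return READ_OK


class BooleanWrapper:
    """Lossy binding: collapses the three outcomes into True/False."""

    def __init__(self, reader):
        self._reader = reader

    @property
    def current(self):
        return self._reader.current

    def read(self):
        # READ_END and READ_ERROR both become False; the difference is gone.
        return self._reader.read() == READ_OK


def drain(wrapper):
    """The loop any client would write first: read until read() says stop."""
    seen = []
    while wrapper.read():
        seen.append(wrapper.current)
    return seen
```

With this sketch, draining a complete three-token input and draining the same input with `fail_at=2` both end with `read()` returning `False`. The client sees a shorter list in the second case but has no way to ask why the loop stopped.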

However, it’s not the bad translation and adaptation that gets me. It’s that these flaws are really, really obvious. Anybody who sits down to write almost any code that uses these APIs is going to hit these problems almost immediately. Unit tests might not catch the design flaws. After all, unit tests only verify that the code is doing what the programmer says it does. They don’t verify that the programmer has chosen to make the code do the right thing.
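A small sketch of that gap, again in Python and again entirely hypothetical (`Reader`, `test_read_returns_false_when_done`, and `count_items` are names invented for illustration): the unit test checks exactly what the programmer intended and passes, while the first real client immediately needs something the API cannot tell it.

```python
class Reader:
    """Hypothetical pull reader whose read() returns only True or False."""

    def __init__(self, items, broken_at=None):
        self._items = list(items)
        self._pos = 0
        self._broken_at = broken_at  # index at which the input is malformed

    def read(self):
        if self._broken_at is not None and self._pos == self._broken_at:
            return False  # error, reported the same way as end of input
        if self._pos >= len(self._items):
            return False  # genuine end of input
        self._pos += 1
        return True


def test_read_returns_false_when_done():
    """Unit test: verifies the behavior the programmer specified. It passes."""
    reader = Reader(["x"])
    assert reader.read() is True
    assert reader.read() is False


def count_items(reader):
    """First real client: count the items in the input."""
    count = 0
    while reader.read():
        count += 1
    # A real program would now need to know whether the input was complete.
    # This API offers no way to find out.
    return count
```

Here `count_items(Reader(["a"]))` and `count_items(Reader(["a", "b", "c"], broken_at=1))` return the same number, one well-formed and one not. The test suite stays green; only writing the client exposes the problem.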

We cannot rely on unit tests alone. We also need to actually use the API before committing to it and shipping it. Try writing some non-trivial programs, and see how the API fares. Only by actually using the API in anger can you tell where it works and where it fails. You can find out what’s right, what’s wrong, and what’s missing. If you pay attention, you’ll also find out what it includes that you don’t need and that you can therefore remove. (Smaller is better.)

When I designed XOM, I took my first pass at the API and used it to reimplement every example from Processing XML with Java in XOM. I also implemented some additional XML specifications, such as Canonical XML and XInclude, on top of the core libraries but in separate packages, to make sure the core libraries offered enough public access to build other pieces of infrastructure on top of them.

The results were surprising. Several methods I thought were essential didn’t actually get called anywhere, and I could take them out. I also found quite a few things I’d forgotten to do that I obviously needed. Later I released the library to other users, who found more things I’d left out, as well as other flaws. Only after I was confident the library had been put to some real use did I declare it finished and freeze the API.
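One rough way to look for methods that never get called is to scan your sample programs mechanically. The following is a sketch in Python, not how XOM was actually audited: the `Element` class, its method names, and the `SAMPLES` list are all made up for illustration, and matching is by method name only, so it will miss calls made through reflection and confuse same-named methods on different classes.

```python
import ast


def called_method_names(source):
    """Collect every name used as a method-call target, e.g. obj.name(...)."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            names.add(node.func.attr)
    return names


def unused_public_methods(api_class, sample_sources):
    """Public methods of api_class that no sample program ever calls."""
    public = {
        name
        for name, member in vars(api_class).items()
        if callable(member) and not name.startswith("_")
    }
    used = set()
    for source in sample_sources:
        used |= called_method_names(source)
    return sorted(public - used)


class Element:
    """Hypothetical first-pass API under review."""

    def append_child(self, child): ...

    def get_value(self): ...

    def detach_all_and_reparent(self, new_parent): ...


# Hypothetical sample programs written against the API, as source text.
SAMPLES = [
    "root = Element()\n"
    "root.append_child(Element())\n"
    "print(root.get_value())\n",
]
```

Calling `unused_public_methods(Element, SAMPLES)` reports `detach_all_and_reparent` as the only method no sample touches. That is a candidate for removal, not a verdict; the point is that the evidence comes from code that uses the API.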

Of course, I knew that I would have to add more features in the future; but slow and steady wins the race. I don’t add anything to XOM unless someone actually asks for it and provides a clear use case. Even if the feature is obviously a good thing in itself, I can’t be sure it’s designed right unless I can see how the feature is used.

As I write this, I notice someone is gearing up to make the same damn mistake, but this time in Ruby. Laurent Sansonetti writes:

For the curious, here is the patch to bring the xmlTextReader API to the libxml-ruby project, that I used in my previous hack. The patch has been generated from CVS HEAD.

Everything valuable should be wrapped, but note that the test cases are not complete and that there is no documentation (RDoc comments) yet. As the libxml-ruby guys were interested in it I just sent it to them as well.

Laurent, if you don’t have test cases or documentation, it isn’t done! Please don’t submit it.

Contributors, please don’t send in patches that add new features and new API unless you’ve actually used them first. Project maintainers: don’t accept code contributions unless there’s a clear use case and evidence of actual use. Otherwise you’re not really maintaining, just being a pack rat. When in doubt, leave it out. It is better to leave out a feature in this release than commit to supporting a broken API for the indefinite future.