Harold’s Corollary to Knuth’s Law

Tuesday, August 5th, 2008

Lately I’ve found myself arguing about the proper design of unit tests. On my side I’m claiming:

  1. Unit tests should only touch the public API.
  2. Code coverage should be as near 100% as possible.
  3. It’s better to test the real thing than mock objects.

The goal is to make sure that the tests are as close to actual usage as possible. This means that problems are more likely to be detected and false positives are less likely. Furthermore, the discipline of testing through the public API when attempting to achieve 100% code coverage tends to reveal a lot about how the code really works. It routinely highlights dead code that can be eliminated. It reveal paths of optimization. It teaches me things about my own code I didn’t know. It shows patterns in the entire system that makes up my product.

By contrast some programmers advocate that tests should be method-limited. Each test should call the method as directly as possible, perhaps even making it public or non-private and violating encapsulation to enable this. Any external resources that are necessary to run the method such as databases or web servers should be mocked out. At the extreme, even other classes a test touches should be replaced by mock implementations.

This approach may sometimes let the tests be written faster; but not always. There’s a non-trivial cost to designing mock objects to replace the real thing; and sometimes that takes longer. This approach will still tend to find most bugs in the method being tested. However it stops there. It will not find code in the method that should be eliminated because it’s unreachable from the public API. Thus code tested with this approach is likely to be larger, more complex, and slower since it has to handle conditions that can’t happen through the public API. More importantly, such a test starts and stops with that one method. It reveals nothing about the interaction of the different parts of the system. It teaches nothing about how the code really operates in the more complex environment of the full system. It misses bugs that can emerge out of the mixture of multiple different methods and classes even when each method is behaving correctly in isolation according to its spec. That is, it often fails to find flaws in the specifications of the individual methods. Why then are so many programmers so adamant about breaking access protection and every other rule of good design as soon as they start testing?

Would you believe performance?


Wednesday, June 18th, 2008

Here’s part 11 of the ongoing serialization of Refactoring HTML, also available from Amazon and Safari.

This part’s a little funny because really it deserves an entire book on its own, and that book has yet to be written. I didn’t have space or time to write a complete second book about test driven development of web sites and web applications, but perhaps this small piece will inspire someone else to do it. If not, maybe I’ll get to it one of these days. 🙂

In theory, refactoring should not break anything that isn’t already broken. In practice, it isn’t always so reliable. To some extent, the catalog later in this book shows you what changes you can safely make. However, both people and tools do make mistakes; and it’s always possible that refactoring will introduce new bugs. Thus, the refactoring process really needs a good automated test suite. After every refactoring, you’d like to be able to press a button and see at a glance whether anything broke.

Although test-driven development has been a massive success among traditional programmers, it is not yet so common among web developers, especially those working on the front end. In fact, any automated testing of web sites is probably the exception rather than the rule, especially when it comes to HTML. It is time for that to change. It is time for web developers to start to write and run test suites and to use test-driven development.

The basic test-driven development approach is as follows:

  1. Write a test for a feature.
  2. Code the simplest thing that can possibly work.
  3. Run all tests.
  4. If tests passed, goto 1.
  5. Else, goto 2.

For refactoring purposes, it is very important that this process be as automatic as possible. In particular:

  • The test suite should not require any complicated setup. Ideally, you should be able to run it with the click of a button. You don’t want developers to skip running tests because they’re too hard to run.
  • The tests should be fast enough that they can be run frequently; ideally, they should take 90 seconds or less to run. You don’t want developers to skip running tests because they take too long.
  • The result must be pass or fail, and it should be blindingly obvious which it is. If the result is fail, the failed tests should generate more output explaining what failed. However, passing tests should generate no output at all, except perhaps for a message such as “All tests passed”. In particular, you want to avoid the common problem in which one or two failing tests get lost in a sea of output from passing tests.

Writing tests for web applications is harder than writing tests for classic applications. Part of this is because the tools for web application testing aren’t as mature as the tools for traditional application testing. Part of this is because any test that involves looking at something and figuring out whether it looks right is hard for a computer. (It’s easy for a person, but the goal is to remove people from the loop.) Thus, you may not achieve the perfect coverage you can in a Java or .NET application. Nonetheless, some testing is better than none, and you can in fact test quite a lot.

One thing you will discover is that refactoring your code to web standards such as XHTML is going to make testing a lot easier. Going forward, it is much easier to write tests for well-formed and valid XHTML pages than for malformed ones. This is because it is much easier to write code that consumes well-formed pages than malformed ones. It is much easier to see what the browser sees, because all browsers see the same thing in well-formed pages and different things in malformed ones. Thus, one benefit of refactoring is improving testability and making test-driven development possible in the first place. Indeed, with a lot of web sites that don’t already have tests, you may need to refactor them enough to make testing possible before moving forward.

You can use many tools to test web pages, ranging from decent to horrible and free to very expensive. Some of these are designed for programmers, some for web developers, and some for business domain experts. They include:

  • HtmlUnit
  • JsUnit
  • HttpUnit
  • JWebUnit
  • FitNesse
  • Selenium

In practice, the rough edges on these tools make it very helpful to have an experienced agile programmer develop the first few tests and the test framework. Once you have an automated test suite in place, it is usually easier to add more tests yourself.


Go Ahead. Break the Build!

Wednesday, January 10th, 2007

There’s a philosophy in extreme programing circles that one should never break the build. As soon as the build is broken, everything stops until it can be fixed again.1 Some teams even hand out “dunce caps” to a programmer who breaks the build.

If by “build” you simply mean the compile-link-package cycle, then I tend to agree. Breaking the build is pretty serious. One of the advantages of extreme programming is that the increments of work are so small that you don’t get very far before discovering you’ve broken the build, so it’s relatively easy to fix. Integration is almost automatic rather than a painful, months long process. Avoiding coupling is also important to keep build times manageable.

However some systems define the build a little more broadly. They consider the build to include successful execution of all the unit tests. For example, Maven gives up if any unit test fails.* It will not create any subsequent targets such as a JAR file, if it can’t run unit tests. Ant doesn’t require this, but does allow this. All that’s necessary is declaring that the jar or zip target depends on the test target.

It’s not just open source either. Over at closed source vendor Atlassian, Charles Miller tells us, “all our tools are predicated on tests that start green and stay green. ” In fact, a failing test is so damaging to them, that he actually advocates writing tests that pass if the bug isn’t fixed and fails if it is. That’s a recipe for disaster if I ever heard one. Five years down the line some new programmer is going to finally fix the line of code that causes the bug, and then carefully reintroduce the bug to get back to the green bar.

This is where I part company from the most extreme of the extremists. If building includes passing all unit tests, then it is often acceptable and even desirable to break the build.

Test Your Code, Please!

Friday, December 22nd, 2006

No this isn’t another rant about agile programming or test-driven development or test first programming. There’s a depressing phenomenon in some open source projects (including Jaxen and PHP) where a programmer goes off in a corner, gets a cool idea, writes it up, contributes it, has it checked in, ships it to millions of users; one of whom has the distinct pleasure of being the first to ever actually use this code.

I am getting really tired of discovering code that is broken by design; not merely buggy but partially to completely non-functional and unable to be fixed. The worst case I ever saw was in Jaxen where I once spent two days trying to write unit tests for a package that had been contributed years ago (without any tests of course) and banging my head against the wall trying to figure out how to reach that code. Only after this time, did careful analysis of code paths reveal that the code could never be reached, no matter what. It never could have been reached.

Negative Experimental Programming

Thursday, September 21st, 2006

I’ve previously written about the benefits of experimental programming: fixing code until all tests pass without necessarily understanding at a theoretical level why they pass. This scares the bejeezus out of a lot of developers, but I’m convinced it’s going to be increasingly necessary as software grows more and more complex. The whole field of genetic algorithms is just an extreme example of this. Simulated annealing is another. However those techniques are primarily used by scientists who are accustomed to learning by experiment before they have a full and complete understanding of the underlying principles. Computer scientists are temperamentally closer to mathematicians, who almost never do experiments; and who certainly don’t trust the results of an experiment unless they can prove it theoretically.

However, perhaps it’s possible to introduce experimental programming in a slightly less controversial form. While many programmers may be unwilling to accept it as positive evidence of program correctness, it can offer undeniable evidence of program failure. This was recently brought to mind by a bug I was trying to fix in Jaxen.