Testing – The Cafes

Harold’s Corollary to Knuth’s Law

Elliotte Rusty Harold — Tue, 05 Aug 2008 15:12:02 +0000

Lately I’ve found myself arguing about the proper design of unit tests. On my side I’m claiming:

Unit tests should only touch the public API.
Code coverage should be as near 100% as possible.
It’s better to test the real thing than mock objects.

The goal is to make sure that the tests are as close to actual usage as possible. This means that problems are more likely to be detected and false positives are less likely. Furthermore, the discipline of testing through the public API when attempting to achieve 100% code coverage tends to reveal a lot about how the code really works. It routinely highlights dead code that can be eliminated. It reveals paths of optimization. It teaches me things about my own code I didn’t know. It shows patterns in the entire system that makes up my product.

By contrast some programmers advocate that tests should be method-limited. Each test should call the method as directly as possible, perhaps even making it public or non-private and violating encapsulation to enable this. Any external resources that are necessary to run the method such as databases or web servers should be mocked out. At the extreme, even other classes a test touches should be replaced by mock implementations.

This approach may sometimes let the tests be written faster; but not always. There’s a non-trivial cost to designing mock objects to replace the real thing; and sometimes that takes longer. This approach will still tend to find most bugs in the method being tested. However it stops there. It will not find code in the method that should be eliminated because it’s unreachable from the public API. Thus code tested with this approach is likely to be larger, more complex, and slower since it has to handle conditions that can’t happen through the public API. More importantly, such a test starts and stops with that one method. It reveals nothing about the interaction of the different parts of the system. It teaches nothing about how the code really operates in the more complex environment of the full system. It misses bugs that can emerge out of the mixture of multiple different methods and classes even when each method is behaving correctly in isolation according to its spec. That is, it often fails to find flaws in the specifications of the individual methods. Why then are so many programmers so adamant about breaking access protection and every other rule of good design as soon as they start testing?

Would you believe performance?

For instance consider this proposal from Michael Feathers:

A test is not a unit test if:

It talks to the database

It communicates across the network

It touches the file system

It can’t run at the same time as any of your other unit tests

You have to do special things to your environment (such as editing config files) to run it.

Tests that do these things aren’t bad. Often they are worth writing, and they can be written in a unit test harness. However, it is important to be able to separate them from true unit tests so that we can keep a set of tests that we can run fast whenever we make our changes.

More than 30 years ago Donald Knuth first published what would come to be called Knuth’s law: “premature optimization is the root of all evil in programming.” (Knuth, Donald. Structured Programming with go to Statements, ACM Journal Computing Surveys, Vol 6, No. 4, Dec. 1974. p.268.) But some developers still haven’t gotten the message.

Are there some tests that are so slow they contribute to not running the test suite? Yes. We’ve all seen them, but there’s no way to tell which tests they are in advance. In my test suite for XOM, I have numerous tests that communicate across the network, touch the filesystem, and access third party libraries. However, almost all these tests run like a bat out of hell, and take no noticeable time. The slowest test in the suite? It’s one that operates completely in memory on byte array streams with no network access, does not touch the file system, uses no APIs beyond what’s in Java 1.2 and XOM itself, and there’s no database anywhere in sight. I do omit that test from my standard suite because it takes too long to run. I’ll run it explicitly once or twice before releasing a new version, but not every time I make a change.

I am now proposing Harold’s corollary to Knuth’s law: premature optimization is the root of all evil in testing. It is absolutely essential to make sure that your test suite runs fast enough to run after every change to the code and before every check in. I’m even willing to put a number on “fast enough”, and that number is 90 seconds. However, you simply cannot tell which tests are likely to be too slow to run routinely in advance of actual measurement. Castrating and contorting your tests to fit some imagined idea of what will and will not be slow limits their usefulness.

Tests should be designed for the ideal scenario: a computer that is infinitely fast with infinite memory and a network with zero latency and infinite bandwidth. Of course, that ideal computer doesn’t exist; and you’ll have to profile, optimize, and as a last resort cut back on your tests. However, I’ve never yet met a programmer who could reliably tell which tests (or other code) would and would not be fast enough in advance of actual measurements. Blanket rules that unit tests should not do X or talk to Y because it’s likely to be slow needlessly limit what we can learn from unit tests.
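Measuring first can be as simple as timing each test body and only then deciding what to pull out of the routine suite. The following is a minimal sketch of that idea, not anything taken from XOM or JUnit: the class name TestTimer, its methods, and the 100 ms budget in main are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of JUnit or XOM): times each test body so
// that exclusions from the routine suite follow measurement, not guesses.
final class TestTimer {

    /** Runs each named task once and returns its elapsed time in milliseconds. */
    static Map<String, Long> time(Map<String, Runnable> tests) {
        Map<String, Long> elapsed = new LinkedHashMap<>();
        for (Map.Entry<String, Runnable> e : tests.entrySet()) {
            long start = System.nanoTime();
            e.getValue().run();
            elapsed.put(e.getKey(), (System.nanoTime() - start) / 1_000_000);
        }
        return elapsed;
    }

    /** Returns the names of tests whose measured time exceeds budgetMillis. */
    static List<String> slowerThan(Map<String, Long> elapsed, long budgetMillis) {
        List<String> slow = new ArrayList<>();
        for (Map.Entry<String, Long> e : elapsed.entrySet()) {
            if (e.getValue() > budgetMillis) {
                slow.add(e.getKey());
            }
        }
        return slow;
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Map<String, Runnable> tests = new LinkedHashMap<>();
        tests.put("inMemoryButSlow", () -> pause(200)); // stands in for a slow in-memory test
        tests.put("quick", () -> { });
        Map<String, Long> elapsed = time(tests);
        System.out.println(elapsed);
        System.out.println("Over budget: " + slowerThan(elapsed, 100));
    }
}
```

The point of the design is that the budget is applied to measured numbers, so a test that touches the network but runs quickly stays in the suite, and an in-memory test that runs slowly does not.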

Testing

Elliotte Rusty Harold — Thu, 19 Jun 2008 03:17:55 +0000

Here’s part 11 of the ongoing serialization of Refactoring HTML, also available from Amazon and Safari.

This part’s a little funny because really it deserves an entire book on its own, and that book has yet to be written. I didn’t have space or time to write a complete second book about test driven development of web sites and web applications, but perhaps this small piece will inspire someone else to do it. If not, maybe I’ll get to it one of these days.

In theory, refactoring should not break anything that isn’t already broken. In practice, it isn’t always so reliable. To some extent, the catalog later in this book shows you what changes you can safely make. However, both people and tools do make mistakes; and it’s always possible that refactoring will introduce new bugs. Thus, the refactoring process really needs a good automated test suite. After every refactoring, you’d like to be able to press a button and see at a glance whether anything broke.

Although test-driven development has been a massive success among traditional programmers, it is not yet so common among web developers, especially those working on the front end. In fact, any automated testing of web sites is probably the exception rather than the rule, especially when it comes to HTML. It is time for that to change. It is time for web developers to start to write and run test suites and to use test-driven development.

The basic test-driven development approach is as follows:

1. Write a test for a feature.
2. Code the simplest thing that can possibly work.
3. Run all tests.
4. If tests passed, goto 1.
5. Else, goto 2.

For refactoring purposes, it is very important that this process be as automatic as possible. In particular:

The test suite should not require any complicated setup. Ideally, you should be able to run it with the click of a button. You don’t want developers to skip running tests because they’re too hard to run.
The tests should be fast enough that they can be run frequently; ideally, they should take 90 seconds or less to run. You don’t want developers to skip running tests because they take too long.
The result must be pass or fail, and it should be blindingly obvious which it is. If the result is fail, the failed tests should generate more output explaining what failed. However, passing tests should generate no output at all, except perhaps for a message such as “All tests passed”. In particular, you want to avoid the common problem in which one or two failing tests get lost in a sea of output from passing tests.
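The reporting rule in the last point can be sketched in a few lines of Java. This is an illustration only: QuietRunner is a hypothetical class, and a real suite would normally rely on JUnit's own runners rather than something hand-rolled like this.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mini-runner illustrating the reporting rule only.
final class QuietRunner {

    /** Runs every check; returns one line per failure, or a single summary line. */
    static List<String> run(Map<String, Runnable> checks) {
        List<String> report = new ArrayList<>();
        for (Map.Entry<String, Runnable> check : checks.entrySet()) {
            try {
                check.getValue().run(); // a passing check adds nothing to the report
            } catch (RuntimeException | AssertionError failure) {
                report.add("FAIL " + check.getKey() + ": " + failure.getMessage());
            }
        }
        if (report.isEmpty()) {
            report.add("All tests passed");
        }
        return report;
    }

    public static void main(String[] args) {
        Map<String, Runnable> checks = new LinkedHashMap<>();
        checks.put("homePageLoads", () -> { });
        checks.put("searchFormPresent", () -> {
            throw new AssertionError("form not found");
        });
        for (String line : run(checks)) {
            System.out.println(line);
        }
    }
}
```

Because passing checks contribute no lines, the one failure in main is the only thing printed.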

Writing tests for web applications is harder than writing tests for classic applications. Part of this is because the tools for web application testing aren’t as mature as the tools for traditional application testing. Part of this is because any test that involves looking at something and figuring out whether it looks right is hard for a computer. (It’s easy for a person, but the goal is to remove people from the loop.) Thus, you may not achieve the perfect coverage you can in a Java or .NET application. Nonetheless, some testing is better than none, and you can in fact test quite a lot.

One thing you will discover is that refactoring your code to web standards such as XHTML is going to make testing a lot easier. Going forward, it is much easier to write tests for well-formed and valid XHTML pages than for malformed ones. This is because it is much easier to write code that consumes well-formed pages than malformed ones. It is much easier to see what the browser sees, because all browsers see the same thing in well-formed pages and different things in malformed ones. Thus, one benefit of refactoring is improving testability and making test-driven development possible in the first place. Indeed, with a lot of web sites that don’t already have tests, you may need to refactor them enough to make testing possible before moving forward.

You can use many tools to test web pages, ranging from decent to horrible and free to very expensive. Some of these are designed for programmers, some for web developers, and some for business domain experts. They include:

HtmlUnit
JsUnit
HttpUnit
JWebUnit
FitNesse
Selenium

In practice, the rough edges on these tools make it very helpful to have an experienced agile programmer develop the first few tests and the test framework. Once you have an automated test suite in place, it is usually easier to add more tests yourself.

JUnit

JUnit (http://www.junit.org/) is the standard Java framework for unit testing, and the one on which a lot of the more specific frameworks such as HtmlUnit and HttpUnit are built. There’s no reason you can’t use it to test web applications, provided you can write Java code that pretends to be a browser. That’s actually not as hard as it sounds.

For example, one of the most basic tests you’ll want to run is one that tests whether each page on your site is well-formed. You can test this simply by parsing the page with an XML parser and seeing whether it throws any exceptions. Write one method to test each page on the site, and you have a very nice automated test suite for what we checked by hand in the previous section.

Listing 2.2 demonstrates a simple JUnit test that checks the well-formedness of my blog. All this code does is throw a URL at an XML parser and see whether it chokes. If it doesn’t, the test passes. This version requires Sun’s JDK 1.5 or later and JUnit 3.8 or later somewhere in the classpath. You may need to make some modifications to run this in other environments.

Listing 2.2: A JUnit Test for Web Site Well-Formedness

import java.io.IOException;
import junit.framework.TestCase;
import org.xml.sax.*;
import org.xml.sax.helpers.XMLReaderFactory;

public class WellformednessTests extends TestCase {

    private XMLReader reader;
    
    public void setUp() throws SAXException {
      reader = XMLReaderFactory.createXMLReader(
        "com.sun.org.apache.xerces.internal.parsers.SAXParser");
    }

    public void testBlogIndex() throws SAXException, IOException {
      reader.parse("http://www.elharo.com/blog/");
    }
}

You can run this test from inside an IDE such as Eclipse or NetBeans, or you can run it from the command line like so:

$ java -cp .:junit.jar junit.swingui.TestRunner WellformednessTests

If all tests pass, you’ll see a green bar as shown in Figure 2.3.

Figure 2.3: All tests pass.

To test additional pages for well-formedness, you simply add more methods, each of which looks exactly like testBlogIndex, just with a different URL. Of course, you can also write more complicated tests. You can test for validity by setting the http://xml.org/sax/features/validation feature on the parser and attaching an error handler that throws an exception if a validity error is detected.
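As a rough sketch of the validity check just described, the following uses only the JDK's built-in parser. The class name ValidityCheck and the tiny inline DTD are assumptions made so the example runs without a network connection; in a JUnit test you would more likely pass the page's URL to the InputSource constructor and let a thrown exception fail the test.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical helper: reports whether a document is both well-formed and valid.
final class ValidityCheck {

    static boolean isValid(InputSource source) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature("http://xml.org/sax/features/validation", true);
            // Validity errors are recoverable by default, so the handler
            // has to rethrow them to make the parse fail.
            reader.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            reader.parse(source);
            return true;
        } catch (SAXException e) {
            return false; // a validity or well-formedness error was reported
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String dtd = "<!DOCTYPE p [<!ELEMENT p (#PCDATA)>]>";
        // p may contain only text, so the second document is well-formed but invalid.
        System.out.println(isValid(new InputSource(new StringReader(dtd + "<p>hello</p>"))));
        System.out.println(isValid(new InputSource(new StringReader(dtd + "<p><b>hello</b></p>"))));
    }
}
```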

You can use DOM, XOM, SAX, or some other API to load the page and inspect its contents. For instance, you could write a test that checks whether all links on a page are reachable. If you use TagSoup as the parser, you can even write these sorts of tests for non-well-formed HTML pages.
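For instance, the first half of a link check (finding the links) might look like the sketch below, which uses the DOM parser that ships with the JDK. LinkExtractor is a hypothetical name, and the sketch deliberately stops short of fetching each URL so that it runs without a network; it also assumes the page is well-formed.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: collects link targets from a well-formed page.
final class LinkExtractor {

    /** Returns the href of every a element that has one, in document order. */
    static List<String> hrefs(InputSource page) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(page);
            NodeList anchors = doc.getElementsByTagName("a");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < anchors.getLength(); i++) {
                Element a = (Element) anchors.item(i);
                if (a.hasAttribute("href")) { // skip named anchors without a target
                    result.add(a.getAttribute("href"));
                }
            }
            return result;
        } catch (SAXException e) {
            throw new IllegalArgumentException("page is not well-formed", e);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body>"
                + "<a href=\"http://www.elharo.com/blog/\">Blog</a>"
                + "<a name=\"top\">Top</a>"
                + "</body></html>";
        System.out.println(hrefs(new InputSource(new StringReader(page))));
    }
}
```

A test would then try to open each returned URL and fail on the first one that cannot be retrieved.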

You can submit forms using the HttpURLConnection class or run JavaScript using the Rhino engine built into Java 6. This is all pretty low-level stuff, and it’s not trivial to do; but it’s absolutely possible to do it. You just have to roll up your sleeves and start coding.
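To give a sense of what that low-level code involves, here is a minimal sketch of a form submission with HttpURLConnection. The FormPoster class is hypothetical, the field name s is only illustrative, and the sketch assumes the form accepts a POST with URL-encoded fields; a form that uses GET would append the encoded string to the URL instead.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: submits form fields the way a browser would.
final class FormPoster {

    /** Encodes the fields as application/x-www-form-urlencoded text. */
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
        return body.toString();
    }

    /** POSTs the fields to the form's action URL and returns the HTTP status code. */
    static int post(URL action, Map<String, String> fields) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) action.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(encode(fields).getBytes("UTF-8"));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("s", "Windows Vista");
        System.out.println(encode(fields)); // prints s=Windows+Vista
    }
}
```

A test built on this would call post and assert on the status code, or read the response body and look for the expected text.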

If nondevelopers are making regular changes to your site, you can set up the test suite to run periodically with cron and to e-mail you if anything unexpectedly breaks. (It’s probably not reasonable to expect each author or designer to run the entire test suite before every check-in.) You can even run the suite continuously using a product such as Hudson or Cruise Control. However, that may fill your logs with a lot of uncountable test traffic, so you may wish to run this against the development server instead.

Many similar test frameworks are available for other languages and platforms: PyUnit for Python, CppUnit for C++, NUnit for .NET, and so forth. Collectively these go under the rubric xUnit. Whichever one you and your team are comfortable working with is fine for writing web test suites. The web server doesn’t care what language your tests are written in. As long as you have a one-button test harness and enough HTTP client support to write tests, you can do what needs to be done.

HtmlUnit

HtmlUnit (http://htmlunit.sourceforge.net/) is an open source JUnit extension designed to test HTML pages. It will be most familiar and comfortable to Java programmers already using JUnit for test-driven development. HtmlUnit provides two main advantages over pure JUnit.

The WebClient class makes it much easier to pretend to be a web browser.

The HtmlPage class has methods for inspecting common parts of an HTML document.

For example, HtmlUnit will run JavaScript that’s specified by an onLoad handler before it returns the page to the client, just like a browser would. Simply loading the page with an XML parser as Listing 2.2 did would not run the JavaScript.

Listing 2.3 demonstrates the use of HtmlUnit to check that all the links on a page are not broken. I could have written this using a raw parser and DOM, but it would have been somewhat more complex. In particular, methods such as getAnchors to find all the a elements in a page are very helpful.

Listing 2.3: An HtmlUnit Test for a Page’s Links

import java.io.IOException;
import java.net.*;
import java.util.*;
import com.gargoylesoftware.htmlunit.*;
import com.gargoylesoftware.htmlunit.html.*;

import junit.framework.TestCase;

public class LinkCheckTest extends TestCase {

    public void testBlogIndex() throws FailingHttpStatusCodeException, IOException {
        WebClient webClient = new WebClient();
        URL url = new URL("http://www.elharo.com/blog/");
        HtmlPage page = (HtmlPage) webClient.getPage(url);
        List links = page.getAnchors();
        Iterator iterator = links.iterator();
        while (iterator.hasNext()) {
            HtmlAnchor link = (HtmlAnchor) iterator.next();
            // Resolve the href against the page URL so that
            // relative links do not throw MalformedURLException.
            URL u = new URL(url, link.getHrefAttribute());
            // Check that we can download this page.
            // If we can't, getPage throws an exception and
            // the test fails.
            webClient.getPage(u);
        }
    }
}

This test is more than a unit test. It checks all the links on a page, whereas a real unit test would check only one. Furthermore, it makes connections to external servers. That’s very unusual for a unit test. Still, this is a good test to have, and it will let us know that we need to fix our pages if an external site breaks links by reorganizing its pages.

HttpUnit

HttpUnit (http://httpunit.sourceforge.net/) is another open source JUnit extension designed to test HTML pages. It is also best suited for Java programmers already using JUnit for test-driven development, and is in many ways quite similar to HtmlUnit. Some programmers prefer HttpUnit, and others prefer HtmlUnit. If there’s a difference between the two it’s that HttpUnit is somewhat lower-level. It tends to focus more on the raw HTTP connection whereas HtmlUnit more closely imitates a browser. HtmlUnit has somewhat better support for JavaScript, if that’s a concern. However, there’s certainly a lot of overlap between the two projects.

Listing 2.4 demonstrates an HttpUnit test that verifies that a page has exactly one H1 header, and that its text matches the web page’s title. That may not be a requirement for all pages, but it is a requirement for some. For instance, it would be a very apt requirement for a newspaper site.

Listing 2.4: An HttpUnit Test That Matches the Title to a Unique H1 Heading

import java.io.IOException;
import org.xml.sax.SAXException;
import com.meterware.httpunit.*;
import junit.framework.TestCase;

public class TitleChecker extends TestCase {

    public void testTitleMatchesH1() throws IOException, SAXException {
        WebConversation wc = new WebConversation();
        WebResponse wr = wc.getResponse("http://www.elharo.com/blog/");
        HTMLElement[] h1 = wr.getElementsWithName("h1");
        assertEquals(1, h1.length);
        String title = wr.getTitle();
        assertEquals(title, h1[0].getText());
    }

}

I could have written this test in HtmlUnit too, and I could have written Listing 2.3 with HttpUnit. Which one you use is mostly a matter of personal preference. Of course, these are hardly the only such frameworks. There are several more, including ones not written in Java. Use whichever one you like, but by all means use something.

JWebUnit

JWebUnit is a higher-level API that sits on top of HtmlUnit and JUnit. Generally, JWebUnit tests involve more assertions and less straight Java code. These tests are somewhat easier to write without a large amount of Java expertise, and they may be more accessible to a typical web developer. Furthermore, tests can very easily extend over multiple pages as you click links, submit forms, and in general follow an entire path through a web application.

Listing 2.5 demonstrates a JWebUnit test for the search engine on my web site. It fills in the search form on the main page and submits it. Then it checks that one of the expected results is present.

Listing 2.5: A JWebUnit Test for Submitting a Form

import junit.framework.TestCase;
import net.sourceforge.jwebunit.junit.*;

public class SearchFormTest extends TestCase {
    private WebTester tester;
    
    public SearchFormTest(String name) {
        super(name);
        tester = new WebTester();
        tester.getTestContext().setBaseUrl("http://www.elharo.com/");
    }

    public void testFormSubmission() {
        // start at this page
        tester.beginAt("/blog/");
        // check that the form we want is on the page
        tester.assertFormPresent("searchform");
        // check that the input element we expect is present
        tester.assertFormElementPresent("s");
        // type something into the input element
        tester.setTextField("s", "Linux");
        // send the form
        tester.submit();
        // we're now on a different page; check that the
        // text on that page is as expected.
        tester.assertTextPresent("Windows Vista");
    }
}

FitNesse

FitNesse (http://fitnesse.org/) is a Wiki designed to enable business users to write tests in table format. Business users like spreadsheets. The basic idea of FitNesse is that tests can be written as tables, much like a spreadsheet. Thus, FitNesse tests are not written in Java. Instead, they are written as a table in a Wiki.

You do need a programmer to install and configure FitNesse for your site. However, once it’s running and a few sample fixtures have been written, it is possible for savvy business users to write more tests. FitNesse works best in a pair environment, though, where one programmer and one business user can work together to define the business rules and write tests for them.

For web app acceptance testing, you install Joseph Bergin’s HtmlFixture (http://fitnesse.org/FitNesse.HtmlFixture). It too is based on HtmlUnit. It supplies instructions that are useful for testing web applications such as typing into forms, submitting forms, checking the text on a page, and so forth.

Listing 2.6 demonstrates a simple FitNesse test that checks the http-equiv meta tag in the head to make sure it’s properly specifying UTF-8. The first three lines set the classpath. Then, after a blank line, the next line identifies the type of fixture as an HtmlFixture. (There are several other kinds, but HtmlFixture is the common one for testing web applications.)

The external page at http://www.elharo.com/blog/ is then loaded. In this page, we focus on the element named meta that has an id attribute with the value charset. This will be the subject for our tests.

The test then looks at two attributes of this element. First it inspects the content attribute and asserts that its value is text/html; charset=utf-8. Next it checks the http-equiv attribute of the same element and asserts that its value is content-type.

Listing 2.6: A FitNesse Test for the http-equiv meta Element

!path fitnesse.jar
!path htmlunit-1.5/lib/*.jar
!path htmlfixture20050422.jar

!|com.jbergin.HtmlFixture|
|http://www.elharo.com/blog/|
|Element Focus|charset |meta|
|Attribute |content |text/html; charset=utf-8|
|Attribute |http-equiv|content-type|

This test would be embedded in a Wiki page. You can run it from a web browser just by clicking the Test button, as shown in Figure 2.4. If all of the assertions pass, and if nothing else goes wrong, the test will appear green after it is run. Otherwise, it will appear pink. You can use other Wiki markup elsewhere in the page to describe the test.

Figure 2.4: A FitNesse page

Selenium

Selenium is an open source browser-based test tool designed more for functional and acceptance testing than for unit testing. Unlike with HttpUnit and HtmlUnit, Selenium tests run directly inside the web browser. The page being tested is embedded in an iframe and the Selenium test code is written in JavaScript. It is mostly browser and platform independent, though the IDE for writing tests, shown in Figure 2.5, is limited to Firefox.

Figure 2.5: The Selenium IDE

Although you can write tests manually in Selenium using remote control, it is really designed more as a traditional GUI record and playback tool. This makes it more suitable for testing an application that has already been written, and less suitable for doing test-driven development.

Selenium is likely to be more comfortable to front-end developers who are accustomed to working with JavaScript and HTML. It is also likely to be more palatable to professional testers because it’s similar to some of the client GUI testing tools they’re already familiar with.

Listing 2.7 demonstrates a Selenium test that verifies that www.elharo.com shows up in the first page of results from a Google search for “Elliotte”. This script was recorded in the Selenium IDE and then edited a little by hand. You can load it into and then run it from a web browser. Unlike the other examples given here, this is not Java code, and it does not require major programming skills to maintain. Selenium is more of a macro language than a programming language.

Listing 2.7: Test That elharo.com Is in the Top Search Results for Elliotte


  
    
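<html>
  <head>
    <title>elharo.com is a top search results for Elliotte</title>
  </head>
  <body>
    <table>
      <thead>
        <tr><td>New Test</td></tr>
      </thead>
      <tbody>
        <tr>
          <td>open</td>
          <td>/</td>
          <td></td>
        </tr>
        <tr>
          <td>type</td>
          <td>q</td>
          <td>elliotte</td>
        </tr>
        <tr>
          <td>clickAndWait</td>
          <td>btnG</td>
          <td></td>
        </tr>
        <tr>
          <td>verifyTextPresent</td>
          <td>www.elharo.com/</td>
          <td></td>
        </tr>
      </tbody>
    </table>
  </body>
</html>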

Obviously, Listing 2.7 is a real HTML document. You can open this with the Selenium IDE in Firefox and then run the tests. Because the tests run directly inside the web browser, Selenium helps you find bugs that occur in only one browser or another. Given the wide variation in browsers’ support for CSS, HTML, and JavaScript, this capability is very useful. HtmlUnit, HttpUnit, JWebUnit, and the like use their own JavaScript engines which do not always have the same behavior as the browsers’ engines. Selenium uses the browsers themselves, not imitations of them.

The IDE can also export the tests as C#, Java, Perl, Python, or Ruby code so that you can integrate Selenium tests into other environments. This is especially important for test automation. Listing 2.8 shows the same test as in Listing 2.7, but this time in Ruby. However, this will not necessarily catch all the cross-browser bugs you’ll find by running the tests directly in the browser.

Listing 2.8: Automated Test That elharo.com Is in the Top Search Results for Elliotte

require "selenium"
require "test/unit"

class GoogleSearch < Test::Unit::TestCase
  def setup
    @verification_errors = []
    if $selenium
      @selenium = $selenium
    else
      @selenium = Selenium::SeleneseInterpreter.new("localhost",
        4444, "*firefox", "http://localhost:4444", 10000)
      @selenium.start
    end
    @selenium.set_context("test_google_search", "info")
  end

  def teardown
    @selenium.stop unless $selenium
    assert_equal [], @verification_errors
  end

  def test_google_search
    @selenium.open "/"
    @selenium.type "q", "elliotte"
    @selenium.click "btnG"
    @selenium.wait_for_page_to_load "30000"
    begin
      assert @selenium.is_text_present("www.elharo.com/")
    rescue Test::Unit::AssertionFailedError
      @verification_errors << $!
    end
  end
end

Getting Started with Tests

Because you’re refactoring, you already have a web site or application; and if it’s like most I’ve seen, it has limited, if any, front-end tests. Don’t let that discourage you. Pick the tool you like and start to write a few tests for some basic functionality. Any tests at all are better than none. At the early stages, testing is linear. Every test you write makes a noticeable improvement in your code coverage and quality. Don’t get bogged down thinking you have to test everything. That’s great if you can do it; but if you can’t, you can still do something.

Before refactoring a particular page, subdirectory, or path through a site, take an hour and write at least two or three tests for that section. If nothing else, these are smoke tests that will let you know if you totally muck up everything. You can expand on these later when you have time.

If you find a bug, by all means write a test for the bug before fixing it. That will help you know when you’ve fixed the bug, and it will prevent the bug from accidentally reoccurring in the future after other changes. Because front-end tests aren’t very unitary, it’s likely that this test will indirectly test other things besides the specific bit of buggy code.

Finally, for new features and new developments beyond refactoring, by all means write your tests first. This will guarantee that the new parts of the site are tested, and tests will often leak over into the older pages and scripts as well.

Automatic testing is critical to developing a robust, scalable application. Developing a test suite can seem daunting at first, but it’s worth doing. The first test is the hardest. Once you’ve set up your test framework and written your first test, subsequent tests will flow much more easily. Just as you can improve a site linearly through small, automatic refactorings that add up over time, so too can you improve your test suite by adding just a few tests a week. Sooner than you know, you’ll have a solid test suite that helps to ensure reliability by telling you when things are broken and showing you what to fix.

Go Ahead. Break the Build!

Elliotte Rusty Harold — Wed, 10 Jan 2007 13:06:38 +0000

There’s a philosophy in extreme programming circles that one should never break the build. As soon as the build is broken, everything stops until it can be fixed again.¹ Some teams even hand out “dunce caps” to a programmer who breaks the build.

If by “build” you simply mean the compile-link-package cycle, then I tend to agree. Breaking the build is pretty serious. One of the advantages of extreme programming is that the increments of work are so small that you don’t get very far before discovering you’ve broken the build, so it’s relatively easy to fix. Integration is almost automatic rather than a painful, months-long process. Avoiding coupling is also important to keep build times manageable.

However, some systems define the build a little more broadly. They consider the build to include successful execution of all the unit tests. For example, Maven gives up if any unit test fails.² It will not create any subsequent targets, such as a JAR file, if it can’t run unit tests. Ant doesn’t require this, but does allow this. All that’s necessary is declaring that the jar or zip target depends on the test target.

It’s not just open source either. Over at closed source vendor Atlassian, Charles Miller tells us, “all our tools are predicated on tests that start green and stay green.” In fact, a failing test is so damaging to them that he actually advocates writing tests that pass if the bug isn’t fixed and fail if it is. That’s a recipe for disaster if I ever heard one. Five years down the line some new programmer is going to finally fix the line of code that causes the bug, and then carefully reintroduce the bug to get back to the green bar.

This is where I part company from the most extreme of the extremists. If building includes passing all unit tests, then it is often acceptable and even desirable to break the build.

There are several reasons you may choose to introduce a failing test, thus breaking the build; but not fix it immediately. First of all, the person who writes the failing test may not be the best qualified to fix it. For instance, the person who finds the bug and writes the test could be a tester rather than a programmer; or they could be a programmer who’s working on a different piece of the code. Sure, they can fix the bug if they happen to see how to; but if they don’t, it’s still valuable to commit the unit test that proves the bug’s existence.

Even if the person who finds the bug is responsible for fixing it, the fix may not be apparent. Some bugs require a lot of work. Some bugs require only a little work, but may still require a night to sleep on it before the fix becomes obvious. Sometimes it’s a 30 minute fix, but the programmer only has 15 minutes before they have to pick up their daughter from football practice. For whatever reason, not every bug can be fixed immediately. It’s still better to commit the failing unit test so it doesn’t get forgotten in the future, even if that “breaks” the build.

Sometimes failing tests are beyond programmer control. For instance, XOM has one unit test that passes on Windows and Linux, but fails on the Mac. This particular test runs across a bug in Apple’s VM involving files whose names contain Unicode characters from outside the Basic Multilingual Plane. There’s not a lot I can do to fix that; but I shouldn’t pretend it’s not a problem either.

All too often projects are in denial about their bugs. Rather than accept clear evidence of a bug, they do anything they can to deny it. Refusing to allow the build to break, and then defining a test failure as build breakage, is just one example of this pathology. Recognizing, identifying, and reproducing a bug with a well-defined unit test is a valuable contribution. It should be accepted gladly, not met with hostility and an insistence that the reporter immediately provide a patch.

¹ Joel Spolsky: “If a daily build is broken, you run the risk of stopping the whole team. Stop everything and keep rebuilding until it’s fixed. Some days, you may have multiple daily builds.”

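² You can tell Maven to exclude specific failing tests from your project.xml by listing the failing tests in an excludes element like so:

    <unitTest>
      <includes>
        <include>**/*Test.java</include>
      </includes>
      <excludes>
        <exclude>org/jaxen/test/JDOMXPathTest.java</exclude>
      </excludes>
    </unitTest>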

You can also tell Maven not to run the unit tests by setting properties on the command line: -Dmaven.test.skip=true skips the unit tests, and -Dmaven.test.failure.ignore=true runs the tests but does not stop if a test fails. However, these are just hacks to get around what Maven really wants to do: run all the unit tests and fail the build if any fail.

Test Your Code, Please!

Elliotte Rusty Harold — Fri, 22 Dec 2006 09:31:12 +0000

No, this isn’t another rant about agile programming or test-driven development or test-first programming. There’s a depressing phenomenon in some open source projects (including Jaxen and PHP) where a programmer goes off in a corner, gets a cool idea, writes it up, contributes it, has it checked in, and ships it to millions of users, one of whom has the distinct pleasure of being the first to ever actually use this code.

I am getting really tired of discovering code that is broken by design; not merely buggy but partially to completely non-functional and unable to be fixed. The worst case I ever saw was in Jaxen, where I once spent two days trying to write unit tests for a package that had been contributed years earlier (without any tests, of course), banging my head against the wall trying to figure out how to reach that code. Only after all that time did careful analysis of the code paths reveal that the code could never be reached, no matter what. It never could have been reached.

More recently I’m working with some libraries in PHP copied from libxml which itself copied from .NET. Since the developers were only copying from other APIs, I guess they didn’t think it might make sense to stop and try to write a little sample code that used their libraries and see if it worked. In any case, the libraries in both libxml and PHP, though more functional than Jaxen’s org.jaxen.dom.html package, are full of public but unreachable code. They contain constants for node types that will never be supplied by the parser. They are poorly documented because the designers just copied rather than creating from scratch. Half the questions I have end up with me referred back to the .NET API docs and hoping that the copy was a faithful one. Even the original programmers no longer remember exactly how it works.

Of course like Mariposan clones on Star Trek TNG, it gets worse the further away from the source material you get. The .NET APIs are a little broken. The libxml APIs that copied the .NET APIs are somewhat broken. The PHP APIs on top of libxml are a lot broken. They miss basic things that libxml can do like detecting the end of the document, or recognizing errors. (The specific mistake was converting a ternary int return type into a binary boolean type in the translation.)
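To make the ternary-to-boolean mistake concrete, here is a minimal Java sketch. The names `TernaryReader`, `read()`, and `readAsBoolean()` are hypothetical and are not the libxml or PHP API; the only detail borrowed from libxml is the convention that a read returns 1 for a node, 0 at the end of the document, and -1 on error.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical reader, for illustration only; not the libxml or PHP API. */
class TernaryReader {
    private final Iterator<String> nodes;

    TernaryReader(List<String> nodes) {
        this.nodes = nodes.iterator();
    }

    /** Returns 1 if a node was read, 0 at end of document, -1 on a malformed node. */
    int read() {
        if (!nodes.hasNext()) {
            return 0;
        }
        String node = nodes.next();
        return node.isEmpty() ? -1 : 1; // an empty string stands in for a parse error
    }

    /** The lossy translation: end of document and error both become false. */
    boolean readAsBoolean() {
        return read() == 1;
    }

    public static void main(String[] args) {
        List<String> empty = Collections.<String>emptyList();
        List<String> malformed = Collections.singletonList("");
        // The int results differ between the two inputs...
        System.out.println(new TernaryReader(empty).read()
            + " vs " + new TernaryReader(malformed).read());
        // ...but the boolean results are identical, so the caller cannot tell them apart.
        System.out.println(new TernaryReader(empty).readAsBoolean()
            + " vs " + new TernaryReader(malformed).readAsBoolean());
    }
}
```

Once the third state is gone, no amount of care in the calling code can recover the difference between “finished” and “failed”.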

However, it’s not the bad translation and adaptation that gets me. It’s that these flaws are really, really obvious. Anybody who sits down to write almost any code that uses these APIs is going to hit these problems almost immediately. Unit tests might not catch the design flaws. After all, unit tests only verify that the code is doing what the programmer says it does. They don’t verify that the programmer has chosen to make the code do the right thing.

We cannot rely on unit tests alone. We also need to actually use the API before committing to it and shipping it. Try writing some non-trivial programs, and see how the API fares. Only by actually using the API in anger can you tell where it works and where it fails. You can find out what’s right, what’s wrong, and what’s missing. If you pay attention, you’ll also find out what it includes that you don’t need and that you can therefore remove. (Smaller is better.)

When I designed XOM, I took my first pass at the API, and used it to reimplement every example from Processing XML with Java in XOM. I also implemented some additional XML specifications such as Canonical XML and XInclude on top of the core libraries but in separate packages to make sure the core libraries offered enough public access to build other pieces of infrastructure on top of them.

The results were surprising. Several methods I thought were essential didn’t actually get called anywhere, and I could take them out. I also found quite a few things I’d forgotten to do that I obviously needed. Later I released the library to other users, who found more things I’d left out, as well as other flaws. Only after I was confident the library had been put to some real use did I declare it finished and freeze the API.

Of course, I knew that I would have to add more features in the future; but slow and steady wins the race. I don’t add anything to XOM unless someone actually asks for it and provides a clear use case. Even if the feature is obviously a good thing in itself, I can’t be sure it’s designed right unless I can see how the feature is used.

As I write this, I notice someone is gearing up to make the same damn mistake, but this time in Ruby. Laurent Sansonetti writes:

For the curious, here is the patch to bring the xmlTextReader API to the libxml-ruby project, that I used in my previous hack. The patch has been generated from CVS HEAD.

Everything valuable should be wrapped, but note that the test cases are not complete and that there is no documentation (RDoc comments) yet. As the libxml-ruby guys were interested in it I just sent it to them as well.

Laurent, if you don’t have test cases or documentation, it isn’t done! Please don’t submit it.

Contributors, please don’t send in patches that add new features and new API unless you’ve actually used them first. Project maintainers: don’t accept code contributions unless there’s a clear use case and evidence of actual use. Otherwise you’re not really maintaining, just being a pack rat. When in doubt, leave it out. It is better to leave out a feature in this release than commit to supporting a broken API for the indefinite future.

Negative Experimental Programming

Elliotte Rusty Harold — Thu, 21 Sep 2006 11:00:53 +0000

I’ve previously written about the benefits of experimental programming: fixing code until all tests pass without necessarily understanding at a theoretical level why they pass. This scares the bejeezus out of a lot of developers, but I’m convinced it’s going to be increasingly necessary as software grows more and more complex. The whole field of genetic algorithms is just an extreme example of this. Simulated annealing is another. However those techniques are primarily used by scientists who are accustomed to learning by experiment before they have a full and complete understanding of the underlying principles. Computer scientists are temperamentally closer to mathematicians, who almost never do experiments; and who certainly don’t trust the results of an experiment unless they can prove it theoretically.

However, perhaps it’s possible to introduce experimental programming in a slightly less controversial form. While many programmers may be unwilling to accept it as positive evidence of program correctness, it can offer undeniable evidence of program failure. This was recently brought to mind by a bug I was trying to fix in Jaxen.

Dominic Krupp had reported the bug on the Jaxen user mailing list. I was skeptical at first, but he had an executable test case so I couldn’t really deny it. As Joel Spolsky writes,

Programmers have very well-honed senses of justice. Code either works, or it doesn’t. There’s no sense in arguing whether a bug exists, since you can test the code and find out. The world of programming is very just and very strictly ordered and a heck of a lot of people go into programming in the first place because they prefer to spend their time in a just, orderly place, a strict meritocracy where you can win any debate simply by being right.

I rewrote Krupp’s test case as a JUnit test and added it to our test suite. Though the bug was now obvious, the fix was not, so the first step was to isolate the problem. I figured the problem would either be in the Jaxen core or in the JDOM-specific code. To find out, I rewrote the test using DOM rather than JDOM. That test passed, which convinced me the bug was very likely in the JDOM part of the code, and that’s where I should begin looking. Note that I still didn’t understand why the bug existed; but experiment gave me useful information anyway.

I then began single stepping through the test case with the Eclipse debugger. Jaxen’s code is pretty involved, and even a simple XPath expression can involve many loops and quite deep method hierarchies. Furthermore, many methods are called multiple times from multiple places so break points aren’t quite as useful. I either missed the bug or neglected to enter the method where the bug resided. I still didn’t understand how this bug could arise, or have a good idea about where to look to fix it, so I put the problem aside for a while.

Next, the original reporter proposed a fix. Now I’m in a quandary. How do I judge his patch when I don’t understand the bug? Furthermore, looking at the patch, it’s not obvious to me how it would fix the problem. But this is where experimental programming comes to the rescue again. I don’t need to understand it! All I have to do is plug it in and see if the tests pass. The first results were positive. It did indeed fix the bug he reported. The next results were not. It broke 21 other unit tests that were passing without it. Consequently I could reject the patch, still without actually understanding it. I don’t need to explain how it’s broken. I just need to prove that it is by experiment and test.

You can learn something, even from an incorrect patch. I wasn’t even looking at the method Krupp patched to fix the bug. His patch may have failed, but it gives me new insight into the problem. Perhaps a slight modification of his patch might still fix the problem without breaking 21 other things. I’ll have to experiment and see.

Test Everything, No Matter How Simple

Elliotte Rusty Harold — Tue, 22 Aug 2006 11:18:44 +0000

There are many reasons to write your tests first. An oft unmentioned benefit is that it makes the programmer stop and think about what they’re doing. More often than you’d expect, the obvious fix is not the right answer; and writing the test first can reveal that.

For example, recently Wolfgang Hoschek pointed out that in XOM the Attribute class’s setType() method was failing to check for null. Once he pointed it out, it was an obvious problem so I opened up Eclipse and got ready to fix it:

    private void _setType(Type type) {
        this.type = type;
    }

The fix was trivial:

    private void _setType(Type type) {
        if (type == null) {
            throw new NullPointerException("Null attribute type");
        }
        this.type = type;
    }

Fortunately I stopped myself. Even with a known and obvious bug, one shouldn’t fix it without first writing a test case, so I did:

    public void testNullType() {
        try {
            a1.setType(null);
            fail("Didn't throw NullPointerException");
        }
        catch (NullPointerException success) {
            assertNotNull(success.getMessage());  
        }
    }

And it’s a good thing I did too, because it was only when writing the test case that I realized I was patching the wrong method. The check belonged in setType(), not _setType(). The former is the public method. The latter is a package-access method for internal use that saves time by doing less checking in situations where the data is already known to be good. Had I written the code without a test case, I wouldn’t have noticed that I was slowing down XOM by putting the check in the wrong place.
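The corrected arrangement can be sketched as follows. This is a simplified stand-in, not XOM’s actual source: the `Attribute` and `Type` definitions are cut down to the minimum needed to show where the check lives, `Type` is modeled as a small enum purely for brevity, and `_setType()` is given package access as the paragraph above describes.

```java
/** Simplified stand-in for illustration; not XOM's actual Attribute class. */
final class Attribute {

    /** Minimal placeholder for the attribute type constants. */
    enum Type { CDATA, ID, UNDECLARED }

    private Type type = Type.UNDECLARED;

    /** Public entry point: validates the argument, then delegates. */
    public void setType(Type type) {
        if (type == null) {
            throw new NullPointerException("Null attribute type");
        }
        _setType(type);
    }

    /** Internal fast path: callers are trusted to pass known-good data. */
    void _setType(Type type) {
        this.type = type;
    }

    public Type getType() {
        return type;
    }

    public static void main(String[] args) {
        Attribute a1 = new Attribute();
        try {
            a1.setType(null);
            throw new AssertionError("Didn't throw NullPointerException");
        } catch (NullPointerException success) {
            System.out.println("Rejected: " + success.getMessage());
        }
    }
}
```

With the check in the public method only, internal callers that already hold validated data pay nothing for it.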

Even if I hadn’t been about to make the wrong fix, it still would have made sense to write the test. The test is not just about writing the code now. It’s about documenting the proper behavior of the code for future programmers. Once the test is in the test suite, it’s clearly documented that throwing a NullPointerException when the argument is null is the expected and contracted behavior. A future developer, whether me or someone else, can still change it if they want to, of course. However, they’d have to make a deliberate and carefully considered decision to do so. It won’t be changed by accident or oversight. It is unlikely to become a regression.

No code should be untested. Even the simplest code (like this almost trivial setter method) can have bugs, so even the simplest code needs to be tested. No exceptions! No code is too simple to fail.

How do you specify an exponentiation function with a test?

Elliotte Rusty Harold — Sat, 08 Jul 2006 11:31:27 +0000

While it may be a slightly too extreme position to say that tests are the only spec, I think it is absolutely reasonable to consider tests to be a major part of the spec. Indeed a specification without normative test cases is far less likely to be implemented correctly and interoperably than one with a solid normative test suite. The more exhaustive the test suite is, the easier it is to write a conforming correct implementation.

Cedric Beust presents the question, “how do you specify an exponentiation function with a test?” as a counterexample to tests as specs. Actually I don’t think it’s all that hard. Here’s one example:

import junit.framework.TestCase;

public class ExponentiationTest extends TestCase {

    public void testZero() {
        double x = Exponent.calculate(10, 0);
        assertEquals(1, x, 0.0);
    }
    
    public void testOne() {
        double x = Exponent.calculate(10, 1);
        assertEquals(10.0, x, 0.0);
    }
    
    public void testNegativeBase() {
        double x = Exponent.calculate(-1, 2);
        assertEquals(1, x, 0.0);
    }
    
    public void testBigExponentWithOneBase() {
        double x = Exponent.calculate(1, 2000);
        assertEquals(1, x, 0.0);
    }
    
    public void testSquare() {
        double x = Exponent.calculate(10, 2);
        assertEquals(100, x, 0.0);
    }
    
    public void testSquareRoot() {
        double x = Exponent.calculate(100.0, 0.5);
        assertEquals(10, x, 0.0);
    }
    
    public void testFractionalBase() {
        double x = Exponent.calculate(0.5, 2);
        assertEquals(0.25, x, 0.0);
    }
    
    public void testNegativePower() {
        double x = Exponent.calculate(100.0, -1);
        assertEquals(0.01, x, 0.0);
    }
    
    public void testZeroZero() {
        double x = Exponent.calculate(0, 0);
        assertTrue(Double.isNaN(x));
    }
    
}

Clearly you could expand on this short example with more test cases, and if I were writing this for real code I would. In particular I’d have to think hard about what happens when the results overflow the bounds of a double. But I think this makes the point.

Probably if I were really writing a spec, I wouldn’t define these test cases as compilable code like this. I’d likely just list the expected input, outputs, and tolerances; and then write a tool to convert them to actual tests. However that’s an implementation detail.
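One way that idea might look, as a rough sketch: each row lists a base, an exponent, the expected result, and a tolerance, and a small loop turns the rows into checks. The `PowSpec` class, its `Case` rows, the particular tolerances, and the use of `Math.pow` as the function under test are all assumptions made for illustration; a real tool would more likely read the rows from a file and generate JUnit tests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleBinaryOperator;

/** Hypothetical table-driven spec; names and layout are illustrative only. */
class PowSpec {

    /** One row of the table: inputs, expected output, and allowed error. */
    static final class Case {
        final double base, exponent, expected, tolerance;

        Case(double base, double exponent, double expected, double tolerance) {
            this.base = base;
            this.exponent = exponent;
            this.expected = expected;
            this.tolerance = tolerance;
        }
    }

    static final Case[] CASES = {
        new Case(10, 0, 1, 0.0),
        new Case(10, 1, 10, 0.0),
        new Case(-1, 2, 1, 0.0),
        new Case(1, 2000, 1, 0.0),
        new Case(10, 2, 100, 0.0),
        new Case(0.5, 2, 0.25, 1e-15),
        new Case(100, -1, 0.01, 1e-15),
    };

    /** Runs every row against the given implementation and returns the failures. */
    static List<String> check(DoubleBinaryOperator pow) {
        List<String> failures = new ArrayList<>();
        for (Case c : CASES) {
            double actual = pow.applyAsDouble(c.base, c.exponent);
            // Written as a negated <= so that a NaN result counts as a failure.
            if (!(Math.abs(actual - c.expected) <= c.tolerance)) {
                failures.add(c.base + "^" + c.exponent + " gave " + actual);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("Failures: " + check(Math::pow));
    }
}
```

Because the rows are plain data, the same table could be handed to every implementer of the spec regardless of language.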

P.S. These test cases found what is at least arguably a bug (or perhaps a design defect) in Java’s Math.pow() method.
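One candidate for that arguable bug is the zero-to-the-zero case: the Javadoc for `Math.pow` specifies a result of 1.0 whenever the second argument is zero, so `Math.pow(0, 0)` returns 1.0 rather than NaN. The `Exponent` class is never shown above, so the following is only a guess at one shape it could take: a thin wrapper over `Math.pow` that special-cases 0⁰ so that `testZeroZero` would pass.

```java
/** Hypothetical implementation; the original Exponent class is not shown. */
final class Exponent {

    private Exponent() {}

    /** Delegates to Math.pow except for 0^0, which is reported as NaN. */
    static double calculate(double base, double power) {
        if (base == 0.0 && power == 0.0) {
            return Double.NaN; // Math.pow(0, 0) would return 1.0 here
        }
        return Math.pow(base, power);
    }

    public static void main(String[] args) {
        System.out.println(Math.pow(0, 0));           // prints 1.0
        System.out.println(Exponent.calculate(0, 0)); // prints NaN
    }
}
```

Whether NaN or 1.0 is the right answer is exactly the kind of question a normative test case forces the spec writer to settle.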

Specification by Colonization

Elliotte Rusty Harold — Mon, 29 May 2006 12:13:41 +0000

The final chapter of the recently published Java I/O, 2nd edition focuses on the Java Bluetooth API. Like about half of what’s going on in Java today the Java Bluetooth API was defined and developed in the Java Community Process (JCP). I spend a lot of energy criticizing the W3C process, but compared to the JCP, it’s a model of sanity.

At first glance JCP specs look OK; but once you really start digging into one at the level you need to write a book about one or implement it, you rapidly discover huge areas of unspecified behavior. Then when I write about it, I have to test the implementations to see what they actually do. I rarely test more than one or two, and then I write down their behavior as what happens. Then other people come along and read my book to figure out what they’re supposed to do. The behavior that gets fixed is chosen almost by accident.

I blame this on the JCP’s culture of implementation as specification. Many specs are nothing but the JavaDoc compiled from the reference implementation and pointlessly encoded in PDF and/or a zip file. The specs are thus only marginally better than documentation for a typical software product (i.e. slightly better than wretched).

By contrast W3C specs are normally written independently of implementations. Then the working group checks to see if the specs can actually be implemented in a compatible fashion. Since they require two implementations of each feature, this tends to identify any grey areas in the spec, especially when the people doing the implementing are not the people who wrote the spec.

Final == Good

Elliotte Rusty Harold — Wed, 24 May 2006 12:28:53 +0000

Here’s an interesting article that is totally, completely 180 degrees wrong. I’ve said this before, and I’ve said it again, but I’ll say it one more time: final should be the default. Java’s mistake was not that it allowed classes to be final. It was making final a keyword you had to explicitly request rather than making finality the default and adding a subclassable keyword to change the default for those few classes that genuinely need to be nonfinal. The lack of finality has created a huge, brittle, dangerously breakable infrastructure in the world of Java class libraries.

The latest shot from the final-haters is a false claim that finality somehow prevents unit testing. To paraphrase Henry S. Thompson, I hate to move a direct negative, but no! There is nothing in finality that prevents unit testing, and I don’t know why people claim it does. I’ve had zero trouble testing final classes in my own work. I suppose it makes writing mock classes a little trickier, but I’m not sure that’s a bad thing. I much prefer to test the real classes and the real interactions that show what really happens rather than what the mock designer thinks will happen. Bugs aren’t always where you expect them to be. And even if finality did somehow interfere with unit testing, breaking the API to support the tests is a clear case of the tail wagging the dog. The tests exist to serve the code, not the other way around.
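As a small illustration of that claim, here is a sketch of a final class exercised directly through its public API, with no mock or subclass involved. `Celsius` and its methods are invented for this example, and the checks are written as a plain `main` method rather than JUnit tests only to keep the sketch self-contained.

```java
/** Hypothetical final value class, invented for illustration. */
final class Celsius {
    private final double degrees;

    Celsius(double degrees) {
        if (degrees < -273.15) {
            throw new IllegalArgumentException("Below absolute zero: " + degrees);
        }
        this.degrees = degrees;
    }

    double toFahrenheit() {
        return degrees * 9.0 / 5.0 + 32.0;
    }

    public static void main(String[] args) {
        // Test the real class through its real API; finality never gets in the way.
        if (new Celsius(100).toFahrenheit() != 212.0) {
            throw new AssertionError("boiling point");
        }
        try {
            new Celsius(-300);
            throw new AssertionError("accepted an impossible temperature");
        } catch (IllegalArgumentException expected) {
            System.out.println("All checks passed");
        }
    }
}
```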

By way of contrast, although I’m careful to unit test subclassing for my nonfinal classes, I rarely encounter any other projects and libraries where that’s done. I’d venture to say that classes that allow their methods to be overridden rarely test that scenario in any way at all. Ditto for testing protected methods.

I will back up a little bit. It’s really only overriding methods that bothers me. I don’t have any particular objection to adding methods to a subclass. Probably the default should be to make all methods final unless they’re explicitly tagged as overridable. If that were done, you’d rarely need final on classes. However methods should be allowed to be overridden only after much careful thought, planning, and testing.

One final point: final is the safe, conservative choice. Should you mark a class or method final, and later discover a need to subclass/override it, you can remove the finality without breaking anyone’s code. You cannot go the other way. Once you’ve published a class that’s non-final you have to consider the possibility that someone, somewhere is subclassing it. Marking it final now risks breaking people’s running code and working systems.

One of the principles of extreme programming is to make the simplest change that could possibly work. Final is simpler than non-final. Final commits you to less. If you need a class to be non-final, fine. But please don’t make classes non-final by default.

The Nastiest Bug

Elliotte Rusty Harold — Thu, 02 Mar 2006 20:22:51 +0000

There’s one trap I fall into repeatedly while doing software development of any kind: testing, debugging, coding, documenting, anything. And it happens in every language I’ve ever worked in: Java, C++, Perl, CSS, HTML, XML, etc. The only difference is how much time I waste tracking down the bug.

I have this on my mind now because I just lost at least half an hour to this while working on the CSS stylesheet for this very web site. I have had many students show up during my office hours for help with debugging this problem. I have had at least one company pay me lots of money to fix this problem for them (though they didn’t know this was their bug or they wouldn’t have needed to call me in the first place). I can virtually guarantee you’ve made this mistake too. What is the mistake?

Editing the Wrong File

The problem usually occurs when there are two files named Navigator.java, or wp-admin.css or prices.pl, or whatever. More often than not, they are different versions of the same file. Usually they’re in different directories. Maybe you’ve opened one from a backup directory, or you’ve accidentally saved the file onto your desktop instead of your src directory. Sometimes there are entirely separate copies of the source tree. (That’s what happened to me today. I was actually editing the CSS file for The Cafes, but loading the one from Mokka mit Schlag in my browser.)

This mistake is so obvious we rarely talk about it. Once you realize that’s what you’re doing, the fix is totally obvious. You feel like a bonehead, fix the problem, and move on; at least until the next time it happens. Certainly no one wants to stand up and admit they’ve done this. It is an incredibly stupid mistake, one only a bonehead could make more than once. At least it feels like that to me every time I make it, which is at least once a month.

Until you realize what you’re doing, this is the most frustrating bug imaginable. It usually happens to me during debugging or iterative development. I intend to make a small change and verify that the feature is added or the bug is fixed. I make the change and run the test. (Whether the test is manual or automated doesn’t matter here.) Nothing has changed.

“That’s funny,” I think. I look at the code again and twiddle it a little bit. Still nothing.

I try something else. Nothing.

I back out the edit and start over. Nothing.

I start digging deeper, trying to figure out how the really obvious piece of code could possibly not be doing what it is obviously doing. Maybe I set a breakpoint, then run the code in the debugger. If I’m lucky the breakpoint isn’t reached. You’d think that would tell me what the problem is, but it rarely does. My usual reaction is, “Hmm, I must have accidentally hit Run instead of Debug. Let’s try that again.” So I run the code in the debugger a second time. Still nothing.

At this point, I’m getting very frustrated. Sometimes I’ll drop out of my IDE to the command line and try it there. Sometimes that will actually work if the command line is picking up different files than the IDE. Sometimes it won’t.

Sooner or later (but later more than sooner) I finally realize what the problem is. I slap myself on the head and move on. The problem is fixed for the moment, but this keeps happening!

Is there any way to avoid this? I am convinced that the programming community is losing person-years of productivity to this mistake. Pair programming doesn’t help. I’ve absolutely been half of pairs where both pairs of eyes were totally focused on looking for bugs in the wrong file. IDEs don’t help. The fact that they often need their own copy of the source tree makes the problem worse rather than better. Ditto for source code control systems. The problem is at its worst in server side environments because the need to deploy the compiled archive introduces another step where the wrong file can replace the right one. Worse yet, the symptoms of this bug in a server side environment look very much like a failure to reload a compiled class or restart the server, even if that’s not what’s happening at all.

This is such an incredibly dumb bug there must be a way to prevent it, but for the life of me I can’t think of one. Ideas?
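One partial aid, offered only as a sketch and not a cure: scan a directory tree for files that share a name, so that a second Navigator.java at least shows up before it causes trouble. The class name `DuplicateNames` and the choice to compare by bare file name are assumptions of this example, and it does nothing about copies that live outside the scanned tree.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: lists file names that occur more than once under a root. */
class DuplicateNames {

    static Map<String, List<Path>> find(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            // Group every regular file by its bare name, ignoring the directory.
            Map<String, List<Path>> byName = paths
                .filter(Files::isRegularFile)
                .collect(Collectors.groupingBy(
                    p -> p.getFileName().toString(), TreeMap::new, Collectors.toList()));
            byName.values().removeIf(list -> list.size() < 2); // keep only repeated names
            return byName;
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        find(root).forEach((name, paths) -> System.out.println(name + ": " + paths));
    }
}
```

Run against a home directory or a workspace root, it prints each repeated name followed by every path where that name appears.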