Here’s part 11 of the ongoing serialization of Refactoring HTML, also available from Amazon and Safari.

This part’s a little funny because really it deserves an entire book on its own, and that book has yet to be written. I didn’t have space or time to write a complete second book about test driven development of web sites and web applications, but perhaps this small piece will inspire someone else to do it. If not, maybe I’ll get to it one of these days. 🙂

In theory, refactoring should not break anything that isn’t already broken. In practice, it isn’t always so reliable. To some extent, the catalog later in this book shows you what changes you can safely make. However, both people and tools do make mistakes; and it’s always possible that refactoring will introduce new bugs. Thus, the refactoring process really needs a good automated test suite. After every refactoring, you’d like to be able to press a button and see at a glance whether anything broke.

Although test-driven development has been a massive success among traditional programmers, it is not yet so common among web developers, especially those working on the front end. In fact, any automated testing of web sites is probably the exception rather than the rule, especially when it comes to HTML. It is time for that to change. It is time for web developers to start to write and run test suites and to use test-driven development.

The basic test-driven development approach is as follows:

  1. Write a test for a feature.
  2. Code the simplest thing that can possibly work.
  3. Run all tests.
  4. If tests passed, goto 1.
  5. Else, goto 2.

For refactoring purposes, it is very important that this process be as automatic as possible. In particular:

  • The test suite should not require any complicated setup. Ideally, you should be able to run it with the click of a button. You don’t want developers to skip running tests because they’re too hard to run.
  • The tests should be fast enough that they can be run frequently; ideally, they should take 90 seconds or less to run. You don’t want developers to skip running tests because they take too long.
  • The result must be pass or fail, and it should be blindingly obvious which it is. If the result is fail, the failed tests should generate more output explaining what failed. However, passing tests should generate no output at all, except perhaps for a message such as “All tests passed”. In particular, you want to avoid the common problem in which one or two failing tests get lost in a sea of output from passing tests.
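
To make the third requirement concrete, here is a minimal sketch of a harness that stays silent for passing checks and prints details only for failures. The QuietRunner class and the check names are hypothetical, invented purely for illustration; a real project would normally rely on JUnit or another xUnit runner instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical one-button harness: silent on success, verbose only on failure. */
final class QuietRunner {

    /** Runs every named check and returns the report that would be printed. */
    static String run(Map<String, Runnable> checks) {
        StringBuilder failures = new StringBuilder();
        int failed = 0;
        for (Map.Entry<String, Runnable> check : checks.entrySet()) {
            try {
                check.getValue().run();
            } catch (RuntimeException | AssertionError e) {
                // Only failing checks contribute any output.
                failed++;
                failures.append("FAILED ").append(check.getKey())
                        .append(": ").append(e.getMessage()).append('\n');
            }
        }
        if (failed == 0) {
            return "All tests passed";
        }
        return failures.toString() + failed + " of " + checks.size() + " tests failed";
    }

    public static void main(String[] args) {
        Map<String, Runnable> checks = new LinkedHashMap<>();
        checks.put("title is present", () -> { /* passes silently */ });
        checks.put("home page has one h1", () -> {
            throw new AssertionError("expected 1 h1 but found 2");
        });
        System.out.println(run(checks));
    }
}
```

The passing check leaves no trace in the report, so the single failure cannot get lost.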

Writing tests for web applications is harder than writing tests for classic applications. Part of this is because the tools for web application testing aren’t as mature as the tools for traditional application testing. Part of this is because any test that involves looking at something and figuring out whether it looks right is hard for a computer. (It’s easy for a person, but the goal is to remove people from the loop.) Thus, you may not achieve the perfect coverage you can in a Java or .NET application. Nonetheless, some testing is better than none, and you can in fact test quite a lot.

One thing you will discover is that refactoring your code to web standards such as XHTML is going to make testing a lot easier. Going forward, it is much easier to write tests for well-formed and valid XHTML pages than for malformed ones. This is because it is much easier to write code that consumes well-formed pages than malformed ones. It is much easier to see what the browser sees, because all browsers see the same thing in well-formed pages and different things in malformed ones. Thus, one benefit of refactoring is improving testability and making test-driven development possible in the first place. Indeed, with a lot of web sites that don’t already have tests, you may need to refactor them enough to make testing possible before moving forward.

You can use many tools to test web pages, ranging from decent to horrible and free to very expensive. Some of these are designed for programmers, some for web developers, and some for business domain experts. They include:

  • HtmlUnit
  • JsUnit
  • HttpUnit
  • JWebUnit
  • FitNesse
  • Selenium

In practice, the rough edges on these tools make it very helpful to have an experienced agile programmer develop the first few tests and the test framework. Once you have an automated test suite in place, it is usually easier to add more tests yourself.


JUnit (http://www.junit.org/) is the standard Java framework for unit testing, and the one on which a lot of the more specific frameworks such as HtmlUnit and HttpUnit are built. There’s no reason you can’t use it to test web applications, provided you can write Java code that pretends to be a browser. That’s actually not as hard as it sounds.

For example, one of the most basic tests you’ll want to run is one that tests whether each page on your site is well-formed. You can test this simply by parsing the page with an XML parser and seeing whether it throws any exceptions. Write one method to test each page on the site, and you have a very nice automated test suite for what we checked by hand in the previous section.

Listing 2.2 demonstrates a simple JUnit test that checks the well-formedness of my blog. All this code does is throw a URL at an XML parser and see whether it chokes. If it doesn’t, the test passes. This version requires Sun’s JDK 1.5 or later and JUnit 3.8 or later somewhere in the classpath. You may need to make some modifications to run this in other environments.

Listing 2.2: A JUnit Test for Web Site Well-Formedness

import java.io.IOException;
import junit.framework.TestCase;
import org.xml.sax.*;
import org.xml.sax.helpers.XMLReaderFactory;

public class WellformednessTests extends TestCase {

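    private XMLReader reader;

    public void setUp() throws SAXException {
      // Create the platform's default SAX parser.
      reader = XMLReaderFactory.createXMLReader();
    }

    public void testBlogIndex() throws SAXException, IOException {
      // The parser throws a SAXException if the page is not well-formed.
      reader.parse("http://www.elharo.com/blog/");
    }

}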

You can run this test from inside an IDE such as Eclipse or NetBeans, or you can run it from the command line like so:

$ java -cp .:junit.jar junit.swingui.TestRunner WellformednessTests

If all tests pass, you’ll see a green bar as shown in Figure 2.3.

Figure 2.3: All tests pass.

To test additional pages for well-formedness, you simply add more methods, each of which looks exactly like testBlogIndex, just with a different URL. Of course, you can also write more complicated tests. You can test for validity by setting the http://xml.org/sax/features/validation feature on the parser and attaching an error handler that throws an exception if a validity error is detected.
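
A validity check along those lines might look like the following sketch. It uses only the JDK's built-in parser; the ValidityCheck class name is hypothetical, and the documents are passed in as strings so that the example runs without a network connection. In a real suite you would parse the page's URL, as Listing 2.2 does.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper that reports whether a document is valid against its own DTD. */
final class ValidityCheck {

    static boolean isValid(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // Turn on DTD validation.
            reader.setFeature("http://xml.org/sax/features/validation", true);
            reader.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    // Validity errors are recoverable by default, so rethrow them.
                    throw e;
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Either a well-formedness error or a validity error.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Could not set up or run the parser", e);
        }
    }
}
```

Inside a JUnit test, you would let the SAXException propagate instead of returning false, so that the failure message names the offending line.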

You can use DOM, XOM, SAX, or some other API to load the page and inspect its contents. For instance, you could write a test that checks whether all links on a page are reachable. If you use TagSoup as the parser, you can even write these sorts of tests for non-well-formed HTML pages.
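
As a rough illustration of the DOM approach, the sketch below collects the href of every a element in a well-formed page. The LinkExtractor name is hypothetical, and the code assumes the page has already been fetched into a string; checking that each link is reachable would be a separate step.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper that lists the link targets in a well-formed page. */
final class LinkExtractor {

    static List<String> hrefs(String xhtml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Page is not well-formed XML", e);
        }
        List<String> result = new ArrayList<>();
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element anchor = (Element) anchors.item(i);
            // Named anchors without an href are not links; skip them.
            if (anchor.hasAttribute("href")) {
                result.add(anchor.getAttribute("href"));
            }
        }
        return result;
    }
}
```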

You can submit forms using the HttpURLConnection class or run JavaScript using the Rhino engine built into Java 6. This is all pretty low-level stuff, and it’s not trivial to do; but it’s absolutely possible to do it. You just have to roll up your sleeves and start coding.
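
To give a feel for what that low-level code involves, here is a minimal sketch of a form submission with HttpURLConnection. The FormPoster class is hypothetical, and it assumes the form uses the POST method with the default URL-encoded content type; a GET form would append the encoded fields to the URL as a query string instead.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical helper that submits a form the way a browser would. */
final class FormPoster {

    /** Encodes form fields as application/x-www-form-urlencoded text. */
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        return body.toString();
    }

    /** POSTs the fields to the URL and returns the HTTP status code. */
    static int post(URL url, Map<String, String> fields) throws IOException {
        byte[] body = encode(fields).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("POST");
        connection.setDoOutput(true);
        connection.setRequestProperty("Content-Type",
            "application/x-www-form-urlencoded; charset=UTF-8");
        try (OutputStream out = connection.getOutputStream()) {
            out.write(body);
        }
        return connection.getResponseCode();
    }
}
```

A test would then assert on the status code that post returns, or read the response body and inspect it with a parser.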

If nondevelopers are making regular changes to your site, you can set up the test suite to run periodically with cron and to e-mail you if anything unexpectedly breaks. (It’s probably not reasonable to expect each author or designer to run the entire test suite before every check-in.) You can even run the suite continuously using a product such as Hudson or CruiseControl. However, that may fill your server logs with a lot of test traffic that you don’t want counted in your statistics, so you may wish to run this against the development server instead.

Many similar test frameworks are available for other languages and platforms: PyUnit for Python, CppUnit for C++, NUnit for .NET, and so forth. Collectively these go under the rubric xUnit. Whichever one you and your team are comfortable working with is fine for writing web test suites. The web server doesn’t care what language your tests are written in. As long as you have a one-button test harness and enough HTTP client support to write tests, you can do what needs to be done.


HtmlUnit (http://htmlunit.sourceforge.net/) is an open source JUnit extension designed to test HTML pages. It will be most familiar and comfortable to Java programmers already using JUnit for test-driven development. HtmlUnit provides two main advantages over pure JUnit.

  • The WebClient class makes it much easier to pretend to be a web browser.
  • The HtmlPage class has methods for inspecting common parts of an HTML document.

For example, HtmlUnit will run JavaScript that’s specified by an onLoad handler before it returns the page to the client, just like a browser would. Simply loading the page with an XML parser as Listing 2.2 did would not run the JavaScript.

Listing 2.3 demonstrates the use of HtmlUnit to check that all the links on a page are not broken. I could have written this using a raw parser and DOM, but it would have been somewhat more complex. In particular, methods such as getAnchors to find all the a elements in a page are very helpful.

Listing 2.3: An HtmlUnit Test for a Page’s Links

import java.io.IOException;
import java.net.*;
import java.util.*;
import com.gargoylesoftware.htmlunit.*;
import com.gargoylesoftware.htmlunit.html.*;

import junit.framework.TestCase;

public class LinkCheckTest extends TestCase {

    public void testBlogIndex() throws FailingHttpStatusCodeException, IOException {
        WebClient webClient = new WebClient();
        URL url = new URL("http://www.elharo.com/blog/");
        HtmlPage page = (HtmlPage) webClient.getPage(url);
        List links = page.getAnchors();
        Iterator iterator = links.iterator();
        while (iterator.hasNext()) {
          HtmlAnchor link = (HtmlAnchor) iterator.next();
          URL u = new URL(link.getHrefAttribute());
          // Check that we can download this page.
          // If we can't, getPage throws an exception and
          // the test fails.
          webClient.getPage(u);
        }
    }

}

This test is more than a unit test. It checks all the links on a page, whereas a real unit test would check only one. Furthermore, it makes connections to external servers. That’s very unusual for a unit test. Still, this is a good test to have, and it will let us know that we need to fix our pages if an external site breaks links by reorganizing its pages.
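
One caveat: Listing 2.3 passes each href straight to the URL constructor, which throws a MalformedURLException for relative links such as ../about.html. If your pages contain relative links, you may want to resolve them against the page's own address first. The sketch below shows one way to do that with java.net.URI; the LinkResolver class is hypothetical, and it assumes the href contains no illegal characters such as unescaped spaces.

```java
import java.net.URI;

/** Hypothetical helper for turning hrefs into absolute, checkable addresses. */
final class LinkResolver {

    /** Resolves an href relative to the page it appears on. */
    static URI resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href.trim());
    }

    /** Only http and https links can be downloaded by a link checker. */
    static boolean isCheckable(URI link) {
        String scheme = link.getScheme();
        return "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
    }
}
```

Inside the loop you would resolve each href, skip anything that is not checkable (mailto: links, for example), and pass the result of toURL() to getPage.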


HttpUnit (http://httpunit.sourceforge.net/) is another open source JUnit extension designed to test HTML pages. It is also best suited for Java programmers already using JUnit for test-driven development, and is in many ways quite similar to HtmlUnit. Some programmers prefer HttpUnit, and others prefer HtmlUnit. If there’s a difference between the two it’s that HttpUnit is somewhat lower-level. It tends to focus more on the raw HTTP connection whereas HtmlUnit more closely imitates a browser. HtmlUnit has somewhat better support for JavaScript, if that’s a concern. However, there’s certainly a lot of overlap between the two projects.

Listing 2.4 demonstrates an HttpUnit test that verifies that a page has exactly one H1 header, and that its text matches the web page’s title. That may not be a requirement for all pages, but it is a requirement for some. For instance, it would be a very apt requirement for a newspaper site.

Listing 2.4: An HttpUnit Test That Matches the Title to a Unique H1 Heading

import java.io.IOException;
import org.xml.sax.SAXException;
import com.meterware.httpunit.*;
import junit.framework.TestCase;

public class TitleChecker extends TestCase {

    public void testFormSubmission() throws IOException, SAXException {
        WebConversation wc = new WebConversation();
        WebResponse wr = wc.getResponse("http://www.elharo.com/blog/");
        HTMLElement[] h1 = wr.getElementsWithName("h1");
        assertEquals(1, h1.length);
        String title = wr.getTitle();
        assertEquals(title, h1[0].getText());
    }

}


I could have written this test in HtmlUnit too, and I could have written Listing 2.3 with HttpUnit. Which one you use is mostly a matter of personal preference. Of course, these are hardly the only such frameworks. There are several more, including ones not written in Java. Use whichever one you like, but by all means use something.
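
For comparison, the same rule can be sketched with nothing but the DOM parser that ships with the JDK. The HeadingCheck class is hypothetical, and the code assumes the page is well-formed XHTML that has already been loaded into a string.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical check that a page's title matches its one and only h1. */
final class HeadingCheck {

    static boolean titleMatchesSingleH1(String xhtml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Page is not well-formed XML", e);
        }
        NodeList headings = doc.getElementsByTagName("h1");
        NodeList titles = doc.getElementsByTagName("title");
        // Exactly one of each is required before the text is compared.
        if (headings.getLength() != 1 || titles.getLength() != 1) {
            return false;
        }
        String heading = headings.item(0).getTextContent().trim();
        String title = titles.item(0).getTextContent().trim();
        return heading.equals(title);
    }
}
```

This is more code than Listing 2.4, which is a fair measure of what the higher-level frameworks save you.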


JWebUnit is a higher-level API that sits on top of HtmlUnit and JUnit. Generally, JWebUnit tests involve more assertions and less straight Java code. These tests are somewhat easier to write without a large amount of Java expertise, and they may be more accessible to a typical web developer. Furthermore, tests can very easily extend over multiple pages as you click links, submit forms, and in general follow an entire path through a web application.

Listing 2.5 demonstrates a JWebUnit test for the search engine on my web site. It fills in the search form on the main page and submits it. Then it checks that one of the expected results is present.

Listing 2.5: A JWebUnit Test for Submitting a Form

import junit.framework.TestCase;
import net.sourceforge.jwebunit.junit.*;

public class LinkChecker extends TestCase {

    private WebTester tester;

    public LinkChecker(String name) {
        super(name);
        tester = new WebTester();
        // Base URL assumed here; adjust it for your own site.
        tester.getTestContext().setBaseUrl("http://www.elharo.com/blog/");
    }

    public void testFormSubmission() {
        // start at this page
        tester.beginAt("/");
        // check that a form is on the page
        tester.assertFormPresent();
        // check that the input element we expect is present
        tester.assertFormElementPresent("s");
        // type something into the input element
        tester.setTextField("s", "Linux");
        // send the form
        tester.submit();
        // we're now on a different page; check that the
        // text on that page is as expected.
        tester.assertTextPresent("Windows Vista");
    }

}


FitNesse (http://fitnesse.org/) is a Wiki designed to enable business users to write tests in table format. Business users like spreadsheets. The basic idea of FitNesse is that tests can be written as tables, much like a spreadsheet. Thus, FitNesse tests are not written in Java. Instead, they are written as a table in a Wiki.

You do need a programmer to install and configure FitNesse for your site. However, once it’s running and a few sample fixtures have been written, it is possible for savvy business users to write more tests. FitNesse works best in a pair environment, though, where one programmer and one business user can work together to define the business rules and write tests for them.

For web app acceptance testing, you install Joseph Bergin’s HtmlFixture (http://fitnesse.org/FitNesse.HtmlFixture). It too is based on HtmlUnit. It supplies instructions that are useful for testing web applications such as typing into forms, submitting forms, checking the text on a page, and so forth.

Listing 2.6 demonstrates a simple FitNesse test that checks the http-equiv meta tag in the head to make sure it’s properly specifying UTF-8. The first three lines set the classpath. Then, after a blank line, the next line identifies the type of fixture as an HtmlFixture. (There are several other kinds, but HtmlFixture is the common one for testing web applications.)

The external page at http://www.elharo.com/blog/ is then loaded. In this page, we focus on the element named meta that has an id attribute with the value charset. This will be the subject for our tests.

The test then looks at two attributes of this element. First it inspects the content attribute and asserts that its value is text/html; charset=utf-8. Next it checks the http-equiv attribute of the same element and asserts that its value is content-type.

Listing 2.6: A FitNesse Test for <meta name="charset" http-equiv="Content-Type" content="text/html; charset=UTF-8" />

!path fitnesse.jar
!path htmlunit-1.5/lib/*.jar
!path htmlfixture20050422.jar

|Element Focus|charset |meta|
|Attribute |content |text/html; charset=utf-8|
|Attribute |http-equiv|content-type|

This test would be embedded in a Wiki page. You can run it from a web browser just by clicking the Test button, as shown in Figure 2.4. If all of the assertions pass, and if nothing else goes wrong, the test will appear green after it is run. Otherwise, it will appear pink. You can use other Wiki markup elsewhere in the page to describe the test.

Figure 2.4: A FitNesse page


Selenium is an open source browser-based test tool designed more for functional and acceptance testing than for unit testing. Unlike with HttpUnit and HtmlUnit, Selenium tests run directly inside the web browser. The page being tested is embedded in an iframe and the Selenium test code is written in JavaScript. It is mostly browser and platform independent, though the IDE for writing tests, shown in Figure 2.5, is limited to Firefox.

Figure 2.5: The Selenium IDE

Although you can write tests manually in Selenium using remote control, it is really designed more as a traditional GUI record and playback tool. This makes it more suitable for testing an application that has already been written, and less suitable for doing test-driven development.

Selenium is likely to be more comfortable to front-end developers who are accustomed to working with JavaScript and HTML. It is also likely to be more palatable to professional testers because it’s similar to some of the client GUI testing tools they’re already familiar with.

Listing 2.7 demonstrates a Selenium test that verifies that www.elharo.com shows up in the first page of results from a Google search for “Elliotte”. This script was recorded in the Selenium IDE and then edited a little by hand. You can load it into and then run it from a web browser. Unlike the other examples given here, this is not Java code, and it does not require major programming skills to maintain. Selenium is more of a macro language than a programming language.

Listing 2.7: Test That elharo.com Is in the Top Search Results for Elliotte

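    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>elharo.com is a top search result for Elliotte</title>
    <table cellpadding="1" cellspacing="1" border="1">
        <tr><td rowspan="1" colspan="3">New Test</td></tr>
        <tr><td>open</td><td>/</td><td></td></tr>
        <tr><td>type</td><td>q</td><td>elliotte</td></tr>
        <tr><td>clickAndWait</td><td>btnG</td><td></td></tr>
        <tr><td>verifyTextPresent</td><td>www.elharo.com/</td><td></td></tr>
    </table>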

Obviously, Listing 2.7 is a real HTML document. You can open this with the Selenium IDE in Firefox and then run the tests. Because the tests run directly inside the web browser, Selenium helps you find bugs that occur in only one browser or another. Given the wide variation in browsers’ support for CSS, HTML, and JavaScript, this capability is very useful. HtmlUnit, HttpUnit, JWebUnit, and the like use their own JavaScript engines, which do not always behave the same as the browsers’ engines. Selenium uses the browsers themselves, not imitations of them.

The IDE can also export the tests as C#, Java, Perl, Python, or Ruby code so that you can integrate Selenium tests into other environments. This is especially important for test automation. Listing 2.8 shows the same test as in Listing 2.7, but this time in Ruby. However, this will not necessarily catch all the cross-browser bugs you’ll find by running the tests directly in the browser.

Listing 2.8: Automated Test That elharo.com Is in the Top Search Results for Elliotte

require "selenium"
require "test/unit"

class GoogleSearch < Test::Unit::TestCase

  def setup
    @verification_errors = []
    if $selenium
      # Reuse a session that a surrounding suite has already started.
      @selenium = $selenium
    else
      @selenium = Selenium::SeleneseInterpreter.new("localhost",
        4444, "*firefox", "http://localhost:4444", 10000);
      @selenium.start
    end
    @selenium.set_context("test_google_search", "info")
  end

  def teardown
    @selenium.stop unless $selenium
    assert_equal [], @verification_errors
  end

  def test_google_search
    @selenium.open "/"
    @selenium.type "q", "elliotte"
    @selenium.click "btnG"
    @selenium.wait_for_page_to_load "30000"
    begin
      assert @selenium.is_text_present("www.elharo.com/")
    rescue Test::Unit::AssertionFailedError
      # Record the failure; teardown reports it.
      @verification_errors << $!
    end
  end

end

Getting Started with Tests

Because you’re refactoring, you already have a web site or application; and if it’s like most I’ve seen, it has limited, if any, front-end tests. Don’t let that discourage you. Pick the tool you like and start to write a few tests for some basic functionality. Any tests at all are better than none. At the early stages, testing is linear. Every test you write makes a noticeable improvement in your code coverage and quality. Don’t get bogged down thinking you have to test everything. That’s great if you can do it; but if you can’t, you can still do something.

Before refactoring a particular page, subdirectory, or path through a site, take an hour and write at least two or three tests for that section. If nothing else, these are smoke tests that will let you know if you totally muck up everything. You can expand on these later when you have time.

If you find a bug, by all means write a test for the bug before fixing it. That will help you know when you’ve fixed the bug, and it will prevent the bug from accidentally reoccurring in the future after other changes. Because front-end tests aren’t very unitary, it’s likely that this test will indirectly test other things besides the specific bit of buggy code.

Finally, for new features and new developments beyond refactoring, by all means write your tests first. This will guarantee that the new parts of the site are tested, and tests will often leak over into the older pages and scripts as well.

Automatic testing is critical to developing a robust, scalable application. Developing a test suite can seem daunting at first, but it’s worth doing. The first test is the hardest. Once you’ve set up your test framework and written your first test, subsequent tests will flow much more easily. Just as you can improve a site linearly through small, automatic refactorings that add up over time, so too can you improve your test suite by adding just a few tests a week. Sooner than you know, you’ll have a solid test suite that helps to ensure reliability by telling you when things are broken and showing you what to fix.

2 Responses to “Testing”

  1. C. Doley Says:

    I don’t think TDD works well for web development. In my experience, web sites developed using HtmlUnit or similar tools tend to be of much _lower_ quality than sites developed without any unit tests on the front end at all. Which of course opens up the glaring question of why.

    The fact is that TDD takes resources away from other things. In cases where you’re writing underlying libraries this is almost always a good tradeoff, in that the robustness of the back-end saves countless hours of debugging in the layers built on top of it. In web development, this does not apply.

    The typical web development process works like this:

    Gather requirements.
    Have artists design the pages.
    Convert to HTML and hook into back-office systems.
    Refine the requirements.
    Tweak the designs.
    Get new artists.

    In some sense this is like all other software development. But the important distinction here is that the cost of changes to web pages is very small compared with changes to shrink-wrapped or back-office software. With the web it is possible to have extremely short release cycles and frequent design changes. In fact, the general public has come to expect their web sites to be continually refined to make their experience more gratifying.

    The problem with trying to apply TDD to this environment is that it reduces flexibility, and thus makes the web site seem more clunky and less tailored to people’s needs. This is certainly true on the ones I’ve been a part of, but it’s also pretty obvious when you come across it (anyone ever try QuickBooks Online?)
