Harold’s Corollary to Knuth’s Law

August 5th, 2008

Lately I’ve found myself arguing about the proper design of unit tests. On my side I’m claiming:

  1. Unit tests should only touch the public API.
  2. Code coverage should be as near 100% as possible.
  3. It’s better to test the real thing than mock objects.

The goal is to make sure that the tests are as close to actual usage as possible. This means that problems are more likely to be detected and false positives are less likely. Furthermore, the discipline of testing through the public API when attempting to achieve 100% code coverage tends to reveal a lot about how the code really works. It routinely highlights dead code that can be eliminated. It reveals paths for optimization. It teaches me things about my own code I didn’t know. It shows patterns in the entire system that makes up my product.
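To make this concrete, here’s a minimal sketch of the style I’m arguing for: a JUnit test that drives a class entirely through its public API and never inspects private state. ShoppingCart and Item are invented stand-ins for illustration, included inline only so the sketch is self-contained.

    import java.util.ArrayList;
    import java.util.List;
    import junit.framework.TestCase;

    // Hypothetical example: the test exercises only the public API of the
    // class under test. ShoppingCart and Item are invented stand-ins.
    public class ShoppingCartTest extends TestCase {

        static class Item {
            final String name;
            final int priceInCents;
            Item(String name, int priceInCents) {
                this.name = name;
                this.priceInCents = priceInCents;
            }
        }

        static class ShoppingCart {
            private final List<Item> items = new ArrayList<Item>();
            public void add(Item item)    { items.add(item); }
            public void remove(Item item) { items.remove(item); }
            public boolean isEmpty()      { return items.isEmpty(); }
            public int total() {
                int sum = 0;
                for (Item item : items) sum += item.priceInCents;
                return sum;
            }
        }

        public void testTotalReflectsAddedItems() {
            ShoppingCart cart = new ShoppingCart();
            cart.add(new Item("book", 2999));   // prices in cents
            cart.add(new Item("pen", 150));
            // Only the public total() method is checked; how the cart stores
            // its items internally is deliberately not inspected.
            assertEquals(3149, cart.total());
        }

        public void testRemovingLastItemEmptiesCart() {
            ShoppingCart cart = new ShoppingCart();
            Item pen = new Item("pen", 150);
            cart.add(pen);
            cart.remove(pen);
            assertTrue(cart.isEmpty());
            assertEquals(0, cart.total());
        }
    }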

By contrast, some programmers advocate that tests should be method-limited. Each test should call the method under test as directly as possible, perhaps even making it public or otherwise non-private and violating encapsulation to enable this. Any external resources necessary to run the method, such as databases or web servers, should be mocked out. At the extreme, even the other classes the tested code touches should be replaced by mock implementations.
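For comparison, here’s a rough sketch of what that method-limited style often looks like in JUnit, with a hand-rolled mock standing in for the database. CustomerLoader, Customer, and CustomerStore are invented names, not taken from any real library, and are defined inline only so the sketch compiles.

    import junit.framework.TestCase;

    // Hypothetical sketch of the method-limited style: the database is
    // replaced by a mock and the test calls loadCustomer() as directly
    // as possible. All of these classes are invented stand-ins.
    public class CustomerLoaderTest extends TestCase {

        interface CustomerStore {            // stand-in for the database layer
            String findNameById(int id);
        }

        static class Customer {
            private final String name;
            Customer(String name) { this.name = name; }
            public String getName() { return name; }
        }

        static class CustomerLoader {
            private final CustomerStore store;
            CustomerLoader(CustomerStore store) { this.store = store; }
            // In the style described above this might be widened from private
            // or package-private to public just so the test can reach it.
            public Customer loadCustomer(int id) {
                return new Customer(store.findNameById(id));
            }
        }

        // Mock that returns a canned answer instead of touching a real database.
        static class MockCustomerStore implements CustomerStore {
            public String findNameById(int id) { return "Alice"; }
        }

        public void testLoadCustomerReadsName() {
            CustomerLoader loader = new CustomerLoader(new MockCustomerStore());
            Customer c = loader.loadCustomer(42);
            assertEquals("Alice", c.getName());
        }
    }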

This approach may sometimes let the tests be written faster, but not always. There’s a non-trivial cost to designing mock objects to replace the real thing, and sometimes that takes longer. This approach will still tend to find most bugs in the method being tested. However, it stops there. It will not find code in the method that should be eliminated because it’s unreachable from the public API. Thus code tested with this approach is likely to be larger, more complex, and slower, since it has to handle conditions that can’t happen through the public API. More importantly, such a test starts and stops with that one method. It reveals nothing about the interaction of the different parts of the system. It teaches nothing about how the code really operates in the more complex environment of the full system. It misses bugs that emerge from the mixture of multiple methods and classes even when each method behaves correctly in isolation according to its spec. That is, it often fails to find flaws in the specifications of the individual methods. Why, then, are so many programmers so adamant about breaking access protection and every other rule of good design as soon as they start testing?

Would you believe performance?

10 Things to Know Before You Go To Beijing

August 4th, 2008

1. Learn Mandarin.

Even a little will go a long way. English is very uncommon here. All those tourist phrase books and Berlitz courses that did you absolutely no good in Europe because everyone spoke English? They actually help here. The most important phrase to know is “Boo-yao,” which loosely translates as “No, I don’t want that cheap plastic souvenir/guide book/Rolex/Gucci bag you’re trying to sell me, and I really mean it.”

Upgraded to WordPress 2.6

August 2nd, 2008

I’ve upgraded this site to WordPress 2.6. Please holler if you notice any problems. Thanks.

Chapter 3: Well-formedness

July 14th, 2008

Here’s part 15 of the ongoing serialization of Refactoring HTML, also available from Amazon and Safari.

The very first step in moving markup into modern form is to make it well-formed. Well-formedness is the basis of the huge and incredibly powerful XML tool chain. Well-formedness guarantees a single unique tree structure for the document that can be operated on by the DOM, thus making it the basis of reliable, cross-browser JavaScript. The very first thing you need to do is make your pages well-formed.

Validity, although important, is not nearly as crucial as well-formedness. There are often good reasons to compromise on validity. In fact, I often deliberately publish invalid pages. If I need an element the DTD doesn’t allow, I put it in. It won’t hurt anything because browsers ignore elements they don’t understand. If I have a blockquote that contains raw text but no elements, no great harm is done. If I use an HTML 5 element such as m that Opera recognizes and other browsers don’t, those other browsers will just ignore it. However, if the page is malformed, the consequences are much more severe.

First, I won’t be able to use any XML tools, such as XSLT or SAX, to process the page. Indeed, almost the only thing I can do with it is view it in a browser. It is very hard to do any reliable automated processing or testing with a malformed page.
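As a rough illustration, here’s what happens when a standard Java XML parser meets a malformed page: it rejects the document outright, so XSLT, SAX pipelines, and every other downstream tool never gets to see it at all. The file name is just a placeholder.

    import java.io.File;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.xml.sax.SAXParseException;

    // Sketch: try to build a DOM tree from a page. A malformed document
    // never produces a tree; the parser stops at the first fatal error.
    public class WellFormednessCheck {
        public static void main(String[] args) throws Exception {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            try {
                Document doc = builder.parse(new File("index.html"));
                System.out.println("Well-formed; root element is "
                    + doc.getDocumentElement().getTagName());
            } catch (SAXParseException ex) {
                // A single unclosed tag or stray & is enough to land here.
                System.out.println("Malformed at line " + ex.getLineNumber()
                    + ": " + ex.getMessage());
            }
        }
    }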

Second, browser display becomes much more unpredictable. Different browsers fill in the missing pieces and correct the mistakes of malformed pages in different ways. Writing cross-platform JavaScript or CSS is hard enough without worrying about what tree each browser will construct from ambiguous HTML. Making the page well-formed makes it a lot more likely that I can make it behave as I like across a wide range of browsers.

XSLT

July 3rd, 2008

Here’s part 14 of the ongoing serialization of Refactoring HTML, also available from Amazon and Safari.

XSLT (Extensible Stylesheet Language Transformations) is one of many XML tools that work well on HTML documents once they have first been converted into well-formed XHTML. In fact, it is one of my favorite such tools, and the first thing I turn to for many tasks. For instance, I use it to automatically generate a lot of content, such as RSS and Atom feeds, by screen-scraping my HTML pages. Indeed, the possibility of using XSLT on my documents is one of my main reasons for refactoring documents into well-formed XHTML. XSLT can query documents for things you need to fix and automate some of the fixes.
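Here’s a rough sketch of driving such a transformation from Java with the standard javax.xml.transform API. The stylesheet and file names (scrape-atom.xsl, index.html, atom.xml) are placeholders for whatever screen-scraping stylesheet you write; the input page must already be well-formed XHTML or the transformation will fail before it starts.

    import java.io.File;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    // Sketch of an XSLT screen-scrape: read a well-formed XHTML page and
    // write whatever the stylesheet produces, e.g. an Atom feed.
    public class FeedScraper {
        public static void main(String[] args) throws Exception {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer
                = factory.newTransformer(new StreamSource(new File("scrape-atom.xsl")));
            transformer.transform(new StreamSource(new File("index.html")),
                                  new StreamResult(new File("atom.xml")));
        }
    }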