Lately I’ve found myself arguing about the proper design of unit tests. On my side I’m claiming:
- Unit tests should only touch the public API.
- Code coverage should be as near 100% as possible.
- It’s better to test the real thing than mock objects.
The goal is to make sure that the tests are as close to actual usage as possible. This means that problems are more likely to be detected and false positives are less likely. Furthermore, the discipline of testing through the public API when attempting to achieve 100% code coverage tends to reveal a lot about how the code really works. It routinely highlights dead code that can be eliminated. It reveal paths of optimization. It teaches me things about my own code I didn’t know. It shows patterns in the entire system that makes up my product.
By contrast some programmers advocate that tests should be method-limited. Each test should call the method as directly as possible, perhaps even making it public or non-private and violating encapsulation to enable this. Any external resources that are necessary to run the method such as databases or web servers should be mocked out. At the extreme, even other classes a test touches should be replaced by mock implementations.
This approach may sometimes let the tests be written faster; but not always. There’s a non-trivial cost to designing mock objects to replace the real thing; and sometimes that takes longer. This approach will still tend to find most bugs in the method being tested. However it stops there. It will not find code in the method that should be eliminated because it’s unreachable from the public API. Thus code tested with this approach is likely to be larger, more complex, and slower since it has to handle conditions that can’t happen through the public API. More importantly, such a test starts and stops with that one method. It reveals nothing about the interaction of the different parts of the system. It teaches nothing about how the code really operates in the more complex environment of the full system. It misses bugs that can emerge out of the mixture of multiple different methods and classes even when each method is behaving correctly in isolation according to its spec. That is, it often fails to find flaws in the specifications of the individual methods. Why then are so many programmers so adamant about breaking access protection and every other rule of good design as soon as they start testing?
Would you believe performance?
For instance consider this proposal from Michael Feathers:
A test is not a unit test if:
- It talks to the database
- It communicates across the network
- It touches the file system
- It can’t run at the same time as any of your other unit tests
- You have to do special things to your environment (such as editing
config files) to run it.
Tests that do these things aren’t bad. Often they are worth writing, and they can be written in a unit test harness. However, it is important to be able to separate them from true unit tests so that we can keep a set of tests that we can run fast whenever we make our changes.
More than 30 years ago Donald Knuth first published what would come to be called Knuth’s law: “premature optimization is the root of all evil in programming.” (Knuth, Donald. Structured Programming with go to Statements, ACM Journal Computing Surveys, Vol 6, No. 4, Dec. 1974. p.268.) But some developers still haven’t gotten the message.
Are there some tests that are so slow they contribute to not running the test suite? Yes. We’ve all seen them, but there’s no way to tell which tests they are in advance. In my test suite for XOM, I have numerous tests that communicate across the network, touch the filesystem, and access third party libraries. However, almost all these tests run like a bat out of hell, and take no noticeable time. The slowest test in the suite? It’s one that operates completely in memory on byte array streams with no network access, does not touch the file system, uses no APIs beyond what’s in Java 1.2 and XOM itself, and there’s no database anywhere in sight. I do omit that test from my standard suite because it takes too long to run. I’ll run it explicitly once or twice before releasing a new version, but not every time I make a change.
I am now proposing Harold’s corollary to Knuth’s law: premature optimization is the root of all evil in testing. It is absolutely essential to make sure that your test suite runs fast enough to run after every change to the code and before every check in. I’m even willing to put a number on “fast enough”, and that number is 90 seconds. However, you simply cannot tell which tests are likely to be too slow to run routinely in advance of actual measurement. Castrating and contorting your tests to fit some imagined idea of what will and will not be slow limits their usefulness.
Tests should be designed for the ideal scenario: a computer that is infinitely fast with infinite memory and a network with zero latency and infinite bandwidth. Of course, that ideal computer doesn’t exist; and you’ll have to profile, optimize, and as a last resort cut back on your tests. However, I’ve never yet met a programmer who could reliably tell which tests (or other code) would and would not be fast enough in advance of actual measurements. Blanket rules that unit tests should not do X or talk to Y because it’s likely to be slow needlessly limits what we can learn from unit tests.