Harold’s Corollary to Knuth’s Law

Lately I’ve found myself arguing about the proper design of unit tests. On my side I’m claiming:

  1. Unit tests should only touch the public API.
  2. Code coverage should be as near 100% as possible.
  3. It’s better to test the real thing than mock objects.

The goal is to make sure that the tests are as close to actual usage as possible. This means that problems are more likely to be detected and false positives are less likely. Furthermore, the discipline of testing through the public API when attempting to achieve 100% code coverage tends to reveal a lot about how the code really works. It routinely highlights dead code that can be eliminated. It reveals paths of optimization. It teaches me things about my own code I didn’t know. It shows patterns in the entire system that makes up my product.
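As a minimal sketch of how this plays out, consider a hypothetical Slug class (invented for illustration; it is not part of XOM or any real library). Its tests call only the public method, and that alone is enough to show that one branch of the private helper can never run:

```java
import java.util.Locale;

// Hypothetical class for illustration; not taken from XOM or any real library.
final class Slug {
    private Slug() {}

    /** The only public entry point: turns a title into a URL-friendly slug. */
    public static String of(String title) {
        if (title == null) {
            throw new IllegalArgumentException("title must not be null");
        }
        return collapse(title.trim().toLowerCase(Locale.ROOT));
    }

    private static String collapse(String s) {
        // Dead branch: of() has already rejected null, so no call that starts
        // at the public API can reach this line. A coverage report driven
        // only by public-API tests will show it as never executed.
        if (s == null) {
            return "";
        }
        // Replace each run of non-alphanumerics with one hyphen, then trim
        // a leading or trailing hyphen.
        return s.replaceAll("[^a-z0-9]+", "-").replaceAll("^-|-$", "");
    }
}

// Tests that touch nothing but Slug.of(), written without a test framework
// so the sketch stays self-contained.
final class SlugPublicApiTest {
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    static boolean rejectsNull() {
        try {
            Slug.of(null);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        check(Slug.of("  Hello, World! ").equals("hello-world"), "punctuation and case");
        check(Slug.of("").equals(""), "empty title");
        check(rejectsNull(), "null title");
        System.out.println("all public-API tests passed");
    }
}
```

A test that called collapse() directly could pass null and "cover" the extra branch, which would hide the fact that the branch is unnecessary. Under the assumptions of this sketch, the uncovered line is a prompt to delete code rather than to write another test.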

By contrast some programmers advocate that tests should be method-limited. Each test should call the method as directly as possible, perhaps even making it public or non-private and violating encapsulation to enable this. Any external resources that are necessary to run the method such as databases or web servers should be mocked out. At the extreme, even other classes a test touches should be replaced by mock implementations.

This approach may sometimes let the tests be written faster, but not always. There’s a non-trivial cost to designing mock objects to replace the real thing, and sometimes that takes longer. This approach will still tend to find most bugs in the method being tested. However, it stops there. It will not find code in the method that should be eliminated because it’s unreachable from the public API. Thus code tested with this approach is likely to be larger, more complex, and slower since it has to handle conditions that can’t happen through the public API. More importantly, such a test starts and stops with that one method. It reveals nothing about the interaction of the different parts of the system. It teaches nothing about how the code really operates in the more complex environment of the full system. It misses bugs that can emerge out of the mixture of multiple different methods and classes even when each method is behaving correctly in isolation according to its spec. That is, it often fails to find flaws in the specifications of the individual methods. Why then are so many programmers so adamant about breaking access protection and every other rule of good design as soon as they start testing?

Would you believe performance?

For instance consider this proposal from Michael Feathers:

A test is not a unit test if:

  • It talks to the database
  • It communicates across the network
  • It touches the file system
  • It can’t run at the same time as any of your other unit tests
  • You have to do special things to your environment (such as editing
    config files) to run it.

Tests that do these things aren’t bad. Often they are worth writing, and they can be written in a unit test harness. However, it is important to be able to separate them from true unit tests so that we can keep a set of tests that we can run fast whenever we make our changes.

More than 30 years ago Donald Knuth first published what would come to be called Knuth’s law: “premature optimization is the root of all evil in programming.” (Knuth, Donald. “Structured Programming with go to Statements,” ACM Computing Surveys, Vol. 6, No. 4, Dec. 1974, p. 268.) But some developers still haven’t gotten the message.

Are there some tests that are so slow they contribute to not running the test suite? Yes. We’ve all seen them, but there’s no way to tell which tests they are in advance. In my test suite for XOM, I have numerous tests that communicate across the network, touch the filesystem, and access third-party libraries. However, almost all these tests run like a bat out of hell, and take no noticeable time. The slowest test in the suite? It’s one that operates completely in memory on byte array streams with no network access, does not touch the file system, uses no APIs beyond what’s in Java 1.2 and XOM itself, and there’s no database anywhere in sight. I do omit that test from my standard suite because it takes too long to run. I’ll run it explicitly once or twice before releasing a new version, but not every time I make a change.

I am now proposing Harold’s corollary to Knuth’s law: premature optimization is the root of all evil in testing. It is absolutely essential to make sure that your test suite runs fast enough to run after every change to the code and before every check in. I’m even willing to put a number on “fast enough”, and that number is 90 seconds. However, you simply cannot tell which tests are likely to be too slow to run routinely in advance of actual measurement. Castrating and contorting your tests to fit some imagined idea of what will and will not be slow limits their usefulness.

Tests should be designed for the ideal scenario: a computer that is infinitely fast with infinite memory and a network with zero latency and infinite bandwidth. Of course, that ideal computer doesn’t exist, and you’ll have to profile, optimize, and as a last resort cut back on your tests. However, I’ve never yet met a programmer who could reliably tell which tests (or other code) would and would not be fast enough in advance of actual measurements. Blanket rules that unit tests should not do X or talk to Y because it’s likely to be slow needlessly limit what we can learn from unit tests.
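One way to act on "measure first, then cut" is sketched below. It is a rough illustration only: the TestBudget class, its method names, and the sample tests are all hypothetical, and the only figure taken from this article is the 90-second budget. The idea is to time every test as it really runs, and only then move the slowest ones out of the routine suite until what remains fits the budget.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; names and structure are illustrative assumptions.
final class TestBudget {
    /** The 90-second "fast enough" figure from the article, in nanoseconds. */
    static final long BUDGET_NANOS = 90L * 1_000_000_000L;

    private TestBudget() {}

    /** Runs each test once and records how long it actually took. */
    static Map<String, Long> measure(Map<String, Runnable> tests) {
        Map<String, Long> elapsed = new LinkedHashMap<>();
        for (Map.Entry<String, Runnable> test : tests.entrySet()) {
            long start = System.nanoTime();
            test.getValue().run();
            elapsed.put(test.getKey(), System.nanoTime() - start);
        }
        return elapsed;
    }

    /**
     * Picks the slowest tests to move out of the routine suite, one at a
     * time, until the measured total fits the budget. Nothing is deferred
     * if the whole suite already fits.
     */
    static List<String> testsToDefer(Map<String, Long> elapsed, long budgetNanos) {
        List<Map.Entry<String, Long>> bySlowest = new ArrayList<>(elapsed.entrySet());
        bySlowest.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));

        long total = 0;
        for (long nanos : elapsed.values()) {
            total += nanos;
        }

        List<String> deferred = new ArrayList<>();
        for (Map.Entry<String, Long> test : bySlowest) {
            if (total <= budgetNanos) {
                break;
            }
            deferred.add(test.getKey());
            total -= test.getValue();
        }
        return deferred;
    }
}

final class TestBudgetDemo {
    public static void main(String[] args) {
        // Stand-ins for real tests; any Runnable will do.
        Map<String, Runnable> tests = new LinkedHashMap<>();
        tests.put("parsesSmallNumber", () -> Integer.parseInt("42"));
        tests.put("sortsLargeArray", () -> Arrays.sort(new int[1_000_000]));

        Map<String, Long> elapsed = TestBudget.measure(tests);
        elapsed.forEach((name, nanos) ->
                System.out.println(name + ": " + nanos / 1_000_000 + " ms"));
        System.out.println("defer: "
                + TestBudget.testsToDefer(elapsed, TestBudget.BUDGET_NANOS));
    }
}
```

Note that the selection is driven entirely by measured durations. Nothing in it asks whether a test touches the network, the file system, or a database, so an in-memory test that turns out to be slow is deferred while a fast network test stays in.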

24 Responses to “Harold’s Corollary to Knuth’s Law”

  1. Adam Constabaris Says:

    I think you might be talking past Feathers on this point a bit. Notice that, as you quote him, Feathers’ list of necessary conditions is supposed to help define unit tests. A unit test, intuitively, tests a unit of code. Unit testiness is not all-or-nothing, it’s a sliding scale that deals with smaller and smaller logical chunks of code. On this view, at the limit, a unit test tests a single method. Striving toward very small unit tests is appealing (in the abstract) because you go from test failures telling you “something in this area is wrong” to having them tell you “something in this method is wrong.” As a side benefit, and of course only for the most part, unit tests so defined can be expected to run pretty quickly, although of course there’s no guarantee. I don’t have a copy of Feathers’ book right here, but I’m pretty sure he also talks up fault isolation as a key benefit. That, as you point out, comes with its own set of problems, but he’s not advocating that the only sort of tests you should have are “unit tests.”

  2. John Cowan Says:

    Umm, is “in testing” actually part of the Knuth quotation? I never heard that before.

  3. Elliotte Rusty Harold Says:

    Oops. Fixed.

  4. Michael Feathers Says:

    Yes, Adam has it right. The “rules” were really a nice way of summing up the things that I’ve noticed (across nearly a hundred teams) that tend to make tests slower. They definitely aren’t the only things; it’s just that I haven’t seen the compute-bound case very often.

    The most typical way to end up with a ton of sluggish unit tests is to do end to end testing against a database or network resource. In addition to being slow, they also tend to require much more in setup and fail for reasons other than the logic that should be the focus of unit testing.

    And, FWIW, I don’t advocate violating encapsulation unless your back is against the wall and there is no other way to write tests focused on the area you’d like to change. The nice thing about TDD or “writing small focused tests as you go” is that you encapsulate differently and better. Ideally, logic should be separate from external resource usage.

    As a side note, this seems to be a theme among functional programmers also (see Haskell and monadic IO). Given the growing interest in functional idioms and language features, I suspect we’ll see more attention given to the external resource issue.

  5. Elliotte Rusty Harold Says:

    TDD certainly encourages you to encapsulate differently, but I can’t agree that it helps you encapsulate better. Quite the opposite in fact. I routinely see programmers adding setter methods, making classes mutable, publicizing fields, and exposing the internal implementation of their classes so they can test them more directly. This is not an improvement. TDD is too often used as an excuse for violating sound object-oriented design principles.

    The public interface to a class should be designed to be appropriate for the problem domain. It should be as large as it needs to be to solve the actual problem, and no larger. Only then should one consider how to test it. Nothing should be added to a class solely for the purpose of testing. The tests exist to support the model, not the other way around. Do not let the testing tail wag the design dog.

  6. Michael Feathers Says:

    Elliotte, fair enough on adding setters, and making fields public when TDDing greenfield. I don’t encourage that at all. And, all of those things are really seeking shortcuts rather than using testability as a gauge of design.

    The thing that I do encourage is a rethink at a deeper level. It’s not uncommon to TDD a class and notice that some set of private methods are becoming uncomfortably complex: complex in the sense that you can test them through the public interface but you’d much rather test them directly.

    What is interesting to me is just how often this correlates to “complex enough to be its own class.” It’s easy to munge responsibilities together in code, and testing is an excellent probe. I guess, at the core, if it’s hard to write a test for something, it’s hard to claim that you understand it. Putting values in and knowing what the values should be coming back is where the rubber hits the pavement. If we notice the problem and split those private methods (and the data that they operate upon) into another class, it can be a win in terms of cohesion, coupling, and understandability.

  7. Ravi Venkataraman Says:

    I agree wholeheartedly with ERH on this topic. I have never understood the fascination with unit tests, or mocks, or TDD.

    Unit tests only go so far, and in no way “prove” that the requirements will be met. Mocks are an abomination, requiring far too much effort to build. I prefer to test against the “real” environment, for exactly the same reasons that ERH stated – it is more realistic and is much more likely to flush out actual problems. Initially, I may use stubs that return “expected” values and then move on to the “real” environment.

    As far as TDD goes, it is simply the wrong way to design software; elegant algorithms seldom arise from it. And TDD is never needed if we are familiar with the problem. It will lead to all sorts of bad design. If, as Michael points out, large private methods are a code smell, all I can ask is – but aren’t these methods the result of doing TDD? If so, what value does TDD have as a design tool if it often leads to bad code? Wouldn’t the average programmer be better off writing the correct code without TDD?

  8. Harald K. Says:

    It’s not hard to agree with items 1 and 2, to me this is obvious (just a short warning, I’m not a hard-core TDD’er myself, as I seldom start writing tests before I have some of the API ready). But I think ERH misses the point of mock objects.

    The main reason I test with mocks is not performance or optimization, but predictability (also in the time it takes to run the test, but that’s just a minor part of it).

    My mocks don’t start failing the 42nd time I run the test, because the network was down, some db table was corrupted, or a disk was full. They just run. I also make my mocks fail predictably, so that I can test that the API handles these failure conditions gracefully. Such conditions are really hard to test using the “real thing”. How do you make sure you gracefully handle a socket timeout, an I/O-exception in the middle of a stream etc, using the “real thing”? In addition, mocks can have expectations on them, that can be used to verify the API does what it’s supposed to. If you test with the “real thing” do you build the verification into your implementation?

    Another thing to point out is that even if integration tests and unit tests are different beasts, it’s never one or the other. To create quality software you need both.

  9. Gareth Rees Says:

    Are there some tests that are so slow they contribute to not running the test suite? Yes. We’ve all seen them, but there’s no way to tell which tests they are in advance

    Combinatorial test generators can generate an infinite number of different tests. See for example, Miller, Fredriksen and So (1990), “An Empirical Study of the Reliability of UNIX Utilities” or Eide and Regehr (2008), “Volatiles Are Miscompiled, and What to Do about It”.

    So the issue of “how many tests is it worth running?” arises immediately.

    The development strategy of having just one test suite that you run on every build doesn’t scale to large projects or large numbers of tests. You need some kind of hierarchy of tests: a basic set that you run before every commit; a larger set that you run regularly, and even larger sets that you run rarely, for example in the run-up to public releases.

  10. Alex Martelli Says:

    100% agree with Harald K, and raising it — the parts that are *most* important to tests are the failure cases — are they all handled correctly, etc, etc. I’ve spent the last few years of my life mostly doing cluster management software, and those parts are the real reason for having such SW in the first place — what happens if the database goes down at the same time as the sensors are telling you that temperatures are rising dangerously fast? What happens if both the first-line and backup alerts to the pagers are getting ignored in a critical situation? Etc, etc — people who don’t GET mocks are apparently saying I should actually be setting datacenters on fire, paging operators all over the place, and sabotaging the servers on which the database run, to really test the most important parts of my software… *PAH*!

    BUT, ensuring that failures are handled perfectly is just as crucial on the other end of the stack — cfr 37signals’ book on “Defensive Design for the Web” for a somewhat-rambling but overall useful tome on that. Take care of failures, problems, and all sorts of exceptional conditions that you can only really test via mocks, and the normal “mainstream” flow will take care of itself, or, darn near;-).

    Alex

  11. Brendan Johnston Says:

    We should not assume without proof that the best set of tests are “unit tests”. Elliotte is communicating that he has been more effective writing tests which orchestrate more than one unit.

    As Elliotte and Harald K suggest, both performance and robustness should guide the selection of tests in the suite. These should be real values, not potential performance and robustness.

    Ravi, TDD has not promoted long private methods when I used it.

    Michael, permission for more thoughtful exploration of the solution may be part of the value of TDD. Refactoring to some standard of cyclomatic complexity, method length, or some other metric may add as much to the design.

    Harald, I believe there are many examples of quality software produced without either unit or integration tests.

  12. Ravi Venkataraman Says:

    Brendan, I believe it was Michael and ERH who brought up the topic. I merely asked about the validity of a technique that can so easily lead to bad coding habits.

  13. J. B. Rainsberger Says:

    I don’t use test doubles (aka “mocks”) as a test execution optimization any more. They constitute a design tool for me: they allow me to Worry About One Thing At A Time when I test-drive behavior. That they allow me to run 95% of my tests at 250 tests per second is a pleasant side effect. Of course, I went through the phase of “mock expensive external resources” and I believe that remains an excellent Novice (in the Dreyfus sense) rule to follow.

  14. Steve Freeman Says:

    I’m one of the authors of the various Mock Object papers. I agree absolutely with your first two points but, being less polite than J.B., I think you’ve completely missed the point on the third. Your strawman description is a travesty of what TDD with Mocks should be about. It does describe a naive approach that has become too widespread, but raw critical postings like this one have the unfortunate effect of perpetuating the broken ideas.

    I’d even disagree with J.B., in that mocking expensive external resources is absolutely the wrong way to introduce the ideas. The important thing is to think about how objects collaborate and what services they need from their neighbours, so Joe Walnes’ Novice rule is “Only mock types you own”; that way one can concentrate on design issues not performance.

    Although some people have practised it forever, TDD is still very early in its adoption curve. Given the proportion of programmers who still struggle with OO concepts, we should be clear about when TDD practices are fundamentally broken and when they suffer from weak skills.

  15. Name required Says:

    > Mocks are an abomination, requiring far too much effort to build

    You’re doing them wrong.

  16. J. B. Rainsberger Says:

    As a counterpoint to your article, Elliotte, Integration Tests are a Scam: http://www.jbrains.ca/category/integration-tests-are-a-scam or http://tinyurl.com/dz2ftu

  17. Keith Ray Says:

    Dave Smith (local to the SF Bay area) gave an experience report about the influence of “end-to-end integrated tests” and “unit tests” on the software being developed.

    End-to-end integrated tests did NOT help improve the design of the code, did NOT help find where the bugs are (they might reveal a previously-undiscovered bug, but don’t tell you where it is), etc.

    Unit tests (actually, “microtests” like those produced in TDD) influenced the design of the code to be more de-coupled, helped localize bugs, etc. And were run more often, catching problems faster.

    (Apologies to Dave if I mis-remembered or mis-represented his presentation several years ago.)

  18. No to premature optimization | Software Development with Linux Says:

    [...] to Kent Beck for sharing  Harold’s Corollary To Knuth’s Law.  Donald Knuth wrote, in Structured Programming with go to Statements : “Premature [...]

  19. WanderMelee Says:

    Carguous Noll (local to the Breenda area) gave an experience report about the influence of “end-to-end integrated tests” and “unit tests” on the software being developed.

    Unit tests (actually, “microtests” like those produced in TDD) did NOT help improve the design of the code, did NOT help find where the bugs are (they might reveal a previously-undiscovered bug, but don’t tell you where it is), etc.

    End-to-end integrated tests influenced the design of the code to be more de-coupled, helped localize bugs, etc. And were run more often, catching problems faster.

    (Apologies to Carguous if I mis-remembered or mis-represented his presentation several years ago.)

    So there.

  20. BrianW Says:

    > Mocks are an abomination, requiring far too much effort to build

    I agree with “Name Required”: you’re doing them wrong. You’re probably either attempting to mock something that is too complex – which should be a smell to you – or you’re using the wrong mocking library.

  21. Yves Daoust Says:

    Mh, I don’t love the second part that says “Code coverage should be as near 100% as possible.” Given the number of inputs and complexity of real-life programs, I have the impression that coverage often looks like 0.0000000001%. I’d prefer something like “Code coverage should be near 100% of what is economically sensible.”

    My favorite advice regarding coverage is: “test first what the user will do first”. It takes some hindsight to figure out what others will be doing with your code, but these are the places where bugs will mostly harm.

    I agree with the other two parts.

  22. Ken Says:

    How about “unit” tests that execute their own SQL queries that bear no relationship to any existing application and distort the whole intent of the DB?
    I have to admit that calling tests that execute the actual code being used “unit” tests is foreign to me. Those seem like “white box” to me.

  23. Ken Says:

    Whoops, my bad. I was thinking of “black box” testing

  24. A Smattering of Selenium #62 « Official Selenium Blog Says:

    [...] from the vaults is Harold’s Corollary to Knuth’s Law which is aimed at Unit not Functional tests, but still is food for [...]
