Go Ahead. Break the Build!
There’s a philosophy in extreme programming circles that one should never break the build. As soon as the build is broken, everything stops until it can be fixed again.1 Some teams even hand out “dunce caps” to a programmer who breaks the build.
If by “build” you simply mean the compile-link-package cycle, then I tend to agree. Breaking the build is pretty serious. One of the advantages of extreme programming is that the increments of work are so small that you don’t get very far before discovering you’ve broken the build, so it’s relatively easy to fix. Integration is almost automatic rather than a painful, months-long process. Avoiding coupling is also important to keep build times manageable.
However, some systems define the build a little more broadly. They consider the build to include successful execution of all the unit tests. For example, Maven gives up if any unit test fails.2 It will not create any subsequent targets, such as a JAR file, if it can’t run the unit tests. Ant doesn’t require this, but it does allow it: all that’s necessary is declaring that the jar or zip target depends on the test target.
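An Ant build file that enforces this policy might look something like the following sketch. The target names, directories, and classpath reference here are hypothetical; adapt them to your own build:

<target name="test" depends="compile">
  <junit haltonfailure="true">
    <classpath refid="test.classpath"/>
    <batchtest>
      <fileset dir="build/test-classes" includes="**/*Test.class"/>
    </batchtest>
  </junit>
</target>

<target name="jar" depends="test">
  <jar destfile="dist/project.jar" basedir="build/classes"/>
</target>

With that dependency in place, a single failing test stops the jar target from ever running.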
It’s not just open source either. Over at closed source vendor Atlassian, Charles Miller tells us, “all our tools are predicated on tests that start green and stay green.” In fact, a failing test is so damaging to them that he actually advocates writing tests that pass if the bug isn’t fixed and fail if it is. That’s a recipe for disaster if I ever heard one. Five years down the line some new programmer is going to finally fix the line of code that causes the bug, and then carefully reintroduce the bug to get back to the green bar.
This is where I part company with the most extreme of the extremists. If building includes passing all unit tests, then it is often acceptable and even desirable to break the build.
There are several reasons you might choose to introduce a failing test, thus breaking the build, but not fix it immediately. First of all, the person who writes the failing test may not be the best qualified to fix it. For instance, the person who finds the bug and writes the test could be a tester rather than a programmer, or a programmer who’s working on a different piece of the code. Sure, they can fix the bug if they happen to see how; but if they don’t, it’s still valuable to commit the unit test that proves the bug’s existence.
Even if the person who finds the bug is responsible for fixing it, the fix may not be apparent. Some bugs require a lot of work. Some require only a little work, but may still require a night to sleep on before the fix becomes obvious. Sometimes it’s a 30-minute fix, but the programmer only has 15 minutes before they have to pick up their daughter from football practice. For whatever reason, not every bug can be fixed immediately. It’s still better to commit the failing unit test so it doesn’t get forgotten, even if that “breaks” the build.
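Such a test needn’t be elaborate. Here’s a hedged sketch in JUnit 3 style; the class under test and the bug number are hypothetical:

import junit.framework.TestCase;

public class WhitespaceTest extends TestCase {

    // Bug #1234 (hypothetical): trim() leaves the non-breaking
    // space U+00A0 in place. Expected to fail until fixed.
    public void testTrimStripsNonBreakingSpace() {
        assertEquals("hello", Whitespace.trim("\u00A0hello\u00A0"));
    }
}

Even unfixed, those few lines pin the bug down precisely and tell whoever eventually fixes it exactly when they’re done.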
Sometimes failing tests are beyond programmer control. For instance, XOM has one unit test that passes on Windows and Linux, but fails on the Mac. This particular test runs across a bug in Apple’s VM involving files whose names contain Unicode characters from outside the Basic Multilingual Plane. There’s not a lot I can do to fix that; but I shouldn’t pretend it’s not a problem either.
All too often projects are in denial about their bugs. Rather than accept clear evidence of a bug, they do anything they can to deny it. Refusing to allow the build to break, and then defining a test failure as build breakage, is just one example of this pathology. Recognizing, identifying, and reproducing a bug with a well-defined unit test is a valuable contribution. It should be accepted gladly, not met with hostility and an insistence that the reporter immediately provide a patch.
1 Joel Spolsky: “If a daily build is broken, you run the risk of stopping the whole team. Stop everything and keep rebuilding until it’s fixed. Some days, you may have multiple daily builds.”
2 You can tell Maven to exclude specific failing tests from your project.xml by listing them in an excludes element like so:
<unitTest>
  <includes>
    <include>**/*Test.java</include>
  </includes>
  <excludes>
    <!-- currently broken -->
    <exclude>org/jaxen/test/JDOMXPathTest.java</exclude>
  </excludes>
</unitTest>
You can also tell Maven not to run the unit tests at all by setting the command-line property maven.test.skip=true, or to run them but not stop when a test fails with maven.test.failure.ignore=true. However, these are just hacks to get around what Maven really wants to do: run all the unit tests and fail the build if any fail.
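On the command line that looks something like this (Maven 1 syntax; the goal name is just an example):

$ maven -Dmaven.test.skip=true jar
$ maven -Dmaven.test.failure.ignore=true jar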
January 10th, 2007 at 10:37 am
Most build systems have a binary pass/fail determination. So if you have a unit test that is expected to fail, and continues to fail, doesn’t that hide any other failures from you?
Do you visually inspect the output of each automated build to determine that the failure is the “expected” failure and not due to a compilation error or some other unexpected unit test failure? This seems a little onerous to me.
Maybe it would be nice to introduce the concept of “expected failures” at the unit test runner level. That is, you can write a failing unit test and tell JUnit that a specific test method is expected to fail and thus not fail the build because of it. This is not quite the same as ignoring an entire unit test class. Maybe this is kind of like what the Atlassian people are talking about, except you wouldn’t have to convolute the unit test to hide the failure.
Does anyone know if TestNG has additional features for dealing with expected unit test breakage?
January 10th, 2007 at 11:06 am
You really need to watch your phrasing. Telling Maven to skip tests or ignore failures isn’t “just hacks to get around what Maven really wants to do”… they are built-in flags that allow someone to change their build strategy as they see fit. A “hack” wouldn’t be setting a built-in property “maven.test.skip=true” – a hack would be manually running every goal except for the “test” ones because there was no built-in behavior.
January 10th, 2007 at 12:29 pm
This is a specific example of a more general issue. Every programming and testing technique has to be applied using human judgement, to decide if it is applicable in a specific case. The problems you’re pointing out come about when people decide to discard human judgement and instead blindly follow an inflexible rule. That’s when you start to see an insistence that processes be followed even when they harm rather than help you in your work. The answer is to keep in mind that these processes are there to achieve some goal, and that there should be some way to bypass the process when it gets in the way of that goal.
January 10th, 2007 at 12:46 pm
If you can write a unit test that exposes a problem that can’t be fixed for months then you have more serious problems than a broken build.
If you write a unit test that exposes a problem that you can’t fix until tomorrow then I don’t see anything wrong with not committing the unit test until you fix the problem.
January 10th, 2007 at 1:16 pm
I think that there’s another way to communicate breakage that doesn’t risk people getting complacent about passing all the unit tests. If you write a unit test that breaks the build and you can’t fix it, then comment it out and write a bug assigning it to the correct person. This way the build stays clean and the bug is registered.
If you start using your unit tests as a bug tracking system, soon you’ll have way too many broken unit tests, or somebody’s crazy unit test becomes higher priority than important work because it “breaks the build”.
January 10th, 2007 at 2:29 pm
The Python unit testing framework twisted.trial does have the ability to mark a test with “TODO”, and if it fails it will be recognized as an “expected failure”, and will not keep the bar from being green. If a TODO test passes, it will be reported as an “unexpected success”, at which point you should probably remove the TODO designation.
January 11th, 2007 at 6:03 am
“Five years down the line some new programmer is going to finally fix the line of code that causes the bug, and then carefully reintroduce the bug to get back to the green bar.”
Sometimes the solution to a problem like this, a developer who will ‘fix’ something without even bothering to look at the test or its associated comment, is not to change your process. It’s just not to give commit rights to muppets.
You’re much more likely to confuse a new developer who, on checking some piece of code in, doesn’t realise that although 27 tests are still failing, they’re no longer the same 27 tests that are _expected_ to fail.
Automated test suites work best when they give a binary result. Green or red. Either those things you expect to happen happened, or they didn’t. Either something about the build needs extra attention, or it doesn’t. If you have to examine the results beyond that binary level to make sure that the tests that failed were the ones you expected, and that they’re still failing for the same reasons they did before, you’re introducing inefficiency.
Having your tests represent what you expect to happen, even if the thing you expect to happen isn’t currently the _correct_ thing, seemed to be an idea worth throwing over the wall. That way your tests only ask for attention when they need attention.
(The Python way of TODO-ing tests that are expected not to work is neat, except it doesn’t cover the “how do you know the tests are still testing what you expect them to, and not just failing because they’ve turned into dead code?” case.)
January 11th, 2007 at 9:47 am
Perhaps the simplest thing to do is to split the tests into regression tests (these should always pass) and bug tests (all these should always fail).
When you fix a bug and its test passes, move it from bug tests to regression tests, keeping the regressions green and the bugs red.
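A hedged sketch of that split, in JUnit 3 style (the test class names here are hypothetical):

import junit.framework.Test;
import junit.framework.TestSuite;

// The build runs regressionSuite(); bugSuite() runs in a separate
// target and is expected to stay red.
public class AllTests {

    public static Test regressionSuite() {
        TestSuite suite = new TestSuite("Regression tests: keep green");
        suite.addTestSuite(ParserTest.class);
        suite.addTestSuite(SerializerTest.class);
        return suite;
    }

    public static Test bugSuite() {
        TestSuite suite = new TestSuite("Bug tests: expected to fail");
        suite.addTestSuite(KnownBugsTest.class);
        return suite;
    }
}

When a bug test unexpectedly goes green, move its method out of KnownBugsTest and into one of the regression classes.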
A humourous way to deal with this problem:
“Sometimes the solution to a problem like this, a developer who will ‘fix’ something without even bothering to look at the test or its associated comment, is not to change your process. It’s just not to give commit rights to muppets.”
..is to write two tests. One that passes if the bug test fails, and vice-versa.
Then, said muppet will notice that he broke one test while fixing another, and be forced to read or contact his supervising muppet.
January 11th, 2007 at 12:37 pm
In certain unit test frameworks, you can mark a test as “to be skipped.”
So I’d say the obvious thing to do is write the unit test that exposes the bug, but mark it as skipped. Then make an entry in your bug tracker about the bug and, in it, mention the test.
This accomplishes everything: communicating that there’s a bug, keeping the build un-broken, not forcing the team to have to keep track of tests that are “expected” to fail, and having a unit test that exposes the bug as well as being useful to whoever decides to tackle the bug.
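For example, JUnit 4’s @Ignore annotation does exactly this (the test class, method, and bug number here are made up):

import org.junit.Ignore;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class EntityResolverTest {

    // Skipped, not deleted: the tracker entry for the bug points
    // back at this test, and removing @Ignore reproduces the bug.
    @Ignore("Bug #5678: resolver drops the system ID")
    @Test
    public void systemIdSurvivesResolution() {
        assertEquals("http://example.com/dtds/doc.dtd",
                     new Resolver().resolve("doc.dtd").getSystemId());
    }
}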
End of argument, bye.
January 11th, 2007 at 5:00 pm
Shouldn’t long-standing and complicated bugs get their own developer branch?
I’m not sure why the developer should have to commit right before he heads out for the day. I mean, that’s what sticking a Post-it note on your monitor for the next morning is for (“Fix 30 min bug from yesterday in file so-and-so”).
Don’t know about your multiplatform bug though, I do *not* know how to handle that.
January 17th, 2007 at 4:31 pm
The problem could also be solved by distinguishing between a staging area (prospective commit) and an actual build (commit). I’ve worked on a lot of projects over many years, and introducing an official staging area was really a big help in a lot of ways.
By the way, developers who break the staging area builds get to clean up their own messes, too.
January 23rd, 2007 at 10:29 pm
I used to own a car that had an oil leak– it was a ridiculous EPA nightmare. When the roads were slick due to rain, there would be a visible oil slick following my car. I used to add a quart of oil to the engine every other day. After some time, I got lazy and wouldn’t even check the oil level before I added a quart. Later, I got even lazier and would forget to add oil for days– then one day while I was driving to work my engine block fused into a solid hunk of very hot metal. Amazingly enough, it didn’t blow up and kill me.
I had a failing unit test but I got lazy and stopped caring about it. It cost me ten grand to buy a new car (which I had to buy the next day so I could go to work and get paid). If I had fixed the issue when I noticed the leak (it was a gasket that needed replacing of all things) it would have maybe cost me a few hundred bucks.
I agree in principle with your point, but in practice it has the potential to lead to bad things (especially if people like me are watching over things!).
January 24th, 2007 at 12:31 am
Tim,
Yes, it’s very easy to achieve with TestNG: you can put your test methods in a particular group (I use “broken”) and always exclude this group from your test runs. After each run, TestNG generates several HTML reports, one of which lists all the methods in each group, so it’s trivial to find out which test methods are currently being skipped and make sure they are removed from the “broken” group before you ship.
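A sketch of what that looks like in the test code (the class here is hypothetical):

import org.testng.annotations.Test;

public class AstralPlaneFileTest {

    // Kept out of the normal run by excluding the "broken"
    // group from the test targets.
    @Test(groups = { "broken" })
    public void fileNameWithAstralCharacters() {
        // assertions that currently fail on the Mac VM go here
    }
}

In testng.xml, an <exclude name="broken"/> element inside <groups><run> keeps the group out of the build.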
—
Cedric
January 24th, 2007 at 12:37 am
Danno,
Groups are also a good solution for running tests only for certain platforms: you can put methods in a group “win32”, “macos”, “linux” or any combination thereof, and then only run these groups on the given platforms.
February 3rd, 2007 at 12:09 pm
Taking potshots at Charles Miller is unconvincing given Atlassian are making some of the best Java payware out there.
“The solution, perhaps, is the anti-test. Write a test that verifies the bug.”
Sounds like a plan to me, but making them go green is chromatic overloading. I would use another color for tests that verify a defect – black or purple maybe. Hey, can that feature ship for Bamboo 1.0!?
August 6th, 2007 at 9:41 am
I agree.
Using TestNG groups is a great way of working around the build problems mentioned. It also gives you a chance to run only the tests that are in the broken group so you can see your todo list in front of you.
Allowing check-ins of tests that fail is a must when doing test-driven development.