How to write network backup software: a lesson in practical optimization

Saturday, July 22nd, 2006

In my Human Factors in API Design presentation at Architecture & Design World this past week, I claimed that classic optimization is rarely necessary. Pulling operations outside of loops or reducing the number of operations in a method rarely has any noticeable effect on performance.* Most real performance problems come from doing too many fundamentally slow operations; for instance, writing to the disk or reading from the network.

For example, you don’t want to open and close a database connection for every operation. Even on a LAN, that can easily hit you with one or two seconds (not milliseconds but seconds) of overhead per call. Do that a few hundred times and suddenly you’ve got an unusably slow application. Instead you need to:

  1. Cache and reuse the database connection(s) rather than constantly opening and closing new connections.
  2. Figure out how to reduce the number of database queries your application makes.
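Both points can be sketched in a few lines. This is a minimal illustration, not code from any particular application: it assumes Python's standard sqlite3 module, a hypothetical `users` table with `id` and `name` columns, and the made-up helper names `lookup_slow` and `lookup_fast`. With a local SQLite file the connection cost is tiny; the structure is what matters, because against a networked database each connect and each query is a round trip.

```python
import sqlite3


def lookup_slow(db_path, ids):
    """Anti-pattern: open and close a connection for every single lookup."""
    names = {}
    for i in ids:
        conn = sqlite3.connect(db_path)  # one connection setup per id
        try:
            row = conn.execute(
                "SELECT name FROM users WHERE id = ?", (i,)
            ).fetchone()
            names[i] = row[0] if row else None
        finally:
            conn.close()
    return names


def lookup_fast(conn, ids):
    """Reuse one already-open connection and fetch all rows in one query."""
    ids = list(ids)
    if not ids:
        return {}
    # One "?" placeholder per id. SQLite caps the number of bound
    # parameters per statement, so very long lists would need chunking.
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", ids
    ).fetchall()
    found = dict(rows)
    # Ids with no matching row map to None, as in lookup_slow.
    return {i: found.get(i) for i in ids}
```

For a few hundred ids, `lookup_slow` pays for a few hundred connections and a few hundred queries, while `lookup_fast` pays for one query on a connection that already exists.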


Most programmers who write database-facing applications already know all this. There are numerous frameworks designed to perform this sort of optimization automatically. That’s what a lot of middleware is about. Programmers who work with databases have either learned this lesson or involuntarily changed careers. It’s that important.

Recently, however, I’ve realized that another field has just as big a problem with network overhead as database apps do. In this field, though, the lesson does not seem to have been as widely learned. That field is backup software.

Must Ignore vs. Microformats

Wednesday, July 12th, 2006

I tend to assume most people know what they’re talking about, especially if they’re talking about something I don’t really understand. Sometimes it takes a really blatant example of just what it is they’re saying before I realize they’re talking out of their posteriors.

For instance, I used to think homeopathy was a vaguely reasonable practice based on traditional herbal medicine. Then one day I was stuck at the pharmacist for fifteen minutes waiting for a prescription. Since I had nothing better to do, I picked up a pamphlet about the principles of homeopathy and started to read. Almost immediately it became clear that there was nothing in the little glass vials except plain water, that there was no possible way any of these “remedies” could do anything except through the placebo effect, and that the whole field was complete and utter bunk.

It’s important to note here that I didn’t read some detailed scientific study about homeopathy. I didn’t read an article in the Skeptical Inquirer debunking homeopathy. I read a really well-written piece by an advocate of homeopathy that explained exactly what homeopathy was and why its advocates thought it worked, and that clear explanation showed me (or anyone with a layperson’s understanding of chemistry) that homeopathy was completely bogus. I have recently had the same experience with microformats.

How do you specify an exponentiation function with a test?

Saturday, July 8th, 2006

While it may be slightly too extreme a position to say that tests are the only spec, I think it is absolutely reasonable to consider tests to be a major part of the spec. Indeed, a specification without normative test cases is far less likely to be implemented correctly and interoperably than one with a solid normative test suite. The more exhaustive the test suite is, the easier it is to write a correct, conforming implementation.

Cedric Beust presents the question, “how do you specify an exponentiation function with a test?” as a counterexample to tests as specs. Actually I don’t think it’s all that hard. Here’s one example:
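One way to make the idea concrete is a small suite that checks the algebraic laws of exponentiation alongside a few fixed values. The sketch below is an illustration only, not the example from the original post: `power` and `check_power_spec` are hypothetical names, and it assumes integer bases, non-negative integer exponents, and the convention that 0 to the power 0 is 1.

```python
def power(base, exponent):
    """Hypothetical function under test: base ** exponent for integer
    base and non-negative integer exponent, by repeated multiplication."""
    result = 1
    for _ in range(exponent):
        result *= base
    return result


def check_power_spec(power_fn):
    """Raise AssertionError if power_fn breaks any rule below."""
    for b in range(-5, 6):
        # Base cases (assumes the convention 0 ** 0 == 1).
        assert power_fn(b, 0) == 1
        assert power_fn(b, 1) == b
        for n in range(8):
            # Recurrence: one more factor of b per unit of exponent.
            assert power_fn(b, n + 1) == power_fn(b, n) * b
            for m in range(8):
                # Product law: b^(n+m) == b^n * b^m.
                assert power_fn(b, n + m) == power_fn(b, n) * power_fn(b, m)
    # A few fixed values anchor the laws to concrete numbers.
    assert power_fn(2, 10) == 1024
    assert power_fn(10, 3) == 1000
    assert power_fn(-3, 3) == -27
    return True
```

The base cases and the recurrence together determine the result for every non-negative integer exponent, so an implementation that passes them over the tested range has little room to be wrong there. Fractional and negative exponents would need further cases with explicit floating-point tolerances.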

10 Things I Hate About Ruby

Monday, July 3rd, 2006

1. initialize