How to write network backup software: a lesson in practical optimization

Saturday, July 22nd, 2006

In my Human Factors in API Design presentation at Architecture & Design World this past week, I claimed that classic optimization is rarely necessary. Pulling operations outside of loops or reducing the number of operations in a method rarely has any noticeable effect on performance.* Most real performance problems come from doing too many fundamentally slow operations; for instance, writing to the disk or reading from the network.

For example, you don’t want to open and close a database connection for every operation. Even on a LAN, that can easily hit you with one or two seconds (not milliseconds but seconds) of overhead per call. Do that a few hundred times and suddenly you’ve got an unusably slow application. Instead you need to:

  1. Cache and reuse the database connection(s) rather than constantly opening and closing new connections.
  2. Figure out how to reduce the number of database queries your application makes.
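
To make the two steps above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: it uses a local SQLite file from the standard library rather than a networked database, so it does not reproduce the per-connection latency described above; it simply counts how many connections each approach opens. The `items` table, the `CountingConnector` helper, and the function names are all hypothetical.

```python
import os
import sqlite3
import tempfile


class CountingConnector:
    """Hypothetical helper: opens connections and counts how many were opened."""

    def __init__(self, path):
        self.path = path
        self.opened = 0

    def connect(self):
        self.opened += 1
        return sqlite3.connect(self.path)


def fetch_names_slow(connector, ids):
    """Anti-pattern: one new connection and one query per lookup."""
    names = {}
    for item_id in ids:
        conn = connector.connect()
        try:
            row = conn.execute(
                "SELECT name FROM items WHERE id = ?", (item_id,)
            ).fetchone()
            if row is not None:
                names[item_id] = row[0]
        finally:
            conn.close()
    return names


def fetch_names_fast(connector, ids):
    """One connection, one query: fetch every requested row with a single IN clause."""
    ids = list(ids)
    if not ids:
        return {}
    conn = connector.connect()
    try:
        marks = ",".join("?" for _ in ids)
        rows = conn.execute(
            f"SELECT id, name FROM items WHERE id IN ({marks})", ids
        ).fetchall()
        return dict(rows)
    finally:
        conn.close()


def demo():
    """Build a throwaway database, then run both approaches over the same ids."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
        setup.executemany(
            "INSERT INTO items (id, name) VALUES (?, ?)",
            [(i, f"item-{i}") for i in range(1, 101)],
        )
        setup.commit()
        setup.close()

        ids = range(1, 101)
        slow_connector = CountingConnector(path)
        fast_connector = CountingConnector(path)
        slow = fetch_names_slow(slow_connector, ids)
        fast = fetch_names_fast(fast_connector, ids)
        return slow, slow_connector.opened, fast, fast_connector.opened


if __name__ == "__main__":
    slow, slow_opened, fast, fast_opened = demo()
    print("same results:", slow == fast)
    print("connections opened:", slow_opened, "vs", fast_opened)
```

Both functions return the same data, but the first pays the connection cost once per lookup while the second pays it once in total. Against a real database server, a long-running application would more likely keep a connection (or a pool of them) open across calls instead of opening one per batch.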


Most programmers who write database-facing applications already know all this. There are numerous frameworks designed to perform this sort of optimization automatically. That’s what a lot of middleware is about. Programmers who work with databases have either learned this lesson or involuntarily changed careers. It’s that important.

Recently, however, I’ve realized that another field has just as big a problem with network overhead as database apps do, and in this field the lesson does not seem to have been as widely learned. That field is backup software.