How to write network backup software: a lesson in practical optimization

July 22nd, 2006

In my Human Factors in API Design presentation at Architecture & Design World this past week, I claimed that classic optimization is rarely necessary. Pulling operations outside of loops or reducing the number of operations in a method rarely has any noticeable effect on performance.* Most real performance problems come from doing too many fundamentally slow operations; for instance, writing to the disk or reading from the network.

For example, you don’t want to open and close a database connection for every operation. Even on a LAN, that can easily hit you with one or two seconds (not milliseconds but seconds) of overhead per call. Do that a few hundred times and suddenly you’ve got an unusably slow application. Instead you need to:

  1. Cache and reuse the database connection(s) rather than constantly opening and closing new connections.
  2. Figure out how to reduce the number of database queries your application makes.
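
Here is a minimal sketch of both points. It assumes Python's standard sqlite3 module as a stand-in for a networked database. SQLite has essentially no connection overhead, so this shows only the shape of the fix, not the speedup. ConnectionCache is a hypothetical helper written for this illustration, not part of any library.

```python
import sqlite3


class ConnectionCache:
    """Hypothetical helper: opens one connection lazily, then reuses it."""

    def __init__(self, database):
        self.database = database
        self.opened = 0      # counts how many real connections were made
        self._conn = None

    def get(self):
        # Connect only on first use; every later call returns the same object.
        if self._conn is None:
            self._conn = sqlite3.connect(self.database)
            self.opened += 1
        return self._conn

    def close(self):
        if self._conn is not None:
            self._conn.close()
            self._conn = None


cache = ConnectionCache(":memory:")
conn = cache.get()
conn.execute("CREATE TABLE files (name TEXT, size INTEGER)")

# Point 2: one batched statement instead of one query per row.
rows = [("a.txt", 10), ("b.txt", 20), ("c.txt", 30)]
conn.executemany("INSERT INTO files VALUES (?, ?)", rows)
conn.commit()

# Point 1: a hundred operations, but still only one connection.
for _ in range(100):
    total = cache.get().execute("SELECT SUM(size) FROM files").fetchone()[0]
```

Against a real server the same pattern usually comes from a connection pool supplied by the driver or framework rather than a hand-rolled class like this one.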

Most programmers who write database-facing applications already know all this. There are numerous frameworks designed to perform this sort of optimization automatically. That’s what a lot of middleware is about. Programmers who work with databases have either learned this lesson or involuntarily changed careers. It’s that important.

Recently, however, I’ve realized that another field has just as big a problem with network overhead as database apps do, and in this field the lesson does not seem to have been as widely learned. That field is backup software.
Read the rest of this entry »

Must Ignore vs. Microformats

July 12th, 2006

I tend to assume most people know what they’re talking about, especially if they’re talking about something I don’t really understand. Sometimes it takes a really blatant example of just what it is they’re saying before I realize they’re talking out of their posteriors.

For instance, I used to think homeopathy was a vaguely reasonable practice based on traditional herbal medicine. Then one day I was stuck at the pharmacist for fifteen minutes waiting for a prescription. Since I had nothing better to do, I picked up a pamphlet about the principles of homeopathy and started to read. Almost immediately it became clear that there was nothing in the little glass vials except plain water, that there was no possible way any of these “remedies” could do anything except through the placebo effect, and that the whole field was complete and utter bunk.

It’s important to note here that I didn’t read some detailed scientific study about homeopathy. I didn’t read an article in the Skeptical Inquirer debunking homeopathy. I read a really well-written piece by an advocate of homeopathy that explained exactly what homeopathy was and why they thought it worked; and that clear explanation showed me (or anyone with a layperson’s understanding of chemistry) that homeopathy was completely bogus. I have recently had the same experience with microformats.
Read the rest of this entry »

How do you specify an exponentiation function with a test?

July 8th, 2006

While it may be a slightly too extreme position to say that tests are the only spec, I think it is absolutely reasonable to consider tests to be a major part of the spec. Indeed a specification without normative test cases is far less likely to be implemented correctly and interoperably than one with a solid normative test suite. The more exhaustive the test suite is, the easier it is to write a conforming correct implementation.

Cedric Beust presents the question, “how do you specify an exponentiation function with a test?” as a counterexample to tests as specs. Actually I don’t think it’s all that hard. Here’s one example:
Read the rest of this entry »

10 Things I Hate About Ruby

July 3rd, 2006

1. initialize

Read the rest of this entry »

Why Blogs Work

June 16th, 2006

Tuesday night I gave my RSS, Atom, APP, and All That talk to the Amateur Computer Group of New Jersey JUG in Scotch Plains. This is the seventh time I’ve given this particular talk, and I think last night I finally understood something about blogs that had eluded me up till now.

I’ve noticed for a while that blogging really represents a phase change in the Web. It has turned the Web from a read-only medium to a read-write medium. What I couldn’t figure out was why. There’s nothing technically different about using WordPress or Blogger compared to editing HTML and uploading the files to the server. Sure, you don’t have to know HTML to blog; but there’ve been HTML editors that look like word processors for 10+ years now, and they didn’t lead to the explosion of content we see with blogs. FTP’s a bit of a pain for a non-techie, but there’ve been content management systems and editors that use HTTP PUT and/or hide the FTP client. None of them led to the explosion in content we see with blogging.

Nor is it that there’s one service that’s just particularly well done that has allowed blogging to explode. If so, you’d see something like MySpace; that is, all the blogs on one site or platform instead of the plethora we have today (WordPress, Movable Type, Blogger, etc.).

But there is one thing that all these blog systems (and most others) have in common that none of the editors like Dreamweaver or the content management systems support:

Users don’t have to pick their URLs.
Read the rest of this entry »