Refactoring HTML: Chapter 1

Over the next week or two I’m going to serialize the first two chapters Refactoring HTML here. Of course, if you don’t want to wait you can always buy it from Amazon or read it online on Safari. And with that brief commercial announcement out of the way, let us begin:

Refactoring. What is it? Why do it?

In brief, refactoring is the gradual improvement of a code base by making small changes that don’t modify a program’s behavior, usually with the help of some kind of automated tool. The goal of refactoring is to remove the accumulated cruft of years of legacy code and produce cleaner code that is easier to maintain, easier to debug, and easier to add new features to.

Technically, refactoring never actually fixes a bug or adds a feature. However, in practice, when refactoring I almost always uncover bugs that need to be fixed and spot opportunities for new features. Often, refactoring changes difficult problems into tractable and even easy ones. Reorganizing code is the first step in improving it.

If you have the sort of personality that makes you begin a new semester, project, or job by thoroughly cleaning up your workspace, desk, or office, you’ll get this immediately. Refactoring helps you keep the old from getting in the way of the new. It doesn’t let you start from a blank page. Instead, it leaves you with a clean, organized workspace where you can find everything you need, and from which you can move forward.

The concept of refactoring originally came from the object-oriented programming community, and dates back at least as far as 1990 (William F. Opdyke and Ralph E. Johnson, “Refactoring: An Aid in Designing Application Frameworks and Evolving Object-Oriented Systems,” Proceedings of the Symposium on Object-Oriented Programming Emphasizing Practical Applications [SOOPPA], September 1990, ACM), though likely it was in at least limited use before then. However, the term was popularized by Martin Fowler in 1999 in his book Refactoring (Addison-Wesley, 1999). Since then, numerous IDEs and other tools such as Eclipse, IntelliJ IDEA, and C# Refactory have implemented many of his catalogs of refactorings for languages such as Java and C#, as well as inventing many new ones.

However, it’s not just object-oriented code and object-oriented languages that develop cruft and need to be refactored. In fact, it’s not just programming languages at all. Almost any sufficiently complex system that is developed and maintained over time can benefit from refactoring. The reason is twofold.

  1. Increased knowledge of both the system and the problem domain often reveals details that weren’t apparent to the initial designers. No one ever gets everything right in the first release. You have to see a system in production for a while before some of the problems become apparent.

  2. 2. Over time, functionality increases and new code is written to support this functionality. Even if the original system solved its problem perfectly, the new code written to support new features doesn’t mesh perfectly with the old code. Eventually, you reach a point where the old code base simply cannot support the weight of all the new features you want to add.

When you find yourself with a system that is no longer able to support further developments, you have two choices: You can throw it out and build a new system from scratch, or you can shore up the foundations. In practice, we rarely have the time or budget to create a completely new system just to replace something that already works. It is much more cost-effective to add the struts and supports that the existing system needs before further work. If we can slip these supports in gradually, one at a time, rather than as a big-bang integration, so much the better.

Many sufficiently complex systems with large chunks of code are not object-oriented languages and perhaps are not even programming languages at all. For instance, Scott Ambler and Pramod Sadalage demonstrated how to refactor the SQL databases that support many large applications in Refactoring Databases (Addison-Wesley, 2006).

However, while the back end of a large networked application is often a relational database, the front end is a web site. Thin client GUIs delivered in Firefox or Internet Explorer are everywhere, replacing thick client GUIs for all sorts of business applications, such as payroll and lead tracking. Adventurous users at companies such as Sun and Google are going even further and replacing classic desktop applications like word processors and spreadsheets with web apps built out of HTML, CSS, and JavaScript. Finally, the Web and the ubiquity of the web browser have enabled completely new kinds of applications that never existed before, such as eBay, Netflix, PayPal, Google Reader, and Google Maps.

HTML made these applications possible, and it made them faster to develop, but it didn’t make them easy. It didn’t make them simple. It certainly didn’t make them less fundamentally complex. Some of these systems are now on their second, third, or fourth generation; and wouldn’t you know it? Just like any other sufficiently complex, sufficiently long-lived application, these web apps are developing cruft. The new pieces aren’t merging perfectly with the old pieces. Systems are slowing down because the whole structure is just too ungainly. Security is being breached when hackers slip in through the cracks where the new parts meet the old parts. Once again, the choice comes down to throwing out the original application and starting over, or fixing the foundations; but really, there’s no choice. In today’s fast-moving world, nobody can afford to wait for a completely new replacement. The only realistic option is to refactor.

Continued tomorrow…

Comments are closed.