Monday, June 16th, 2008

Here’s part 10 of the ongoing serialization of Refactoring HTML, also available from Amazon and Safari.

There really are standards for HTML, even if nobody follows them. One way to find out whether a site follows HTML standards is to run a page through a validation service. The results can be enlightening. They will provide you with specific details to fix, as well as a good idea of how much work you have ahead of you.

The W3C Markup Validation Service

For public pages, the validator of choice is the W3C’s Markup Validation Service, at Simply enter the URL of the page you wish to check, and see what it tells you. For example, Figure 2.1 shows the result of validating my blog against this service.

Figure 2.1: The W3C Markup Validation Service

This page is not valid XHTML

It seems I had misremembered the syntax of the blockquote element. I had mistyped the cite attribute as the source attribute. This was actually better than I expected. I fixed that and rechecked, as shown in Figure 2.2. Now the page is valid.

Chapter 2: Tools

Thursday, June 12th, 2008

Today we start Chapter 2 of the ongoing serialization of Refactoring HTML, also available from Amazon and Safari.

Automatic tools are a critical component of refactoring. Although you can perform most refactoring manually with a text editor, and although I will sometimes demonstrate refactoring that way for purposes of illustration, in practice we almost always use software to help us. To my knowledge no major refactoring browsers are available for HTML at the time of this writing. However, a lot of tools can assist in many of the processes. In this section, I’ll explain some of them.

Backups, Staging Servers, and Source Code Control

Throughout this book, I’m going to show you some very powerful tools and techniques. As the great educator Stan Lee taught us, “With great power comes great responsibility.” Your responsibility is to not irretrievably break anything while using these techniques. Some of the tools I’ll show can misbehave. Some of them have corner cases where they get confused. A huge amount of bad HTML is out there, not all of which the tools discussed here have accounted for. Consequently, refactoring HTML requires at least a five-step process.

  1. Identify the problem.
  2. Fix the problem.
  3. Verify that the problem has been fixed.
  4. Check that no new problems have been introduced.
  5. Deploy the solution.

Because things can go wrong, you should not use any of these techniques on a live site. Instead, make a local copy of the site before making any changes. After making changes to your local copy, carefully verify all pages once again before you deploy.

Objections to Refactoring

Tuesday, June 10th, 2008

Here’s part 8 of the ongoing serialization of Refactoring HTML, also available from Amazon and Safari.

It is not uncommon for people ranging from the CEO to managers to HTML grunts to object to the concept of refactoring. The concern is expressed in many ways, but it usually amounts to this:

We don’t have the time to waste on cleaning up the code. We have to get this feature implemented now!

There are two possible responses to this comment. The first is that refactoring saves time in the long run. The second is that you have more time than you think you do. Both are true.


Monday, June 9th, 2008

Here’s part 7 of the ongoing serialization of Refactoring HTML, also available from Amazon and Safari.

Representational State Transfer (REST) is the oldest and yet least familiar of the three refactoring goals I present here. Although I’ll mostly focus on HTML in this book, one can’t ignore the protocol by which HTML travels. That protocol is HTTP, and REST is the architecture of HTTP. (To be pedantic, REST is actually the architectural style by which HTTP is designed.)

Understanding HTTP and REST has important consequences for how you design web applications. Anytime you place a form in a page, or use AJAX to send data back and forth to a JavaScript program, you’re using HTTP. Use HTTP correctly and you’ll develop robust, secure, scalable applications. Use it incorrectly and the best you can hope for is a marginally functional system. The worst that can happen, however, is pretty bad: a web spider that deletes your entire site, a shopping center that melts down under heavy traffic during the Christmas shopping season, or a site that search engines can’t index and users can’t find.


Friday, June 6th, 2008

Here’s part 6 of the ongoing serialization of Refactoring HTML, also available from Amazon and Safari.

The separation of presentation from content is one of the fundamental design principles of HTML. Separating presentation from content allows you to serve the same text to different clients and let them decide how to format it in the way that best suits their needs. A cell phone browser doesn’t have the same capabilities as a desktop browser such as Firefox. Indeed, a browser may not display content visually at all. For instance, it may read the document to the user.

Consequently, the HTML should focus on what the document means rather than on what it looks like. Most important, this style of authoring respects users’ preferences. A reader can choose the fonts and colors that suit her rather than relying on the page’s default. One size does not fit all. What is easily readable by a 30-year-old airline pilot with 20/20 vision may be illegible to an 85-year-old grandmother. A beautiful red and green design may be incomprehensible to a colorblind user. And a carefully arranged table layout may be a confusing mishmash of random words to a driver listening to a page on his cell phone while speeding down the Garden State Parkway.

Thus, in HTML, you don’t say that “Why CSS” a few paragraphs up should be formatted in 11-point Arial bold, left-aligned. Instead, you say that it is an H3 header. At least you did, until Netscape came along and invented the font tag and a dozen other new presentational elements which people immediately began to use. The W3C responded with CSS, but the damage had been done. Web pages everywhere were created with a confusing mix of font, frame, marquee, and other presentational elements. Semantic elements such as blockquote, table, img, and ul were subverted to support layout goals. To be honest, this never really worked all that well; but for a long time it was the best we had.

That is no longer true. Today’s CSS enables not just the same, but better layouts and presentations than one can achieve using hacks such as frames, spacer GIFs, and text wrapped up inside images. The CSS layouts are not only prettier; they are leaner, more efficient, and more accessible. They cause pages to load faster and display better. With some effort, they can produce pages that work better in a wide variety of browsers on multiple platforms.