Friday, June 6th, 2008

Here’s part 6 of the ongoing serialization of Refactoring HTML, also available from Amazon and Safari.

The separation of presentation from content is one of the fundamental design principles of HTML. Separating presentation from content allows you to serve the same text to different clients and let them decide how to format it in the way that best suits their needs. A cell phone browser doesn’t have the same capabilities as a desktop browser such as Firefox. Indeed, a browser may not display content visually at all. For instance, it may read the document to the user.

Consequently, the HTML should focus on what the document means rather than on what it looks like. Most important, this style of authoring respects users’ preferences. A reader can choose the fonts and colors that suit her rather than relying on the page’s default. One size does not fit all. What is easily readable by a 30-year-old airline pilot with 20/20 vision may be illegible to an 85-year-old grandmother. A beautiful red and green design may be incomprehensible to a colorblind user. And a carefully arranged table layout may be a confusing mishmash of random words to a driver listening to a page on his cell phone while speeding down the Garden State Parkway.

Thus, in HTML, you don’t say that “Why CSS” a few paragraphs up should be formatted in 11-point Arial bold, left-aligned. Instead, you say that it is an H3 heading. At least you did, until Netscape came along and invented the font tag and a dozen other new presentational elements that people immediately began to use. The W3C responded with CSS, but the damage had been done. Web pages everywhere were created with a confusing mix of font, frame, marquee, and other presentational elements. Semantic elements such as blockquote, table, img, and ul were subverted to support layout goals. To be honest, this never really worked all that well; but for a long time it was the best we had.

That is no longer true. Today’s CSS enables not just the same, but better layouts and presentations than one can achieve using hacks such as frames, spacer GIFs, and text wrapped up inside images. The CSS layouts are not only prettier; they are leaner, more efficient, and more accessible. They cause pages to load faster and display better. With some effort, they can produce pages that work better in a wide variety of browsers on multiple platforms.
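
One rough way to gauge how much presentational markup a legacy page still carries is to count the offending elements. The following is a minimal sketch in Python, not a complete audit tool. The function name `count_presentational` and the sample markup are hypothetical, and the tag set holds only the three elements named above (font, frame, marquee); a real audit would presumably need a longer list.

```python
from collections import Counter
from html.parser import HTMLParser

# Only the presentational elements named in the text above.
# Assumption: a real audit would extend this set.
PRESENTATIONAL = {"font", "frame", "marquee"}


class PresentationalTagCounter(HTMLParser):
    """Counts start tags whose names appear in PRESENTATIONAL."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase, so <FONT> matches "font".
        if tag in PRESENTATIONAL:
            self.counts[tag] += 1


def count_presentational(markup):
    """Return a dict mapping each presentational tag found to its count."""
    counter = PresentationalTagCounter()
    counter.feed(markup)
    counter.close()
    return dict(counter.counts)


# Hypothetical legacy fragment mixing presentational and semantic markup.
SAMPLE = (
    '<FONT face="Arial" size="3"><b>Why CSS</b></FONT>'
    "<marquee>Welcome!</marquee>"
    "<h3>Why CSS</h3>"
)

if __name__ == "__main__":
    print(count_presentational(SAMPLE))
```

The semantic h3 element is ignored; only the font and marquee elements are counted. A page whose counts are all zero is not necessarily clean, since tables and spacer images used for layout would need a different check.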


Thursday, June 5th, 2008

Here’s part 5 of the ongoing serialization of Refactoring HTML, also available from Amazon and Safari.

XHTML is simply an XML-ized version of HTML. Whereas HTML is at least theoretically built on top of SGML, XHTML is built on top of XML. XML is a much simpler, clearer spec than SGML. Therefore, XHTML is a simpler, clearer version of HTML. However, as with a gun, a lot depends on which end you’re facing.

XHTML makes life harder for document authors in exchange for making life easier for document consumers. Whereas HTML is forgiving, XHTML is not. In HTML, nothing too serious happens if you omit an end-tag or leave off a quote here or there. Some extra text may be marked in boldface or be improperly indented. At worst, a few words here or there may vanish. However, most of the page will still display. This forgiving nature gives HTML a very shallow learning curve. Although you can make mistakes when writing HTML, nothing horrible happens to you if you do.

By contrast, XHTML is much stricter. A trivial mistake such as a missing quote or an omitted end-tag that a browser would silently recover from in HTML becomes a four-alarm, drop-everything, sirens-blaring emergency in XHTML. One little, tiny error in an XHTML document, and the browser will throw up its hands and refuse to display the page, as shown in Figure 1.2. This makes writing XHTML pages harder, especially if you’re using a plain text editor. As in a computer program, one syntax error breaks everything. There is no leeway, and no margin for error.

XML Parsing Error: mismatched tag. Expected: </p>.
Location: http://www.elharo.com/malformed.xhtml
Line Number 10, Column 3:
</body>
--^

Figure 1.2: Firefox responding to an error in an XHTML page

Why, then, would anybody choose XHTML? Because the same characteristics that make authoring XHTML a challenge (draconian error handling) make consuming XHTML a walk in the park. Work has been shifted from the browser to the author. A web browser (or anything else that reads the page) doesn’t have to try to make sense out of a confusing mess of tag soup and guess what the page really meant to say. If the page is unclear in any way, the browser is allowed, in fact required, to throw up its hands and refuse to process it. This makes the browser’s job much simpler. Today’s browsers devote a large chunk of their HTML parsing code simply to correcting errors in pages. With XHTML they don’t have to do that.
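
The difference between forgiving and draconian parsing can be sketched with Python's standard library. This is only an illustration under stated assumptions: `html.parser` is a lenient tokenizer, not a browser engine, and `xml.etree.ElementTree` stands in for any conforming XML parser. The sample markup and the function names `parse_as_html` and `parse_as_xml` are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample page: the <p> element is never closed.
MALFORMED = "<html><body><p>Hello, world</body></html>"
WELL_FORMED = "<html><body><p>Hello, world</p></body></html>"


class TextCollector(HTMLParser):
    """Lenient HTML tokenizer that simply gathers character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_as_html(markup):
    """Read the markup forgivingly and return whatever text it contains."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


def parse_as_xml(markup):
    """Read the markup as XML; any well-formedness error rejects the page."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return f"rejected: {err}"
    return "accepted"


if __name__ == "__main__":
    print(parse_as_html(MALFORMED))   # the text survives the missing end-tag
    print(parse_as_xml(MALFORMED))    # the XML parser refuses the document
    print(parse_as_xml(WELL_FORMED))
```

The forgiving reader recovers the text despite the missing end-tag, while the XML reader rejects the whole document. The XML side needs no recovery logic at all, which is the simplification the consumer gains.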


What to Refactor To

Friday, May 30th, 2008

Here’s part 4 of the ongoing serialization of Refactoring HTML, also available from Amazon and Safari.

There is one critical difference between refactoring in a programming language such as Java and refactoring in a markup language such as HTML. Compared to HTML, Java really hasn’t changed all that much. C++ has changed even less, and C hardly at all. A program written for Java 1.0 still runs pretty much the same in Java 6. A lot of features have been added to the language, but it has remained more or less the same.

By contrast, HTML and associated technologies have evolved much faster over the same time frame. Today’s HTML is really not the same language as the HTML of 1995. Keywords have been removed. New ones have been added. Syntax and parsing algorithms have changed. Although a modern browser such as Firefox or Internet Explorer 7 can usually make some sense out of an old-fashioned page, you may discover that a lot of things don’t work quite right. Furthermore, entirely new components such as CSS and ECMAScript have been added to the stew that a browser must consume.

When to Refactor

Thursday, May 29th, 2008

Here’s part 3 of the ongoing serialization of Refactoring HTML, also available from Amazon and Safari.

When should you refactor? When do you say the time has come to put new features on hold while you clean up and get a handle on your legacy code? There are several possible answers to this question, and they are not mutually exclusive.

The first time to refactor is before any redesign. If your site is undergoing a major redevelopment, the first thing you need to do is get its content under control. The first benefit of refactoring at this point is simply that you will create a much more stable base on which to implement the new design. A well-formed, well-organized page is much easier to reformat.

Why Refactor HTML?

Wednesday, May 28th, 2008

Here’s part 2 of the ongoing serialization of Refactoring HTML, also available from Amazon and Safari.

How do you know when it’s time to refactor? What are the smells of bad code that should set your nose to twitching? There are quite a few symptoms, but these are some of the smelliest.

Smell: Illegible Code

The most obvious symptom is that you do a View Source on the page and it might as well be written in Greek (unless, of course, you’re working in Greece). Most of us know ugly code when we see it. Ugly code looks ugly. Which would you rather see, Listing 1.1 or Listing 1.2? I don’t think I have to tell you which is uglier, and which is going to be easier to maintain and update.