XSLT (Extensible Stylesheet Language Transformations) is one of many XML tools that work well on HTML documents once they have first been converted into well-formed XHTML. In fact, it is one of my favorite such tools, and the first thing I turn to for many tasks. For instance, I use it to automatically generate a lot of content, such as RSS and Atom feeds, by screen-scraping my HTML pages. Indeed, the possibility of using XSLT on my documents is one of my main reasons for refactoring documents into well-formed XHTML. XSLT can query documents for things you need to fix and automate some of the fixes.
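As a rough illustration of the "query" use, here is a minimal sketch of a stylesheet that reports every img element lacking an alt attribute. The choice of check is my assumption for the sake of example, not something prescribed above, and the html prefix is simply a local name bound to the XHTML namespace.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:html="http://www.w3.org/1999/xhtml">

  <!-- Emit plain text rather than XML, since this is only a report -->
  <xsl:output method="text"/>

  <!-- Visit every XHTML img element that has no alt attribute -->
  <xsl:template match="/">
    <xsl:for-each select="//html:img[not(@alt)]">
      <xsl:text>Missing alt: </xsl:text>
      <xsl:value-of select="@src"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Run against a well-formed XHTML page with any XSLT 1.0 processor, this should print one line per offending image, which gives you a worklist before you attempt any automated fix.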
When refactoring XHTML with XSLT, you usually leave more alone than you change. Thus, most refactoring stylesheets start with the identity transformation shown in Listing 2.9.
Listing 2.9: The Identity Transformation in XSLT
<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:html='http://www.w3.org/1999/xhtml'
    xmlns='http://www.w3.org/1999/xhtml'
    exclude-result-prefixes='html'>

  <!-- Copy every attribute and node to the output unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
This merely copies the entire document from the input to the output. You then modify this basic stylesheet with a few extra rules to make the changes you desire. For example, suppose you want to change all the presentational <i> elements to <em> elements. You would add this rule to the stylesheet:
<xsl:template match='html:i'>
  <em>
    <xsl:apply-templates select="@*|node()"/>
  </em>
</xsl:template>
Notice that the XPath expression in the match attribute must use a namespace prefix, even though the element it’s matching uses the default namespace. This is a common source of confusion when transforming XHTML documents. In XSLT 1.0, an unprefixed name in an XPath expression always refers to an element in no namespace, so you always have to assign the XHTML namespace a prefix when you’re using it in an XPath expression.
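To see how the pieces fit together, here is a sketch of the complete stylesheet: the identity transformation plus the one extra rule. The comment at the end marks the mistake the namespace rule guards against. This is one way to assemble it, assuming an XSLT 1.0 processor.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:html="http://www.w3.org/1999/xhtml"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="html">

  <!-- Identity transformation: copies anything no more specific rule matches -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Matches i elements in the XHTML namespace and rewrites them as em.
       The literal em element picks up the default namespace declared above. -->
  <xsl:template match="html:i">
    <em>
      <xsl:apply-templates select="@*|node()"/>
    </em>
  </xsl:template>

  <!-- A rule written with match="i" would match only i elements in no
       namespace, so in an XHTML document it would never fire and every
       i element would fall through to the identity rule unchanged. -->

</xsl:stylesheet>
```

The more specific html:i pattern takes precedence over the generic node() pattern because of XSLT's default priority rules, so no explicit priority attribute is needed.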
Several good introductions to XSLT are available in print and on the Web. First, I’ll recommend two I’ve written myself. Chapter 15 of The XML 1.1 Bible (Wiley, 2003) covers XSLT in depth, and is available on the Web at http://www.cafeconleche.org/books/bible3/chapters/ch15.html. XML in a Nutshell, 3rd Edition, by Elliotte Harold and W. Scott Means (O’Reilly, 2004), provides a somewhat more concise introduction. Finally, if you want the most comprehensive coverage available, I recommend Michael Kay’s XSLT: Programmer’s Reference (Wrox, 2001) and XSLT 2.0: Programmer’s Reference (Wrox, 2004).
This concludes Chapter 2. I’ll probably post a couple more sections from Chapter 3. Then if you want to see what comes next, you’ll have to buy the book.