XSLT
Here’s part 14 of the ongoing serialization of Refactoring HTML, also available from Amazon and Safari.
XSLT (Extensible Stylesheet Language Transformations) is one of many XML tools that work well on HTML documents once they have first been converted into well-formed XHTML. In fact, it is one of my favorite such tools, and the first thing I turn to for many tasks. For instance, I use it to automatically generate a lot of content, such as RSS and Atom feeds, by screen-scraping my HTML pages. Indeed, the possibility of using XSLT on my documents is one of my main reasons for refactoring documents into well-formed XHTML. XSLT can query documents for things you need to fix and automate some of the fixes.
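As a rough illustration of the querying side, here is a minimal sketch of a stylesheet that reports problems instead of fixing them. The choice of check (img elements with no alt attribute) is my own assumption for the example, not something taken from the text above. The html prefix is bound to the XHTML namespace so that the XPath expression can match XHTML elements.

```xml
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
                xmlns:html='http://www.w3.org/1999/xhtml'
                version='1.0'>

  <!-- Emit a plain-text report rather than a transformed document -->
  <xsl:output method='text'/>

  <!-- Hypothetical check: list the src of every img that lacks an alt attribute -->
  <xsl:template match='/'>
    <xsl:for-each select='//html:img[not(@alt)]'>
      <xsl:value-of select='@src'/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Run against a well-formed XHTML page, this should print one image URL per line, which gives you a to-do list before you attempt any automated fix.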
When refactoring XHTML with XSLT, you usually leave more alone than you change. Thus, most refactoring stylesheets start with the identity transformation shown in Listing 2.9.
Listing 2.9: The Identity Transformation in XSLT
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
                version='1.0'
                xmlns:html='http://www.w3.org/1999/xhtml'
                xmlns='http://www.w3.org/1999/xhtml'
                exclude-result-prefixes='html'>
  <!-- Copy every attribute and node to the output unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
This merely copies the entire document from the input to the output. You then modify this basic stylesheet with a few extra rules to make the changes you desire. For example, suppose you want to change all the presentational <i> elements to <em> elements. You would add this rule to the stylesheet:
<xsl:template match='html:i'>
  <em>
    <xsl:apply-templates select="@*|node()"/>
  </em>
</xsl:template>
Notice that the XPath expression in the match attribute must use a namespace prefix, even though the element it’s matching uses the default namespace. This is a common source of confusion when transforming XHTML documents. In XPath 1.0, an unprefixed name always refers to an element in no namespace, so in an XSLT 1.0 stylesheet you always have to assign the XHTML namespace a prefix when you’re using it in an XPath expression.
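Putting the pieces together, a complete stylesheet might look like the following sketch. It combines the identity transformation from Listing 2.9 with the html:i rule. The closing comment about match='i' is my own illustration of the mistake described above, not part of the original listing.

```xml
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
                version='1.0'
                xmlns:html='http://www.w3.org/1999/xhtml'
                xmlns='http://www.w3.org/1999/xhtml'
                exclude-result-prefixes='html'>

  <!-- Identity transformation: copy anything no other rule matches -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Matches XHTML i elements because the html prefix is bound
       to the XHTML namespace; the literal em lands in that same
       namespace through the stylesheet's default xmlns declaration -->
  <xsl:template match='html:i'>
    <em>
      <xsl:apply-templates select="@*|node()"/>
    </em>
  </xsl:template>

  <!-- By contrast, match='i' would select only i elements in no
       namespace, so it would never fire on a well-formed XHTML page -->

</xsl:stylesheet>
```

With an XSLT 1.0 processor such as xsltproc, you could apply it with something like xsltproc i-to-em.xsl page.xhtml > fixed.xhtml (the file names here are hypothetical). A paragraph such as <p>This is <i>important</i>.</p> in the XHTML namespace should come out with the i element replaced by em and everything else copied as is.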
Note
Several good introductions to XSLT are available in print and on the Web. First, I’ll recommend two I’ve written myself. Chapter 15 of The XML 1.1 Bible (Wiley, 2003) covers XSLT in depth, and is available on the Web at http://www.cafeconleche.org/books/bible3/chapters/ch15.html. XML in a Nutshell, 3rd Edition, by Elliotte Harold and W. Scott Means (O’Reilly, 2004), provides a somewhat more concise introduction. Finally, if you want the most comprehensive coverage available, I recommend Michael Kay’s XSLT: Programmer’s Reference (Wrox, 2001) and XSLT 2.0: Programmer’s Reference (Wrox, 2004).
This concludes Chapter 2. I’ll probably post a couple more sections from Chapter 3. Then if you want to see what comes next, you’ll have to buy the book. 🙂
July 3rd, 2008 at 9:19 am
In your example you need to replace your “”s with “<“s and “>”s.
July 3rd, 2008 at 9:21 am
Hmm, that parsed my comment. I meant (fingers crossed it works this time):
In your example you need to replace your “<”s and “>”s with “&lt;”s and “&gt;”s.
July 3rd, 2008 at 10:54 am
I hadn’t seen that trick for making XPath match elements in the default namespace before. Have to remember that one.