The fundamental reason to prefer draconian error handling is because it helps find bugs. I was recently reminded of this when Peter Murray-Rust thought he had found a bug in XOM. In brief, it was refusing to parse some files other tools let slip right through. In fact, XOM’s strict namespace handling had uncovered a cascading series of bugs that had been missed by various other parsers including Xerces-2j and libxml.
But before I describe what happened, let’s see if you can eyeball this bug. I’ll make it easier by cutting out the irrelevant parts so you know you’re looking right at the bug. Here’s the instance document we start with:
<!DOCTYPE svg SYSTEM
"http://www.w3.org/TR/2000/03/WD-SVG-20000303/DTD/svg-20000303-stylable.dtd">
<svg/>
And the referenced DTD is:
<!ENTITY % StylableSVG "INCLUDE" >
<!ENTITY % ExchangeSVG "IGNORE" >
<!ENTITY % SVGNamespace "http://www.w3.org/2000/svg-20000303-stylable" >
<!ENTITY % Shared PUBLIC "-//W3C//DTD SVG 20000303 Shared//EN" "svg-20000303-shared.dtd" >
%Shared;
Then in svg-20000303-shared.dtd we find this:
<!ATTLIST svg
xmlns CDATA #FIXED "%SVGNamespace;"
%stdAttrs; >
Not obvious, is it? In fact, I looked at this one for quite a while, and consulted several spec documents before Tatu Saloranta figured out what was actually wrong here. If it helps the relevant part of the XML specification is Section 4.4, XML Processor Treatment of Entities and References.
Give up? OK. Here’s what’s happening:
Read the rest of this entry »
This entry was posted
on Friday, June 5th, 2009 at 9:21 am and is filed under XML.
You can follow any responses to this entry through the Atom feed.
Both comments and pings are currently closed.