In Praise of Draconian Error Handling, Part 2

Friday, June 5th, 2009

The fundamental reason to prefer draconian error handling is that it helps find bugs. I was recently reminded of this when Peter Murray-Rust thought he had found a bug in XOM. In brief, it was refusing to parse some files that other tools let slip right through. In fact, XOM’s strict namespace handling had uncovered a cascading series of bugs that had been missed by various other parsers, including Xerces-2j and libxml.

But before I describe what happened, let’s see if you can eyeball this bug. I’ll make it easier by cutting out the irrelevant parts so you know you’re looking right at the bug. Here’s the instance document we start with:

<!DOCTYPE svg SYSTEM
"http://www.w3.org/TR/2000/03/WD-SVG-20000303/DTD/svg-20000303-stylable.dtd">
<svg/>

And the referenced DTD is:

<!ENTITY % StylableSVG "INCLUDE" >
<!ENTITY % ExchangeSVG "IGNORE" >
<!ENTITY % SVGNamespace "http://www.w3.org/2000/svg-20000303-stylable" >
<!ENTITY % Shared PUBLIC "-//W3C//DTD SVG 20000303 Shared//EN" "svg-20000303-shared.dtd" >
%Shared;

Then in svg-20000303-shared.dtd we find this:

<!ATTLIST svg
  xmlns CDATA #FIXED "%SVGNamespace;"
  %stdAttrs; >

Not obvious, is it? In fact, I looked at this one for quite a while and consulted several spec documents before Tatu Saloranta figured out what was actually wrong here. If it helps, the relevant part of the XML specification is Section 4.4, XML Processor Treatment of Entities and References.
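If you would rather experiment than read the spec, here is a minimal probe, assuming a JDK with its bundled JAXP parser. It is only a sketch: the class name XmlnsDefaultProbe and the inlined document are made up for illustration, the two relevant declarations are moved into an internal DTD subset so nothing is fetched over the network, and the %stdAttrs; reference is left out. It does not reproduce XOM’s namespace check. It just prints whatever default value the parser reports for xmlns, which you can compare with the namespace URI you were expecting.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlnsDefaultProbe {

    // Hypothetical test document: the entity and ATTLIST declarations from
    // the excerpts above, inlined into an internal DTD subset.
    static final String DOC =
          "<!DOCTYPE svg [\n"
        + "  <!ENTITY % SVGNamespace \"http://www.w3.org/2000/svg-20000303-stylable\">\n"
        + "  <!ATTLIST svg xmlns CDATA #FIXED \"%SVGNamespace;\">\n"
        + "]>\n"
        + "<svg/>";

    // Parses DOC and returns the xmlns value the parser supplies from the DTD.
    static String reportedDefault() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace processing stays off (the JAXP default), so xmlns is
            // treated as an ordinary attribute and is easy to read back.
            factory.setNamespaceAware(false);
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(DOC)));
            return doc.getDocumentElement().getAttribute("xmlns");
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("xmlns default reported by the parser: " + reportedDefault());
    }
}
```

Run it, then look at the row for parameter entities in the Section 4.4 table.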

Give up? OK. Here’s what’s happening:
(more…)

What Version of Xerces are you Using?

Monday, March 31st, 2008

XML developers often find themselves struggling with multiple versions of the Xerces parser for Java, which support different, slightly incompatible versions of SAX, DOM, Schemas, and even XML itself. Xerces can be hiding in a number of different places, including the classpath, the jre/lib/endorsed directory, and even the JDK itself. Here’s how you can find out which version you actually have.
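As a rough sketch of one way to check (not necessarily the approach the full post takes), the class below asks JAXP which parser class it hands out, reports where that class was loaded from, and then looks reflectively for a standalone Apache Xerces. The class name WhichXerces and its helper methods are made up for illustration. It assumes that a standalone Xerces, if present, exposes org.apache.xerces.impl.Version.getVersion(); very old releases may not, in which case the sketch simply reports that nothing was found.

```java
import java.security.CodeSource;
import javax.xml.parsers.SAXParserFactory;

public class WhichXerces {

    // Where a class was loaded from, or a note if no location is recorded
    // (typical for classes supplied by the JDK itself).
    static String locationOf(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "(no code source; probably bundled with the JDK)";
        }
        return src.getLocation().toString();
    }

    // The concrete XMLReader class that JAXP selects in this JVM.
    static Class<?> jaxpReaderClass() {
        try {
            return SAXParserFactory.newInstance()
                                   .newSAXParser()
                                   .getXMLReader()
                                   .getClass();
        } catch (Exception e) {
            throw new IllegalStateException("could not create a SAX parser", e);
        }
    }

    // Version string of a standalone Apache Xerces on the classpath, if any.
    // Reflection keeps this compilable when Xerces is absent.
    static String standaloneXercesVersion() {
        try {
            Class<?> version = Class.forName("org.apache.xerces.impl.Version");
            return String.valueOf(version.getMethod("getVersion").invoke(null));
        } catch (ReflectiveOperationException | LinkageError e) {
            return "(no standalone Xerces found)";
        }
    }

    public static void main(String[] args) {
        Class<?> reader = jaxpReaderClass();
        System.out.println("JAXP parser class: " + reader.getName());
        System.out.println("Loaded from:       " + locationOf(reader));
        System.out.println("Standalone Xerces: " + standaloneXercesVersion());
    }
}
```

A class name starting with com.sun.org.apache.xerces.internal points at the copy bundled with the JDK, while org.apache.xerces points at a separately installed Xerces.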
(more…)

The State of Native XML Databases

Monday, August 13th, 2007

I’ve recently been asked by several people to summarize the state of native XML databases for those interested in exploring this space. IMHO, native XML databases are now roughly where relational databases were circa 1994: solid, proven technology that gets the job done but only if you pay big bucks to do it. However, there’s some promising open source activity on the horizon. To be brief, there are roughly four (maybe five) choices to consider:

  • Mark Logic
  • eXist
  • DB2 9
  • Berkeley DB XML

(more…)

North and South

Friday, July 6th, 2007

David Chappell writes that

To anybody who’s paying attention and who’s not a hopeless partisan, the war between REST and WS-* is over. The war ended in a truce rather than crushing victory for one side–it’s Korea, not World War II. The now-obvious truth is that both technologies have value, and both will be used going forward.

That’s a nice analogy. Take it one step further though. WS-* is North Korea and REST is South Korea. While REST will go on to become an economic powerhouse with steadily increasing standards of living for all its citizens, WS-* is doomed to sixty+ years of starvation, poverty, tyranny, and defections until it eventually collapses from its own fundamental inadequacies and is absorbed into the more sensible policies of its neighbor to the South.
(more…)

Plain Text Config Files are Confusing

Monday, February 26th, 2007

There’s a large rebellion over XML config files from programmers who don’t like to type XML and don’t want to learn APIs for processing it. They’d rather limp along with the same scanf code they’ve been using for the last 20 years.

The problem is that there really isn’t such a thing as a plain text config file. What exist instead are specially formatted text files that are easily as complex as the XML equivalent, but inconsistent, poorly documented, and easily broken. For instance, consider this extract from LogValidator’s “plain text” config file:
(more…)