XML 2.0

First, for the record, I’m speaking only for myself, not my employer, the W3C, Apple, Google, Microsoft, WWWAC, the DNRC, the NFL, etc.

XML 1.1 failed. Why? It broke compatibility with XML 1.0 while offering no features anyone needed or wanted. It was out of sync with tools, parsers, and other specs like XML Schemas. This might not have been crippling had anyone actually wanted XML 1.1, but no one did. There was simply no reason for anyone to upgrade.

By contrast, XML did succeed in replacing SGML because:

  1. It was compatible. It was a subset of SGML, not a superset or an incompatible intersection (aside from a couple of very minor technical points no one cared about in practice).
  2. It offered new features people actually wanted.
  3. It was simpler than what it replaced, not more complex.
  4. It put more information into the documents themselves. Documents were more self-contained. You no longer needed to parse a DTD before parsing a document.

To do better, we have to fix these flaws. XML 2.0 should be to XML 1.0 what XML 1.0 was to SGML, not what XML 1.1 was to XML 1.0. That is, it should:

  1. Be compatible with XML 1.0 without upgrading tools.
  2. Add new features lots of folks want (but without breaking backwards compatibility).
  3. Be simpler and more efficient.
  4. Put more information into the documents themselves. You should no longer need to parse a schema to find the types of elements.

These goals feel contradictory, but I plan to show they’re not and map out a path forward.

Read the rest of this entry »

In Praise of Draconian Error Handling, Part 2

The fundamental reason to prefer draconian error handling is that it helps find bugs. I was recently reminded of this when Peter Murray-Rust thought he had found a bug in XOM. In brief, it was refusing to parse some files that other tools let slip right through. In fact, XOM’s strict namespace handling had uncovered a cascading series of bugs that had been missed by various other parsers, including Xerces-2j and libxml.
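For readers who haven't watched a strict parser at work, here is a minimal sketch of the general idea. It uses Python's standard xml.etree.ElementTree rather than XOM, and the helper name try_parse and the sample documents are made up for illustration. A namespace-aware parser simply refuses a document with an undeclared prefix or crossed tags instead of guessing what the author meant:

```python
import xml.etree.ElementTree as ET


def try_parse(text):
    """Hypothetical helper: return (True, root tag) if text parses,
    otherwise (False, parser error message)."""
    try:
        return True, ET.fromstring(text).tag
    except ET.ParseError as err:
        # The parser stops at the first error rather than recovering.
        return False, str(err)


# Illustrative inputs, not taken from the files discussed in this post.
good = '<svg xmlns="http://www.w3.org/2000/svg"/>'
bad_prefix = '<svg:svg/>'        # prefix "svg" is never declared
bad_nesting = '<a><b></a></b>'   # end tags in the wrong order

for doc in (good, bad_prefix, bad_nesting):
    print(doc, '->', try_parse(doc))
```

Only the first document comes back with a root element. The other two are rejected outright, which is exactly the point: the error surfaces where it was introduced instead of several tools downstream.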

But before I describe what happened, let’s see if you can eyeball this bug. I’ll make it easier by cutting out the irrelevant parts so you know you’re looking right at the bug. Here’s the instance document we start with:

<!DOCTYPE svg SYSTEM 
"http://www.w3.org/TR/2000/03/WD-SVG-20000303/DTD/svg-20000303-stylable.dtd">
<svg/>

And the referenced DTD is:

<!ENTITY % StylableSVG "INCLUDE" >
<!ENTITY % ExchangeSVG "IGNORE" >
<!ENTITY % SVGNamespace "http://www.w3.org/2000/svg-20000303-stylable" >
<!ENTITY % Shared PUBLIC "-//W3C//DTD SVG 20000303 Shared//EN" "svg-20000303-shared.dtd" >
%Shared;

Then in svg-20000303-shared.dtd we find this:

<!ATTLIST svg
  xmlns CDATA #FIXED "%SVGNamespace;"
  %stdAttrs; >

Not obvious, is it? In fact, I looked at this one for quite a while and consulted several spec documents before Tatu Saloranta figured out what was actually wrong here. If it helps, the relevant part of the XML specification is Section 4.4, XML Processor Treatment of Entities and References.

Give up? OK. Here’s what’s happening:

Read the rest of this entry »