A Brief Introduction to XInclude

It’s often convenient to divide long XML documents into multiple files. The classic example is a book, customarily divided into chapters. Each chapter may be further subdivided into sections. Traditionally this has been implemented via external entity references. For example,

<?xml version="1.0"?>
<!DOCTYPE book SYSTEM "book.dtd"[
  <!ENTITY chapter1 SYSTEM "malapropisms.xml">
  <!ENTITY chapter2 SYSTEM "mispronunciations.xml">
  <!ENTITY chapter3 SYSTEM "madeupwords.xml">
]>
<book>
  <title>The Wit and Wisdom of George W. Bush</title>
  &chapter1;
  &chapter2;
  &chapter3;
</book>

However, external entity references have a number of limitations. Among them:

  • The individual component files cannot be treated in isolation. They often aren’t themselves full, well-formed XML documents. They cannot have document type declarations.

  • The document must have a DTD, and the parser must read the DTD. Not all parsers do.

  • If any of the pieces are missing, then the entire document is malformed. There’s no option for error recovery.

  • Only entire files can be included. You can’t include just one paragraph from a document.

  • There’s no way to include unparsed text such as an example Java program or XML document in a technical book. Only well-formed XML can be included, and all such XML is parsed. (SGML actually had this ability, but it was one of the features XML removed in the process of simplification.)

XInclude is an emerging specification from the W3C that endeavors to create a mechanism for building large XML documents out of their component parts which does not have these limitations. XInclude can combine multiple documents and parts thereof independently of validation. Each piece can be a complete XML document, a part of an XML document, or a non-XML text document like a Java program or an e-mail message.

Syntax

XInclude defines a single include element in the http://www.w3.org/2001/XInclude namespace. This can be mapped to any prefix, though xi is customary. (In the remainder of this article, I will simply assume the xi prefix has been bound to the correct namespace URI without further comment.) Each xi:include element has an href attribute that contains a URL pointing to the file to include. For example, using XIncludes instead of external entity references, the previous book example can be rewritten like this:

<?xml version="1.0"?>
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>The Wit and Wisdom of George W. Bush</title>
  <xi:include href="malapropisms.xml"/>
  <xi:include href="mispronunciations.xml"/>
  <xi:include href="madeupwords.xml"/>
</book>

Of course you can also use absolute URLs where appropriate:

<?xml version="1.0"?>
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>The Wit and Wisdom of George W. Bush</title>
  <xi:include href="http://www.whitehouse.gov/malapropisms.xml"/>
  <xi:include href="http://www.whitehouse.gov/mispronunciations.xml"/>
  <xi:include href="http://www.whitehouse.gov/madeupwords.xml"/>
</book>

XInclude processing is recursive. That is, an included document can itself include another document. For example, a book might be divided into front matter, back matter, and several parts:

<?xml version="1.0"?>
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="frontmatter.xml"/>
  <xi:include href="part1.xml"/>
  <xi:include href="part2.xml"/>
  <xi:include href="part3.xml"/>
  <xi:include href="backmatter.xml"/>
</book>

Each part might be further divided into a part intro and several chapters:

<?xml version="1.0"?>
<part xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="intro1.xml"/>
  <xi:include href="chapter_1.xml"/>
  <xi:include href="chapter_2.xml"/>
  <xi:include href="chapter_3.xml"/>
  <xi:include href="chapter_4.xml"/>
</part>

There’s no limit to how deep this can go. Only circular inclusion (Document A includes Document B which includes, directly or indirectly, Document A) is forbidden. When an XInclude processor reads an XML document, it resolves all references and returns a document that contains no XInclude elements.

XInclusion is not part of XML 1.0 or the XML Information Set (Infoset). Thus to actually understand such a document, you’ll normally need to pass it through an XInclude processor that replaces the xi:include elements with the documents they point to. This may be done by a server-side process, or it might be done on the client side by an XInclude-aware browser. It may be hooked into a custom SAX program using a SAX filter that resolves the XIncludes. It may even be an option for parser resolution. However, it does not happen automatically. If you want it, install the necessary software and then explicitly tell the software to resolve the XInclude elements.

The Gnome Project’s libxml and my own XOM both support XInclude. For example, if you’re using the xmllint tool bundled with libxml, you specify the --xinclude flag to resolve the include elements like this:

$ xmllint --xinclude book.xml
<?xml version="1.0"?>
<book xmlns:xi="http://www.w3.org/2001/XInclude">
<title>The Wit and Wisdom of George W. Bush</title>
<preface>
…

Of course, there are APIs you can call from your own code, as well as programs to run from the command line. For instance, this code fragment resolves all the include elements in a XOM Document object and returns a new document that contains all the included content:

Document resolvedDocument = XIncluder.resolve(inputDocument);
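
For comparison, here is a minimal sketch of the same resolve step in Python, using the standard library’s xml.etree.ElementInclude module. The file names and contents are hypothetical, and the loader function reads from an in-memory dictionary rather than from disk so that the example is self-contained. It assumes a reasonably recent Python 3, where nested includes are followed.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "files" standing in for documents on disk.
DOCS = {
    "part1.xml": '<part xmlns:xi="http://www.w3.org/2001/XInclude">'
                 '<xi:include href="chapter_1.xml"/></part>',
    "chapter_1.xml": "<chapter><title>Malapropisms</title></chapter>",
}

def loader(href, parse, encoding=None):
    """Return a parsed element for parse="xml", or a string otherwise."""
    if parse == "xml":
        return ET.fromstring(DOCS[href])
    return DOCS[href]

book = ET.fromstring(
    '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="part1.xml"/></book>'
)

# Resolve the includes; this modifies the tree in place.
ElementInclude.include(book, loader=loader)

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"
remaining = list(book.iter(XI_INCLUDE))          # xi:include elements left over
titles = [t.text for t in book.iter("title")]    # content pulled in two levels deep
print(ET.tostring(book, encoding="unicode"))
```

After the call, the book element contains the part, which in turn contains the chapter, and no xi:include elements remain in the tree.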

Unparsed Text

Technical articles like this one often include example code: Java and C programs, XML and HTML documents, e-mail messages and text files, and so forth. Within these examples characters like < and & should be understood as raw text rather than parsed as markup. You can indicate that you want a particular included document to be treated as text by adding a parse="text" attribute to the xi:include element. For example, this fragment loads the source code for the Java program SpellChecker.java from the examples directory into a code element:

<code>
<xi:include parse="text" href="examples/SpellChecker.java" />
</code>

Processes that are downstream from the XInclusion will see the complete text of the file SpellChecker.java like they would any other text. For instance, such data would be passed to a SAX ContentHandler object’s characters() method. This is pretty much exactly the same way a parser would treat the content if it were typed in a CDATA section.
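
As a rough illustration, this Python sketch uses the standard library’s xml.etree.ElementInclude module to include a file as text. The Java line and the loader function are made up for the example; the loader returns a string instead of reading a real file.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical file content containing characters that look like markup.
JAVA_SOURCE = 'if (a < b && s != null) { System.out.println("<ok>"); }\n'

def loader(href, parse, encoding=None):
    # This toy loader knows only the one hypothetical file.
    if href == "examples/SpellChecker.java" and parse == "text":
        return JAVA_SOURCE
    return None  # ElementInclude treats None as a fatal include error

code = ET.fromstring(
    '<code xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include parse="text" href="examples/SpellChecker.java"/></code>'
)
ElementInclude.include(code, loader=loader)

# The included file is now ordinary character data inside <code>.
serialized = ET.tostring(code, encoding="unicode")
print(serialized)
```

The code element ends up with no child elements, only text. When the tree is serialized again, the < and & characters are escaped, just as they would be for text typed directly into the document.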

The XInclude processor will attempt to determine the character encoding of the text file from any available metadata, such as a charset parameter in the included document’s MIME type. If the document is an XML document, then the processor will next try to use the byte order mark, the encoding declaration and the other customary heuristics for determining the character encoding of an XML document. If neither of these is suitable, the character set can be specified explicitly by an encoding attribute using the same names used for the encoding declaration in an XML document. For example, this element includes a file that’s written in Latin-1:

<xi:include parse="text" encoding="ISO-8859-1"
            href="examples/SpellChecker.java" />

If none of these options are available, then the processor assumes the document is written in UTF-8.
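
The last two steps of that sequence can be sketched in Python with xml.etree.ElementInclude. This is a simplification under stated assumptions: the loader below is hypothetical, it ignores MIME metadata and byte order marks entirely, and it models only the explicit encoding attribute followed by the UTF-8 default. The file content is invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical file contents as raw bytes: Latin-1 stores "é" as byte 0xE9.
RAW_BYTES = {"examples/SpellChecker.java": "// café\n".encode("iso-8859-1")}

def loader(href, parse, encoding=None):
    # encoding is the value of the encoding attribute, or None if absent.
    return RAW_BYTES[href].decode(encoding or "utf-8")

def include_source(attributes):
    """Build a <code> element with one text include and resolve it."""
    code = ET.fromstring(
        '<code xmlns:xi="http://www.w3.org/2001/XInclude">'
        '<xi:include parse="text" ' + attributes +
        ' href="examples/SpellChecker.java"/></code>'
    )
    ElementInclude.include(code, loader=loader)
    return code.text

# With the attribute, the bytes are decoded as Latin-1.
declared = include_source('encoding="ISO-8859-1"')

# Without it, the loader assumes UTF-8, which cannot decode byte 0xE9 here.
try:
    include_source("")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
```

In this sketch a wrong guess shows up as a decoding error. With other byte sequences a wrong encoding can instead produce silently garbled text, which is one reason to state the encoding explicitly when you know it.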

Fallback

Servers crash. Network connections fail. The DNS system gets congested. For all these reasons and more, documents included from remote servers may be temporarily unavailable. The default action for an XInclude processor in such a case is simply to give up and report a fatal error. However, the xi:include element may contain an xi:fallback element that contains alternate content to be used if the requested resource cannot be found. For example, this xi:include element tries to load the file at http://www.whitehouse.gov/malapropisms.xml. However, if somebody deletes that file, then it provides some literal content instead:

<xi:include 
  href="http://www.whitehouse.gov/malapropisms.xml">
  <xi:fallback>

    <para>
    Our enemies are innovative and resourceful, and so are we. 
    They never stop thinking about new ways to harm our country 
    and our people, and neither do we.
    </para>

  </xi:fallback>
</xi:include>

The xi:fallback element can even include another xi:include element. For example, this xi:include element begins by attempting to include the document at http://www.whitehouse.gov/malapropisms.xml. However, if somebody deletes that file, then it will try http://politics.slate.msn.com/default.aspx?id=76886 instead.

<xi:include 
  href="http://www.whitehouse.gov/malapropisms.xml">
  <xi:fallback>
    <xi:include href =
   "http://politics.slate.msn.com/default.aspx?id=76886" />
  </xi:fallback>
</xi:include>

The xi:fallback element is not used if the document can be located but is malformed. That is always a fatal error.

Include elements can contain other content besides the single xi:fallback element. For example, this xi:include element contains an xi:fallback and a para element:

<xi:include 
  href="http://www.whitehouse.gov/malapropisms.xml">

  <para>
    Well, I think if you say you're going to do something 
    and don't do it, that's trustworthiness.
  </para>

  <xi:fallback>
    <xi:include href="http://politics.slate.msn.com/default.aspx?id=76886"/>
  </xi:fallback>
</xi:include>

However, the processor will ignore all such content. When the xi:include element is replaced, the para element will silently vanish.
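
The fallback rules described in this section can be sketched in a few lines of Python. This is a hand-rolled illustration, not a conforming XInclude processor: as far as I know, Python’s own xml.etree.ElementInclude module does not implement xi:fallback, so the resolve function here is written from scratch. The fetch parameter is a hypothetical callback that returns a document’s text or raises OSError when the resource is unavailable. Only parse="xml" is handled, and text directly inside xi:fallback, along with any text following the xi:include element, is ignored for brevity.

```python
import xml.etree.ElementTree as ET

XI = "{http://www.w3.org/2001/XInclude}"

def resolve(parent, fetch):
    """Replace xi:include children of parent in place (parse="xml" only)."""
    i = 0
    while i < len(parent):
        child = parent[i]
        if child.tag != XI + "include":
            resolve(child, fetch)
            i += 1
            continue
        try:
            text = fetch(child.get("href"))
        except OSError:
            # Resource unavailable: use the fallback, or fail without one.
            fallback = child.find(XI + "fallback")
            if fallback is None:
                raise
            replacement = list(fallback)
        else:
            # A malformed document raises ET.ParseError; no fallback applies.
            replacement = [ET.fromstring(text)]
        # Any other children of xi:include, such as a para, vanish here.
        parent[i:i + 1] = replacement
        # i is not advanced: the replacement may itself be an xi:include.

def unreachable(href):
    # Hypothetical stand-in for a network fetch that always fails.
    raise OSError("cannot reach " + href)

doc = ET.fromstring(
    '<chapter xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="http://www.whitehouse.gov/malapropisms.xml">'
    '<para>ignored</para>'
    '<xi:fallback><para>used instead</para></xi:fallback>'
    '</xi:include></chapter>'
)
resolve(doc, unreachable)
paras = [p.text for p in doc.iter("para")]
```

Because the loop re-examines the replaced position, a fallback that contains another xi:include element is resolved in turn, which matches the nested fallback example earlier in this section.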

XPointer

The URLs used in XInclude href attributes can have XPointer fragment identifiers. If so, they include only those parts of the external document selected by the XPointer. For example, this xi:include element includes only the malapropism elements from the document bushisms.xml:

<xi:include href="bushisms.xml#xpointer(//malapropism)"  /> 

Since XPointers can point up, down, and sideways in an XML document, and do not necessarily select a contiguous region of a document, they present significant problems for streaming applications and APIs like SAX, XNI, and StAX. Full XInclude with XPointer support really requires a tree-based API such as DOM or XOM, and can be expected to use at least as much memory as the sum of all the documents combined together.
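
To make the tree-based approach concrete, here is a rough Python sketch of what selecting //malapropism involves. It is only an analogy: I’m not aware of XPointer support in Python’s standard library, so the limited XPath subset in xml.etree.ElementTree stands in for the XPointer, and the contents of bushisms.xml are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents of bushisms.xml.
BUSHISMS = (
    "<bushisms>"
    "<malapropism>first</malapropism>"
    "<mispronunciation>skipped</mispronunciation>"
    "<section><malapropism>second</malapropism></section>"
    "</bushisms>"
)

# The whole source document is parsed into memory before anything is selected.
source = ET.fromstring(BUSHISMS)

# Rough analogue of xpointer(//malapropism): all descendants with that name,
# in document order, even though they are not contiguous in the source.
selected = source.findall(".//malapropism")

# Splice the selected elements into the including document.
chapter = ET.Element("chapter")
chapter.extend(selected)
texts = [m.text for m in chapter]
print(ET.tostring(chapter, encoding="unicode"))
```

The two selected elements sit at different depths and are separated by other content, which is why a streaming processor cannot simply copy a single byte range from the source.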

Validation and other processes

One of the most common questions about XInclude is how inclusion interacts with validation, XSL transformation, and other processes that may be applied to an XML document. The short answer is that it doesn’t. XInclusion is not part of any other XML process. It is a separate step which you may or may not perform when and where it is useful to you.

For example, consider validation against a schema. A document can be validated before or after inclusion, or both, or neither. If you validate the document before the xi:include elements are replaced, then the schema has to declare the xi:include elements just like it would declare any other element. If you validate the document after the xi:include elements are replaced, then the schema has to declare the replacement elements. You can even write a single schema that covers both cases, by using a choice to permit an element to contain either an xi:include element or its replacement elements.

For another example, consider XSL transformation. XSLT was defined several years before XInclude. The XSLT algorithm operates on well-formed XML documents. An XSLT processor acts on xi:include elements exactly like it acts on any other element; that is, it finds a template rule that matches elements with the local name include in the http://www.w3.org/2001/XInclude namespace and instantiates that rule’s template. It does not automatically replace the xi:include elements. Of course, if you want the xi:include elements to be replaced before the stylesheet is applied, you can first use an XInclude processor to resolve the includes and generate a new XML document, then pass the new document to the XSLT processor along with the stylesheet. You can even resolve the includes, pass the merged document to the XSLT processor for transformation, and then resolve includes again on the output of the transformation in case the stylesheet inserted any new xi:include elements. Inclusion and transformation are separate and orthogonal processes that can be performed in whichever order is convenient in the local environment. There is no canonical processing model for XML.

You cannot simply place include elements in a document and expect them to be resolved automatically. There’s always an extra step where you tell some piece of software somewhere to resolve the XIncludes. Depending on the environment, this may be a command line flag, an option in a config file, or a separate program you run manually. However, assuming you can do that, XInclude is a very useful technique for authoring large documents in multiple, smaller, more manageable pieces.

To Learn More

The canonical definition of XInclude is of course the XInclude specification itself. The current version is a proposed recommendation, but I don’t expect the final version to be hugely different. XInclude is covered in a little more depth in Chapter 12 of XML in a Nutshell (3rd Edition) and Chapter 19 of the XML 1.1 Bible.

7 Responses to “A Brief Introduction to XInclude”

  1. cafes@fbeausoleil.ftml.net Says:

    What about processing instructions on included documents?

    For example, if one includes this document:

    <?xml version="1.0"?>
    <?xml-stylesheet href="toxhtml.xsl"?>
    <document>
      <para>content</para>
    
    </document>
    

    Into this one:

    <?xml version="1.0"?>
    <book>
      <xi:include xmlns:xi="…" href="…"/>
    </book>

    What elements will be included? document and para, or
    html/body/p?

  2. David Le Strat Says:

    XML and XSL Reuse: Leveraging XML XInclude with Xerces and Xalan

    Great overview. I have posted a related post on my blog focusing on using XInclude in XML used in conjunction with xsl:include for XSL transformation to achieve XML reusability.

  3. Hugh Mcbride Says:

    XInclude with J2EE 1.3.1

    I am trying to use the new XInclude feature of JAXP 1.3 (unbundled), but I am restricted to J2EE 1.3.1. I have already placed dom.jar, sax.jar, xalan.jar, and xercesImpl.jar in the jre/lib/ext/ directory (using the endorsed directory is for JDK 1.4), but the XInclude sample demo included will not compile. What info I could find said not to add jaxp-api.jar (I have tried it with and without; it still doesn’t work). The main method of the class is below, with the line causing the problem marked. Any help or pointers would be greatly appreciated. I am using Eclipse 3.0 as an editor and am sure there is no problem there.

    public static void main(String argv[]) {
        if (argv.length < 2) {
            printUsage();
            System.exit(1);
        }
        try {
            // --> the next statement is the line causing the problem
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // make parser xinclude aware by setting the XIncludeAware to true.
            dbf.setXIncludeAware(true);
            dbf.setNamespaceAware(true);
            // parse the xml file.
            DocumentBuilder parser = dbf.newDocumentBuilder();
            parser.setErrorHandler(new ErrorHandlerImpl());
            Document doc = parser.parse(argv[0]);
            // write the output to specified file.
            DOMImplementation impl = doc.getImplementation();
            DOMImplementationLS implLS =
                (DOMImplementationLS) impl.getFeature("LS", "3.0");
            DOMErrorHandlerImpl eh = new DOMErrorHandlerImpl();
            Output out = new Output();
            LSSerializer writer = implLS.createLSSerializer();
            writer.getDomConfig().setParameter("error-handler",
                new DOMErrorHandlerImpl());
            out.setSystemId(argv[1]);
            writer.write(doc, out);
            System.out.println("//////// finished /////////");
        } catch (Exception ex) {
            System.out.println("Error occurred" + ex);
        }
    }

  4. Elliotte Rusty Harold Says:

    If you include an entire document (i.e. don’t use an xpointer attribute) then any comments and processing instructions in the prolog and epilog are included as well. They are not stripped. (The DOCTYPE declaration, if any, is stripped.) These extra processing instructions and comments are normally not a problem.

  5. andyjbs Says:

    Client side support?

    I have some web pages at the moment that are dynamic rather than plain (X)HTML solely to make use of includes. I’d like to replace that with a client side solution as it seems silly to use a dynamic server side technology for one feature. I guess to be useful on the web today, a decent set of browsers that would need XInclude support would be IE, Mozilla, and KHTML. Do any of these currently offer XInclude support? Well, if not, I’m going to seriously look at using good old fashioned entity replacement with XHTML as a solution.

  6. Elliotte Rusty Harold Says:

    Sadly no. As far as I know, there are no mainstream browsers with client side XInclude support.
