Must Ignore vs. Microformats

I tend to assume most people know what they’re talking about, especially if they’re talking about something I don’t really understand. Sometimes it takes a really blatant example of just what it is they’re saying before I realize they’re talking out of their posteriors.

For instance, I used to think homeopathy was a vaguely reasonable practice based on traditional herbal medicine. Then one day I was stuck at the pharmacist for fifteen minutes waiting for a prescription. Since I had nothing better to do, I picked up a pamphlet about the principles of homeopathy and started to read. Almost immediately it became clear that there was nothing in the little glass vials except plain water, that there was no possible way any of these “remedies” could do anything except through the placebo effect, and that the whole field was complete and utter bunk.

It’s important to note here that I didn’t read some detailed scientific study about homeopathy. I didn’t read an article in the Skeptical Inquirer debunking homeopathy. I read a really well-written piece by an advocate of homeopathy that explained exactly what homeopathy was and why they thought it worked; and that clear explanation showed me (or anyone with a layperson’s understanding of chemistry) that homeopathy was completely bogus. I have recently had the same experience with microformats.

I confess for a while I just didn’t get the whole microformat brouhaha. I hadn’t paid a lot of attention to it, and I didn’t feel like I needed to. It didn’t seem to solve any of my problems. I had this vague picture in my head that it involved embedding non-HTML markup in HTML documents; and that struck me as a reasonable thing, if not quite as powerful as the full-blown XML I was using. (Just like herbal medicine seemed like a reasonable option when modern pharmaceuticals weren’t available.) But then I read a recent article about microformats by Jack D. Herrington; and suddenly I realized just what the microformatters were so excited about, and just as suddenly I realized it was bunk.

Let me explain. Here’s the first example from Herrington’s article. It shows embedding some additional metadata in an HTML document to identify events and dates:

<html>
  <body>
    <div class="vevent">
      <a class="url" href="http://myevent.com">
        <abbr class="dtstart" title="20060501">May 1</abbr> - 
        <abbr class="dtend" title="20060502">02, 2006</abbr>

        <span class="summary">My Conference opening</span> - at
        <span class="location">Hollywood, CA</span>
      </a>
      <div class="description">The opening days of the conference</div>

    </div>
  </body>
</html>

But why would anyone write markup like this? It brings exactly nothing to the table.* It is substantially more opaque and harder to work with than traditional XML like this:

<html>
  <body>
    <vevent>
      <a href="http://myevent.com">
        <date>
        <abbr title="20060501">May 1</abbr> -
        <abbr title="20060502">02, 2006</abbr>
        </date>
        <summary>My Conference opening</summary> - at
        <location>Hollywood, CA</location>
      </a>
      <description>
      <div>The opening days of the conference</div>
      </description>
    </vevent>
  </body>
</html>

Sure, you could use XSLT, SAX, DOM and other tools on the microformat version; but years of experience teach me that you’ll have a much easier time with the basic element structure I’ve outlined above. Why wouldn’t you do that? For example, Herrington’s article has lots of XPath queries like .//*[contains(@class,'description')]. These now become simpler (and possibly faster) queries like .//description. The more complex the document and the more complex the format, the greater the simplicity you’ll achieve by moving to a macroformat instead of a microformat.
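
As a rough sketch of that difference, here is the same lookup against trimmed-down versions of the two documents, using Python's standard-library ElementTree. The helper names and the shortened sample documents are assumptions for illustration only. ElementTree's XPath subset has no contains() function, so the microformat lookup has to walk every element and inspect its class tokens, while the macroformat lookup is a single path expression.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copies of the two samples (illustrative, not the full documents).
MICRO = """<html><body>
<div class="vevent">
  <span class="summary">My Conference opening</span>
  <div class="description">The opening days of the conference</div>
</div>
</body></html>"""

MACRO = """<html><body>
<vevent>
  <summary>My Conference opening</summary>
  <description><div>The opening days of the conference</div></description>
</vevent>
</body></html>"""


def micro_descriptions(root):
    # Hypothetical helper: scan every element and test the
    # whitespace-separated class tokens by hand.
    return [
        "".join(el.itertext()).strip()
        for el in root.iter()
        if "description" in el.get("class", "").split()
    ]


def macro_descriptions(root):
    # Hypothetical helper: the element name is the query.
    return ["".join(el.itertext()).strip() for el in root.findall(".//description")]


micro_result = micro_descriptions(ET.fromstring(MICRO))
macro_result = macro_descriptions(ET.fromstring(MACRO))
print(micro_result)
print(macro_result)
```

Both calls find the same text; the only difference is how much work the query has to do to get there.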

This brings up another point. Microformats are strictly limited in the amount of structure they can impose. All you can comfortably put in an attribute value is a single name. Full-blown element structures are much more extensible. With macroformats you can add namespaces, attributes, and other XML structure to your non-HTML markup.
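
To make the extensibility point concrete, here is a minimal sketch of a namespaced macroformat with attributes, read with Python's standard-library ElementTree. The namespace URI, the ev prefix, and the id, start, end, and country attributes are all invented for this example; nothing here is a published vocabulary.

```python
import xml.etree.ElementTree as ET

EV = "http://example.com/ns/events"  # hypothetical namespace URI

# An event marked up with namespaced elements and ordinary XML attributes.
DOC = f"""<html xmlns:ev="{EV}"><body>
<ev:vevent id="conf-2006">
  <ev:date start="20060501" end="20060502">May 1 - 02, 2006</ev:date>
  <ev:location country="US">Hollywood, CA</ev:location>
</ev:vevent>
</body></html>"""

root = ET.fromstring(DOC)
ns = {"ev": EV}  # prefix-to-URI map used by find()

event = root.find(".//ev:vevent", ns)
date = event.find("ev:date", ns)
location = event.find("ev:location", ns)

print(event.get("id"), date.get("start"), date.get("end"))
print(location.text, location.get("country"))
```

The namespace keeps these element names from colliding with anyone else's vevent or date, and each element carries its own attributes rather than squeezing everything into one class value.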

I’ll tell you something else: web browsers will handle the macroformatted example just fine. I’ve been using this technique on Cafe au Lait and Cafe con Leche for years and it causes exactly zero problems. All web browsers back to Mosaic 1.0 simply drop out tags they don’t recognize. Feel free to sprinkle as much XML tag spice into your documents as you like. You won’t cause any problems for browsers. They’ll just render the HTML as usual. More modern browsers (pretty much everything since IE 5) even allow you to key off the new tags in XSLT.

The only reason I can imagine you might choose a microformat over a macroformat is because macroformats are invalid XHTML, but so what? XML doesn’t have to be valid! That’s a deliberate design decision in XML. Some say invalidity is the real revolution in XML. It’s what XML brings to the table that SGML never had.
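
The distinction between valid and well-formed can be checked directly. The following sketch, assuming Python's standard-library ElementTree (a non-validating parser) and a hypothetical is_well_formed helper, shows that a macroformatted page parses cleanly even though no XHTML DTD would accept it, while a page with a mismatched tag is rejected outright.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    # Hypothetical helper: True if the parser accepts the text as XML.
    # No DTD or schema is consulted, so this tests well-formedness only.
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Invalid as XHTML (unknown elements), but well-formed XML.
macro = "<html><body><vevent><summary>My Conference opening</summary></vevent></body></html>"

# Not well-formed: <summary> is never closed.
broken = "<html><body><vevent><summary>My Conference opening</vevent></body></html>"

macro_ok = is_well_formed(macro)
broken_ok = is_well_formed(broken)
print(macro_ok)
print(broken_ok)
```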

Microformats bring exactly nothing to the table. All they do is complexify the markup and make it far harder to address with XPath and other XML tools. They don’t make pages any easier to index. They don’t make pages any easier to style. They don’t make pages easier to search or look better in web browsers. They certainly don’t make the pages easier to validate if you should want to do that (not that you really need to). There’s simply no advantage to microformats compared to macroformats. Microformats do solve a real problem (embedding extra non-HTML markup in web pages) but they were invented by people who apparently didn’t realize that this problem had been solved years ago, and much more effectively. It’s like someone realized they needed a tool to eat soup and invented the fork, without paying any attention to the spoons sitting in their kitchen drawers. Microformats are the wrong answer. Using them will only make your work harder and more complicated.


* There is one use case where you might need to hide your markup inside attributes as is done in microformats. It involves overlap, which HTML handles but XML and XHTML don’t. However that’s a relatively uncommon problem, and doesn’t come into play here.

38 Responses to “Must Ignore vs. Microformats”

  1. Chadwick Says:

    Wow. I never considered it, but I guess you are right. Very recently I’ve solved some problems by using custom attributes in HTML tags (person_id = “12”), but I wasn’t sure if it would validate as XHTML. Then I realized that it’s stupid for a spec to get in the way of solving problems, especially since I thought it was a pretty elegant solution.

    I can’t count the number of times I’ve wished for more semantic tags. If I could recognize them in the stylesheets (location { display: block; } … etc.), then I guess there’s no reason not to use them.

    If the standards bodies are not going to define tags that I find useful every day, I might as well just do it myself, and hope the pressure motivates some change.

    This could be the credo of your ‘macroformats.’

  2. Charles Krause Says:

    And how exactly do you plan to style Macroformats?

    It appears that you would have to create a standard container tag (div, span, whatever) around each one of your “custom tags”, and assign a class to it, just to be able to address the formatting of each element separately. You may not want CSS “tweaking” of each element separately, but I’m sure someone, somewhere out there, WILL. As you say, unrecognized tags are simply “dropped out”. While I haven’t experimented with it (and if this is incorrect, please say so), I think it unlikely that there is consistent POSITIVE handling of this across browsers – i.e. they apply the STYLE of the tag but ignore the markup.

    The use of class names to specify the role of a tag is not an ideal situation, but it DOES allow browsers to style all elements.

    I think everyone would be a lot happier if you could drop raw XML and a stylesheet into browsers – but that situation doesn’t exist today, nor – with “backwards compatibility” – is it going to be reasonable to attempt any time soon. Instead, element names being included as class names, while somewhat of a “kludge” approach, allows the parsing of existing elements as if they WERE XML (bringing ~80% of the benefits of XML to the table), admittedly requiring some “tweaking” of parsers; allows CSS styling of all elements; and doesn’t break existing XHTML coding standards.

    The current Microformat initiative by Microformats.org is not about a “top down”, data model driven, XML approach (which, admittedly would be the most flexible, efficient, and “best” approach), it is about a USABLE, bottom-up, retrofitting the semantics into the existing markup language, without losing any of the current abilities (such as styling capabilities), and without breaking everyone’s pages.

    Give 90% of the population browsers that can read and style XML consistently (or at least as consistently as XHTML), and you are correct. Until then, Microformat “kludges” do more than your Macroformat idea.

  3. Taisuke Yamada Says:

    If you’re following formal XML way to embed microcontent in XHTML, why not just use XML namespace to avoid invalidity (and allow easier mash-up)? It’d be something like

    <a href=”http://myevent.com” rel=”nofollow”>

    … – at …
    </a>

    This also works fine with most (if not all) recent browsers.
    And by the way, you can even do this if you just want to embed custom attribute inside (X)HTML:

    <a href=”…” rel=”nofollow”>…</a>

    Primary reason for using microformats is, I guess, that you can safely embed it in Plain Old HTML, not because you want to avoid having invalid XHTML – no?

    I’m somewhat confused with this microformats thing – what do I actually gain from doing “hacky” microformats way instead of doing it with XHTML+xmlns?

  4. Taisuke Yamada Says:

    Ugh, I included actual xmlns-based example, but it seems all XML examples are broken…

  5. Elliotte Rusty Harold Says:

    Check out this page. If your browser can read it, then it can handle XSLT styled pages without any hiccups. As of 2006 that includes essentially every modern browser except the text based browsers like lynx (and lynx really can’t handle any styling at all, even on regular HTML).

    Even IE5 handles real XSLT these days. It didn’t use to, but one of the numerous security updates in the last few years replaced the old XSLT engine with a more modern, standards conformant one.

  6. Charles Krause Says:

    Hmm – I must concede the point to you then 🙂

    If modern browsers can indeed style XML documents as web pages with the same flexibility of formatting as XHTML – and the server actually is feeding nothing more than a raw XML file and an XSL style sheet to the browser, then not only do microformats make more sense in XML, but I have to ask why anyone is still coding in XHTML and not XML, other than for reasons of “inertia”?

  7. Todd Ditchendorf’s Blog » Blog Archive » Macroformats Says:

    […] Elliotte Rusty Harold reads microformats for trash and I love every delicious second of it. […]

  8. Ralph M. Prescott Says:

    To understand the homeopathy pamphlet you first need to:

    • Shred it
    • Mix it with 5000 gallons of new pulp
    • Make 500 reams of new paper
    • Take a sheet of paper from the new batch
    • Shred that

    Repeat 1000 times…

    THEN read the last sheet. It makes much more sense then. ;D

  9. Standard Deviations » Microformats, Macroformats, and the Value of Invalid XHTML Says:

    […] Elliotte Rusty Harold makes some excellent points regarding microformats, offers a macroformats alternative, and even finds time to advocate invalid XML. Some choice quotes: The only reason I can imagine you might choose a microformat over a macroformat is because macroformats are invalid XHTML, but so what? XML doesn’t have to be valid! That’s a deliberate design decision in XML. Some say invalidity is the real revolution in XML. It’s what XML brings to the table that SGML never had. […]

  10. Pete Says:

    Thank god someone has said this! I can appreciate that html lacks any methods of grouping data pertinent to an arbitrary object, but using tag attributes to create these structures is about the most bafflingly limited and tediously ‘Web 2.0’ way I can imagine. Especially, as Elliotte points out, when XML has been there and done that before any of these nu-web fools were “refactoring” the twinkle in their milkman’s eye! It’s about time people accepted that html is limited and dealt with it.

  11. Scott Reynen Says:

    The only thing microformats do that you can’t do with plain old XML is work for people and applications that are comfortable with HTML and not XML. That just happens to be an incredibly large population. But pretty much everything covered by microformats has a well-defined plain old XML equivalent, so translation between the two is trivial.

    There’s no reason the two approaches need to compete; they both serve the same end: a more data-rich web. Go ahead and markup your data in XML within HTML. If it becomes popular, someone will build the necessary translators to and from microformats, which are already becoming popular. But don’t just talk about it. The proof is in real-world use.

  12. Avdi Says:

    Some part of the back of my mind had been wondering about this ever since I first heard of microformats. It clashed with what I understood about the (supposed) strengths of XML. But I just assumed that they were put together by smart people who were solving a real problem. What do I know, I figured; I’m not a professional web developer.

    I guess this is just an instance of the fact that NOBODY READS STANDARDS. Back when I first got interested in web coding for personal projects I printed out a copy of RFC2616 and read it cover-to-cover, along with some of the smaller RFCs it references. Now, years later, people are starting to discover that lo and behold, HTTP is more than just GET and POST, and are calling it “REST”. I could have told them that.

  13. Rob Says:

    I would much prefer it if XML (without XSLT) could be used within XHTML documents and selected via the CSS but I was under the impression that valid markup is important for other reasons like SEO.

    I like the simpler microformats from a CSS stand-point, and it cuts out any need for XSLT which in this case is kinda pointless if you’re only converting to XHTML for presentation (or are the semantic meanings somehow translated as well?).

    HTML is visual while XML is meaningful right? The hybrid (XHTML) is trying to do both now, so isn’t that the right direction to go or should some separation remain? I’m in two minds about the whole thing.

  14. Graceful Exits » Crouching Harold; hidden formats Says:

    […] Elliotte Rusty Harold roundly disses microformats, comparing the practice of utilising them to homeopathy, of all “disciplines.” A bit of cheeky banter, so it’s probably churlish to point out that the comparison itself turns out to be unsound within his own argument: whereas homeopathy might arguably be no solution to any problem, Elliotte’s beef with microformats seems to be that they solve a problem—expressing non-XHTML structure within XHTML—for which he believes there are more efficient solutions. […]

  15. Nate Says:

    An interesting idea.. but markup validity is actually helpful in debugging and what not, and at least in theory, valid pages stand a better chance of holding mustard in future browsers.

  16. Ryan Cannon Says:

    You missed the point. Microformats aren’t what *you* can do with XML-blended XHTML, Microformats are about what *we* can do with (X)HTML today without having to worry about server configurations and MIME types. Just because you use doesn’t mean I shouldn’t use . Microformats are based on current, stable standards and conventions across the web. Check out some of the Technorati tools for more structured searches based on Microformats before you try a go-it-your-own solution.

  17. Chris Radcliff Says:

    I’m far from drinking the Microformats kool-aid, but I’ve seen enough of it (and used enough of the resulting data) that I have to point out some big flaws in the “macroformat” idea. Charles, Ryan, Nate, and Scott point out some aspects of it, as does the original article: “Greasemonkey doesn’t work on non-HTML pages, and that’s where microformats shine.” He goes on to say, “getting formats other than HTML accepted in the real world is very difficult. So layering formats on top of HTML makes a lot of sense.”

    And that’s the whole thing, right there. HTML is a bizarre but well-understood beast, and it’s still the Web’s favorite language. Microformats are a way to represent structured data in actual HTML, not XML+HTML or with HTML-like markup. That way any HTML tool (or renderer, or filter…) can deal with the markup just as it always has. It ends up being more convoluted (and vevent is probably the worst case of that), but that ends up being an easy-enough hurdle to overcome, both on the generation and parsing sides. (I’m speaking from experience here.) The rel=’tag’ microformat was simple enough to create plugins for, and even crazy formats like vevent lend themselves to generation by tools.

    Oh, and saying that “All web browsers… simply drop out tags they don’t recognize” isn’t quite right, because HTML gets parsed (munged, generated, repurposed) by a lot more tools than just browsers. Expressing semantic markup in HTML ensures that the markup won’t get filtered out when the HTML travels on, whether it’s copied into a WordPress blog post or indexed by Google. That alone convinced me to give microformats a try.

    On a side note, the use of CSS class notation provides something of value that straight XML elements can’t match: in vevent, an event’s URL can be either class=”url” or class=”uid” or both, class=”url uid”. That means that the same piece of data can be marked up more than once, just by adding 4 bytes of markup.

  18. Bill de hOra Says:

    To produce your XML, I need to do at least this

    – design the XML format
    – write the XSLT stylesheet
    – edit the XML

    I view sourced the sample xml page, took out the XSLT PI and saved it to disk. IE wouldn’t show the file, it was not wF. Mozilla displayed it. Oo read it in as literal XML, tags and all. Word didn’t like the encoding or something.

    I changed the file extension to html. It looks passable in IE and Moz. Word read it in as an editable document (no tags). Oo read it in, it looks fugly with the XML prolog showing.

    Conclusion: HTML has greater reach than XML, even when the HTML is XML faking it.

    For those kind of reasons, I’m not buying that uF are bunk any more than I’d buy the notion that Wysiwyg wordprocessors are bunk. The people they target don’t design formats, don’t write XSLT, and don’t like (almost universally in my experience, with one exception) XML editors. They don’t know what to do when word complains about a file; they’ll tend to ring their programmer son in law, IT support, or just give up and do something else.

    You’re looking at this entirely from a technical/developer standpoint; as someone who is technical and a developer, I have a lot of sympathy for that view. But uF are point blank not optimised for developers. If anything, on the web, history teaches us that dumb obviously brain dead technology tends to win out. We end up doing the heavy lifting against crappy formats, but overall, lots of value is added because on the whole more people can do something useful with the data.

    The fork and spoon metaphor doesn’t work for me either. Try comparing home made soup versus something that came out of tin and that’s much closer to what’s going on here.

    The adoption test will be around how many publishing tools ship uF templates. I can look at something like FCK, TinyMCE, WP and MT, and have *no* problem imagining custom styles and templates based on defined uF formats rolling out in the next 18 months. Or maybe it won’t happen. Who knows. But I’ll speculate that if it kicks off, you’ll see much less variation in uF domain types than we’ve seen in attempts to get domain specific XML vocabs adopted at layer 8 and above. There’ll be infinite variability in structure and well-formedness, but that’s another issue, solvable by parse at any cost uF tools and enraged programmers.

  19. James Wheare Says:

    XML doesn’t have to be valid!

    Wow, strong emphasis. It’s just a pity that it’s really not a viable solution to serve XHTML as XML. If you are serving XHTML as text/html it’s not XML. Also, doesn’t XML require the >xml< prolog? Woops, never mind that this triggers the lovely quirks mode rendering in IE.

    Microformats are designed to suit the current state of web development. Sure, there are better ways to do it, but these come at a price. We don’t live in an ideal world, and the trade-offs are significant.

    Until we’re ready to serve XHTML as XML, I’m afraid your macroformats are a pipe dream.

  20. Elliotte Rusty Harold Says:

    If a supermarket labels a package of pork by-products as rib-eye steak, does that make it beef? Vice versa, if it labels a rib-eye steak as pig snouts, does that make it pork?

    The MIME type does not determine whether or not a given data stream is XML. If the stream is well-formed according to the XML 1.0 specification, then it is XML. It may be other things too: a file can be both HTML and XML. However the only determination of whether a particular sequence of bytes is or is not an XML document is by comparing it to the BNF grammar and well-formedness rules of XML 1.0.

    Labels such as MIME types and file extensions suggest to applications (including web browsers) how they might choose to interpret a given sequence of bytes ; but they do not determine the nature of those byte sequences.

    Bottom line: if you are serving well-formed XHTML as text/whoop-dee-doo, it is XML.

    P.S. The XML declaration is not required. If it causes you practical problems, just leave it out.

  21. Elliotte Rusty Harold Says:

    Bill,

    First of all I don’t know what happened when you saved tradeshows.xml, but something you did broke it. The file as it exists on my site is well-formed.

    ~$ xmllint --noout http://www.cafeaulait.org/tradeshows.xml
    ~$ 

    Possibly you have an IE bug, not an XML bug.

    Secondly, the reason that page is written and served in XML is because it made my life easier. I used to edit an HTML version of that information instead. It was vastly too hard to keep up to date. The XML version is much simpler to maintain, and consequently the page is updated more frequently and with less effort. Yes, I had to write an XSLT stylesheet; but I only had to do that once. This may not be the right approach for someone whose web editing is confined to MySpace and WordPress, but then neither are microformats. Microformats and XSLT are both advanced developers’ tools.

  22. James Wheare Says:

    OK, let me rephrase. It may well be well-formed XML, but as far as the browsers are concerned, it’s interpreted as tag soup. But it’s tag soup that they know how to render, and that you can manipulate with CSS and the HTML DOM alone.

    Sure, microformats may require some slightly more complicated (though still trivial) parsing behind the scenes for anyone who wants to play with them, but it’s far more lightweight for the content creator than designing a macroformat and the associated XSLT, not to mention browser interpretation issues. Adding fake tags into the mix is potentially more harmful to the user experience than using classnames. Anything outside of the XHTML spec is fair game as far as browser rendering is concerned.

    From an XML purist’s perspective it may be an abomination, but for the humble front end web developer who has a grasp of HTML, CSS and the DOM, it’s golden, and more importantly, harmless.

  23. Eric Meyer Says:

    “All you can comfortably put in an attribute value is a single name.”

    Actually, no: ‘class’ (as well as a few other attributes in [X]HTML) will quite comfortably accept many names. This is the main reason why ‘class’ gets used a lot in microformats, though there are other reasons too. One could even argue that putting all your names into a single attribute is more efficient (in several senses) than a bunch of nested elements.

  24. Bill de hOra Says:

    “First of all I don’t know what happened when you saved tradeshows.xml, but something you did broke it.”

    No doubt. I’ll still say tho’ that uF are clearly suboptimal for us technical types, but I suspect non-technical types will adopt heavily. I think then we’ll have to follow suit, or at least arrange to deal with uF.

    “The MIME type does not determine whether or not a given data stream is XML. ”

    HTTP header metadata is authoritative. If you get an XML file with a pdf media type it’s PDF at the app protocol layer. Anything else, or any kind of special casing doesn’t scale altho’ that’s a different argument to tunneling XML as html for expediency or even necessity.

  25. jacob harvey Says:

    While I won’t argue that microformats are perfect I’ve found a misconception with them. When I first started looking at them I was astounded that so many “modern” semantic relationships were ignored. Who the heck wants 10 spans like all of the examples and demos use? But everything works fine as long as you use the classes correctly. So add in that h3, or build a group of reviews into an ordered list. It provides more added value if you combine a few extra attributes with a block of logical, semantic code.

  26. Pete Prodoehl Says:

    Your page at http://www.cafeaulait.org/tradeshows.xml isn’t valid. It’s got 399 errors. At least with Microformats we can write valid XHTML, which, if you’ve ever tried to debug HTML/CSS issues, is pretty darn important.

  27. Elliotte Rusty Harold Says:

    The page at http://www.cafeaulait.org/tradeshows.xml isn’t HTML. It’s raw XML. Validity is completely unnecessary in this case. I suspect you just passed the page to an HTML validator. That makes about as much sense as testing a German Shepherd for compliance with the National Highway Traffic Safety Administration standards for automobiles.

    Debugging XML/CSS issues is much easier than debugging HTML/CSS issues because the browser doesn’t have any preconceptions about how it should or should not display any given element. It’s all in the style sheet.

  28. Semantic Web Links 08-08-06 at pixelsebi’s repository Says:

    […] Must Ignore vs. Microformats – An article for macroformats and against microformats. Above all, you should read the comments after reading the article. I will definitely add a comment of my own on this in due course. Perhaps in the podcast instead. […]

  29. Holger Will Says:

    Hi Elliotte,

    Thank you very much for this informative article! I just want to add some points, which may be the source for some misconceptions of some of the commenters.

    First of all having XML data-islands in XHTML is completely valid, it’s just that DTD can’t handle them properly; that is, if you try to validate a compound document with a DTD, it will give you false negatives ( the validator shows an error, even if there is none ). Just consider DTDs as deprecated, since the W3C now uses RelaxNG instead of DTDs.

    Second and more importantly, in a modern XML world, you don’t have simple plain HTML documents, but instead you have document fragments in different namespaces mixed into another ( http://www.w3.org/2004/CDF/ ). A namespace aware browser treats a fragment in the XHTML namespace as XHTML, so it doesn’t matter much what MIME type you use ( as long as it’s an XML MIME type), that is the type of document is completely irrelevant.

    And as a third argument, people seem to forget that there are different languages you can use, but microformats only work in (x)HTML. A “macroformat” can be used in any other XML based language, without any change.

    I’ve written a very simple example of a bar chart ( as a proof of concept ), which can be used in XHTML, SVG, XSL-FO, standalone or any other XML vocabulary, without changing the data or the stylesheet.( http://www.treebuilder.de/default.asp?file=306286.xml )

    Examples of well known and widely supported “macroformats” that come to mind are the CC license stuff and the Dublin Core vocabulary.

    cheers
    Holger

  30. Craig Allen Says:

    With regard to homeopathy, last I heard, the history of science tended to show that empirical results trump our “understanding” of what makes sense. Homeopathy works, well enough and benignly enough, that many quite rational people prefer homeopathic remedies to the “patent medicine du jour” for some situations.

    As to the XML discussion, I prefer YAML myself.

    rationality is overrated
    Craig

  31. whodat Says:

    >>> Pete Prodoehl Says: Your page at http://www.cafeaulait.org/tradeshows.xml isn’t valid.
    >>> It’s got 399 errors. At least with Microformats we can write valid XHTML, which,
    >>> if you’ve ever tried to debug HTML/CSS issues, is pretty darn important.

    debug html/css? use the EYES html/css debugger.

  32. I Think It’s Interesting » Blog Archive » Microformats are Web 2.0 virus Says:

    […] is also a more interesting post Must Ignore vs. Microformats by Elliotte Rusty Harold. The one point I do not agree is that Elliotte argues that XML does not […]

  33. digital.flowstate.org Says:

    Serendipity and XHTML

    I learned about XHTML a long time ago, but I’ll admit that it hasn’t been that long since I learned that XHTML should be served with a MIME type of application/xhtml+xml. I went ahead and converted a separate blog I run to use XHTML with the proper…

  34. digital.flowstate.org Says:

    Go DTD-less

    Anybody who views the XHTML source of this page will see that there’s an XML declaration (XHTML is XML, after all) but no doctype declaration. A doctype declaration may have usefulness in HTML, but it is needless in XHTML. I am so glad Rusty Elliott…

  35. Jordan Says:

    Hi Elliotte,

    If a supermarket labels a package of pork by-products as rib-eye steak, does that make it beef? Vice versa, if it labels a rib-eye steak as pig snouts, does that make it pork?

    The MIME type does not determine whether or not a given data stream is XML. If the stream is well-formed according to the XML 1.0 specification, then it is XML

    Labels such as MIME types and file extensions suggest to applications (including web browsers) how they might choose to interpret a given sequence of bytes; but they do not determine the nature of those byte sequences.

    Your analogy breaks down because the MIME type isn’t just a hint as to the contents of a file — it is the only reliable indicator of how a document should be treated.

    If you serve an XML document as text, it is not an XML document so far as your browser is concerned, nor does the server intend your browser to treat it as such. For example, let’s say I serve you the following comment as a plain text document:

    <sarcasm>IE 4 is a *really* great browser for the modern day…</sarcasm>

    All fine and dandy, if your browser follows the simple imperative I delivered by indicating the MIME. However, if your browser declares, “Aha! This looks like XML, so that’s how I will parse it,” what exactly is it going to deliver to the end user in place of my actual comment? Are they going to see the “sarcasm” tag I used purely for effect, and which was an important part of my message?

    As for XHTML content, serving it as pure XML should remove any default formatting from every element, something which will generally have unpleasant implications for most stylesheets, which take such formatting into account. Serving a document designed to be served as text/html as application/xhtml+xml has a slew of implications for how any scripts or stylesheets associated with a document should be written.

    To reiterate, the MIME type isn’t just a hint, it’s the only reliable way to know how to parse and display any document.

  36. Jason Says:

    Nice article. I like reading the reasons not to use something as well as to use. I think you have Homeopathy and Herbal Medicine mixed up. They aren’t the same as far as I can tell.

  37. Chris Charabaruk Says:

    There’s one case where using the defined microformats will always win out against your macroformats, and that’s in CMSes like Drupal that have the ability to filter out unknown or undesired tags. At least with microformats based on class and rel attributes on currently existing tags, it’s possible to still provide the benefits of the microformats used, without having to worry if the tag filter will eat it.

  38. Jason Says:

    So if these macroformats work, then I should be able to browse to a page that has them and glean some information from the page that allows me to do more useful things with the data. Such as adding a show from that list to my calendar.
    Browsing… Viewing… Waiting for some tool to be developed that will expose your <show> info to my calendar application… Doesn’t work

    So now I’ll declare macroformats bunk until there is a standard created and mass adoption which spawns tools for consuming the rich information and exposing the data to my applications. Until then, it might as well be a pile of <span> tags.