Gemini Doesn’t Understand Namespaces Either

Last week Gemini gave me a very wrong answer about exception handling in Java. This week let’s see if it can handle an even less controversial but often misunderstood point about XML namespaces. (Spoiler: it can’t.)

The question is:

Should XML namespace URLs include a version number?

The correct answer is no.1, 2 Gemini’s answer is the opposite, and the reasons it gives show why it’s wrong.

Here’s what Gemini says:

It’s generally considered good practice to include a version number in XML namespace URLs. While not strictly required, it offers several significant benefits:

  • Versioning and Compatibility: As your XML schema evolves, you might introduce changes that break backward compatibility.

    Including a version number in the namespace URL allows you to create new namespaces for updated schemas (e.g., http://example.com/schema/v2) while keeping the old namespaces for existing documents that rely on the older schema (e.g., http://example.com/schema/v1). This prevents older documents from becoming invalid when a newer schema is deployed.

  • Clear Identification: The version number provides a clear and unambiguous way to identify which version of the schema a particular XML document is using. This is crucial for processing and validating the document correctly. Without a version, it’s difficult to know which schema rules to apply.

  • Namespace Management: Versioned namespaces make it easier to manage different versions of your schema. You can have multiple versions of the schema available simultaneously without conflicts. Tools and libraries can use the namespace URL to determine the correct schema version to use.

  • Documentation and Communication: The version number in the namespace URL serves as a form of documentation. It immediately tells anyone working with the XML document which schema version is expected. This helps avoid confusion and ensures everyone is on the same page.

  • Future-Proofing: Including a version number is a form of future-proofing your XML structure. It allows you to make changes to your schema without breaking existing documents.

Example:

Instead of:

<root xmlns="http://example.com/schema">
  ...
</root>

Use:

<root xmlns="http://example.com/schema/v1">
  ...
</root>

If you later update your schema, you would use a different namespace:

<root xmlns="http://example.com/schema/v2">
  ...
</root>

When might you not include a version?

  • Extremely Stable Schemas: If you have a schema that is guaranteed never to change (which is rare), you might omit the version. However, this is generally discouraged, as requirements change over time.
  • Internal or Temporary Schemas: For schemas used only within a very limited context (e.g., within a single application) and where you have complete control over all usage, you might choose to omit the version. But even in these cases, it's often a good idea to include a version for clarity.

In summary: While not technically mandatory, including a version number in your XML namespace URLs is strongly recommended for almost all cases. It's a best practice that leads to more robust, maintainable, and future-proof XML structures. It's a small investment that can save you a lot of trouble down the line.

This answer contains errors of fact. In particular, it confuses namespaces with schemas, and believes there is a 1:1 relationship between them. There isn't. Schemas and namespaces are two different things, and in general enjoy a many-to-many relationship. One namespace can have more than one schema and, less commonly, one schema can describe multiple namespaces. In practice namespaces are much more important for XML processing than schemas are. Schemas are primarily used for design and documentation, and they have their place, but they aren't part of most XML processing toolchains. Indeed, in the very common case where there is no formally defined schema, it can be the namespace that helps select the correct implicit schema to use.
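To make the point concrete, here is a minimal sketch of schema-free, namespace-aware processing using Python's standard xml.etree.ElementTree. The namespace URL and the order and item element names are invented for illustration. No schema is loaded or consulted anywhere; the namespace alone tells the code which elements it cares about.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URL and element names, invented for this sketch.
NS = "http://example.com/schema"

DOC = f"""<order xmlns="{NS}">
  <item>widget</item>
  <item>gadget</item>
</order>"""


def item_names(xml_text):
    """Return the text of every item element in the expected namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree spells a qualified name as {namespace-URL}local-name.
    return [item.text for item in root.iter(f"{{{NS}}}item")]


print(item_names(DOC))  # prints ['widget', 'gadget']
```

Any number of schemas, or none at all, could describe this vocabulary and the code would not change.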

From here, let's go point by point. "As your XML schema evolves, you might introduce changes that break backward compatibility...Including a version number in the namespace URL allows you to create new namespaces for updated schemas." The reality is that an evolving schema usually doesn't change the meaning of existing elements and attributes. Doing so would break all existing tooling, and developers hate that.

Changing a namespace URL is like changing the package name of a Java library. Do it when a new version is a completely new and incompatible library, but don't do it if you're just adding some new features and fixing bugs. Instead add a version element inside the XML as the Maven pom.xml does. In general, you don't want a new namespace for the updated schema because the new schema can validate the old documents, but you do want a version element to indicate the schema in use.
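A rough sketch of what that looks like in practice, again with Python's standard library. The namespace URL and the version, item, and giftWrap elements are all hypothetical. The namespace stays fixed across both versions of the vocabulary, and the version travels inside the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace; deliberately the same for every version.
NS = "http://example.com/schema"

V1_DOC = (f'<order xmlns="{NS}"><version>1.0</version>'
          '<item>widget</item></order>')
# Version 2.0 of this made-up vocabulary only adds a giftWrap element.
V2_DOC = (f'<order xmlns="{NS}"><version>2.0</version>'
          '<item>widget</item><giftWrap>true</giftWrap></order>')


def describe(xml_text):
    """Return (version, item texts) for an order document."""
    root = ET.fromstring(xml_text)
    ns = {"o": NS}
    # Assumption for this sketch: a missing version element means 1.0.
    version = root.findtext("o:version", default="1.0", namespaces=ns)
    items = [e.text for e in root.findall("o:item", ns)]
    return version, items


print(describe(V1_DOC))
print(describe(V2_DOC))
```

Code written against version 1.0 keeps finding its item elements in a 2.0 document, and code that cares about the difference can branch on the version value.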

Next, the "version number provides a clear and unambiguous way to identify which version of the schema a particular XML document is using." That actually is reasonable in isolation. But that doesn't mean the version number should be in the namespace URL. A simple version element serves the same purpose without breaking all existing tooling.

Now, "Namespace Management: Versioned namespaces make it easier to manage different versions of your schema. You can have multiple versions of the schema available simultaneously without conflicts. Tools and libraries can use the namespace URL to determine the correct schema version to use." This simply isn't true. Namespaces are not designed or intended to pick a schema, much less a specific version of a schema. That's the purpose of the xsi:schemaLocation attribute. Even more commonly the vocabulary in use is determined by inspection of the document or external signals. For instance, if you're writing a toolchain that processes RSS feeds, you don't need a namespace to tell you that you should be expecting documents that conform to the RSS specification. (Experts will note that I'm gliding over a lot of details here, but a more precise discussion still wouldn't match namespaces to schemas.)
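For illustration, here is a small sketch of how a document can point at a specific schema through xsi:schemaLocation while its namespace stays unversioned. The attribute holds whitespace-separated pairs of namespace URL and schema location. The example.com URLs and the order-2.0.xsd file name are made up, and a real processor treats the attribute as a hint it may or may not honor.

```python
import xml.etree.ElementTree as ET

# The XML Schema instance namespace, which defines xsi:schemaLocation.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical document: the example.com URLs are invented for this sketch.
DOC = """<order xmlns="http://example.com/schema"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://example.com/schema http://example.com/order-2.0.xsd">
  <item>widget</item>
</order>"""


def schema_locations(xml_text):
    """Map each namespace URL to the schema location hinted for it."""
    root = ET.fromstring(xml_text)
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    # Tokens alternate: namespace URL, schema location, namespace URL, ...
    return dict(zip(tokens[0::2], tokens[1::2]))


print(schema_locations(DOC))
```

The schema version lives in the location hint, so pointing at a newer schema does not require touching the namespace.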

Next, "The version number in the namespace URL serves as a form of documentation. It immediately tells anyone working with the XML document which schema version is expected. This helps avoid confusion and ensures everyone is on the same page." Yes, but a version element does that too, and better.

Finally, "Future-Proofing: Including a version number is a form of future-proofing your XML structure. It allows you to make changes to your schema without breaking existing documents." Again, however, Gemini is simply wrong here. Including a version number does not allow you to make changes to your schema without breaking existing documents. To one way of thinking, no change to a schema will ever break an existing XML document. The schema isn't needed to process the document. XML documents are self-contained and self-describing. This is a core value proposition of XML compared to its predecessor SGML.

Nonetheless, even if you consider a new schema that defines existing documents as invalid as somehow breaking them, this still isn't a good reason to put versions in namespace URLs. In fact, it's the namespace change that is most likely to break things.

When you put version numbers in the namespace URL (or otherwise change the namespace with a new version of the vocabulary) you immediately break almost all tools that are written to expect the old namespace. It's like changing the name of a city because a new street was built. Sure we want to know the date of the map, but Chicago is still Chicago.
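Here is a minimal sketch of that breakage, using Gemini's own v1 and v2 example URLs and invented order and item elements. The tool below was written when only v1 existed. Hand it a v2 document with exactly the same structure and it silently finds nothing.

```python
import xml.etree.ElementTree as ET

OLD_NS = "http://example.com/schema/v1"
NEW_NS = "http://example.com/schema/v2"


def count_items(xml_text):
    """A tool written against v1: it looks for item in the v1 namespace only."""
    root = ET.fromstring(xml_text)
    return len(root.findall(f"{{{OLD_NS}}}item"))


def make_doc(ns):
    """Build the same hypothetical two-item order in the given namespace."""
    return f'<order xmlns="{ns}"><item>widget</item><item>gadget</item></order>'


print(count_items(make_doc(OLD_NS)))  # prints 2
print(count_items(make_doc(NEW_NS)))  # prints 0
```

Nothing about the items changed between the two documents. Only the namespace did, and that was enough to make every element invisible to the existing code.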

Also interesting: Gemini references a paper about schema versioning to justify its answer.
"Impact of XML Schema Evolution", Pierre Genevès, Nabil Layaïda, Vincent Quint. ACM Transactions on Internet Technology, 2011. The paper actually exists, but says nothing about namespaces. It's all about schemas. As far as I can discern it's a good paper that makes none of the mistakes Gemini makes.

I should compare this to other models like ChatGPT, but I do see a general pattern emerging. When 99% of the community believes the wrong answer, generative models believe it too. They are not intelligent enough (or at all) to figure out why the community is wrong, and what the right answer really is. GenAI is the ultimate cargo cult programmer. It copies what it's seen others do without any understanding of why they did what they did, or whether it was correct when they did it.

That doesn't mean it isn't useful. If the field is well understood and the territory well mapped out, then GenAI can copy good answers too. The protein folding work that won the 2024 Nobel Prize in Chemistry is an example of that. Biochemists had already figured out how to calculate protein structures and done it for about 1500 different proteins. Their machine learning model did the same work, only faster and for millions more proteins. It discovered new protein shapes, but it didn't invent new processes or techniques.

There are other cases where perhaps more sophisticated reasoning models at least seem to have derived new, interesting, and correct facts. This result on antibiotic resistant superbugs is one such case. José R. Penadés said that Google's co-scientist produced five reasonable hypotheses about the mechanism involved, at least one of which he and his team hadn't thought of. "He told the BBC of his shock when he found what it had done, given his research was not published so could not have been found by the AI system in the public domain."

Maybe. Or maybe the hypotheses were already out there on arXiv or PLOS One or somewhere else that co-scientist had trained on, and Penadés just hadn't noticed. Different scientists not being aware of relevant work in their field is hardly a new phenomenon. In the 19th century Lobachevsky and Bolyai independently came up with hyperbolic geometry, some years after Gauss had already done it. There are any number of other examples of work whose importance was realized only after someone else rediscovered it. And indeed if the only thing co-scientist is doing is surfacing content that's already out there in the sea of scientific literature, that's still a pretty damn useful thing to be doing. But I do want to make sure that's not what it's doing before I say it's discovering things or producing new knowledge.
