Co-occurrence constraints are a perennial topic at XML conferences because the usual schema languages (DTDs, W3C Schemas, RELAX NG) can’t handle them. Consequently, they’re a fertile source of papers like XML 2006’s keynote from Paolo Marinelli on Co-constraint Validation in a Streaming Context.
However, I mentioned in hallway conversation that I wasn’t sure how common or necessary co-occurrence constraints really were. In fact, I didn’t think I’d ever found one in the real world. Naturally, two days later I stumbled across several of them in a very common real-world example.
I was putting together a schema for order information for an online store. I’m sure you’ve seen dozens, probably hundreds, of these. One piece of the order is the credit card information, for which a typical element looks like this:
<CreditCard>
  <Name>Elliotte Rusty Harold</Name>
  <Number>5123 4567 8901 2345</Number>
  <Type>Mastercard</Type>
  <CVV2>314</CVV2>
  <Expiration>2007-01</Expiration>
  <Address1>6 Metrotech Center</Address1>
  <Address2>Dept. of Computer Science</Address2>
  <City>Brooklyn</City>
  <State>NY</State>
  <Zip>11201</Zip>
</CreditCard>
Now imagine we want to validate that. There are actually several co-occurrence constraints just in those ten fields:
- If the card type is American Express, then the first digit of the card number is 3.
- If the card type is Visa, then the first digit of the card number is 4.
- If the card type is Mastercard, then the first digit of the card number is 5.
- If the card type is American Express, the security code (CVV2) is four digits; otherwise it’s three digits.
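The four rules above are easy to state imperatively, which is part of why they are so frustrating to leave out of the schema. Here is a minimal sketch in Python using the standard library's ElementTree. The `RULES` table and the `check_co_constraints` function are hypothetical names invented for this illustration, and the sketch assumes the element names from the sample above and only the three card types listed.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule table built from the list above:
# card type -> (required first digit, required CVV2 length).
RULES = {
    "American Express": ("3", 4),
    "Visa": ("4", 3),
    "Mastercard": ("5", 3),
}

def check_co_constraints(card):
    """Return a list of violated co-occurrence constraints for a CreditCard element."""
    card_type = card.findtext("Type", default="").strip()
    # Card numbers are often written in groups, so drop all whitespace first.
    number = re.sub(r"\s+", "", card.findtext("Number", default=""))
    cvv2 = card.findtext("CVV2", default="").strip()

    rule = RULES.get(card_type)
    if rule is None:
        return [f"unknown card type: {card_type!r}"]

    first_digit, cvv2_length = rule
    errors = []
    if not number.startswith(first_digit):
        errors.append(f"{card_type} numbers must start with {first_digit}")
    if not (cvv2.isascii() and cvv2.isdigit() and len(cvv2) == cvv2_length):
        errors.append(f"{card_type} CVV2 must be {cvv2_length} digits")
    return errors

doc = ET.fromstring("""
<CreditCard>
  <Number>5123 4567 8901 2345</Number>
  <Type>Mastercard</Type>
  <CVV2>314</CVV2>
</CreditCard>
""")
print(check_co_constraints(doc))  # prints []
```

Each check reads two fields at once (the type plus the number, or the type plus the CVV2), which is exactly the kind of cross-field dependency a grammar-based schema has trouble expressing.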
Of course there are also quite a few other things besides co-occurrence constraints we can’t validate in the schema:
- The credit card is authorized for the purchase.
- The expiration date is in the future.
- The credit card checksum is correct.
- The zip code matches the city and state.
I could actually write RELAX NG extension functions to handle the first three, though that might not be the best architecture for the problem. The rule that the zip code must match the city and state is actually a co-occurrence constraint that requires access to external data, and RELAX NG custom type libraries can’t handle that.
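Two of the checks in that list need no external data and are simple to do in ordinary code after schema validation. The sketch below assumes the checksum in question is the standard Luhn check used for payment card numbers, and that a card stays usable through the end of its expiration month. The function names `luhn_valid` and `not_expired` are hypothetical, and `today` is passed in explicitly so the result doesn't depend on when the code runs.

```python
from datetime import date

def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isascii() and ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0

def not_expired(expiration, today):
    """Return True if a YYYY-MM expiration is the current month or later."""
    year, month = (int(part) for part in expiration.split("-"))
    return (year, month) >= (today.year, today.month)

print(luhn_valid("4111 1111 1111 1111"))            # prints True
print(luhn_valid("5123 4567 8901 2345"))            # prints False
print(not_expired("2007-01", date(2007, 1, 15)))    # prints True
print(not_expired("2007-01", date(2007, 2, 1)))     # prints False
```

Note that the sample number from the example element fails the checksum, so it would pass a schema that only checks for sixteen digits and still be rejected here. Authorization and the zip code lookup are different in kind: both need a call to an outside service or database, so they are left out of this sketch.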
Declarative schemas may be a useful tool, and are easier to write than imperative validation code. However, they’re rarely able to check everything you need to check. Schema validation can be the first step in deciding whether to accept a document. It usually shouldn’t be the last.