What Version of Xerces are you Using?

Monday, March 31st, 2008

XML developers often find themselves struggling with multiple versions of the Xerces parser for Java that support different, slightly incompatible versions of SAX, DOM, Schemas, and even XML itself. Xerces can hide in several places, including the classpath, the jre/lib/endorsed directory, and even the JDK itself. Here’s how you can find out which version you actually have.
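
As a rough sketch of one way to check (not necessarily the approach the full post takes), the program below asks JAXP which parser implementation it actually hands out and where that class was loaded from, then tries to read the Apache Xerces version string by reflection. The class name ParserVersionCheck and its helper methods are made up for illustration. It assumes a Xerces-J 2.x jar, which exposes org.apache.xerces.impl.Version.getVersion(); the copy bundled inside the JDK lives in a different, internal package and is not queried here.

```java
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper class, named here only for illustration.
public class ParserVersionCheck {

    /** Describes where a class was loaded from: a jar or directory URL, or a note if there is none. */
    public static String locationOf(Class<?> c) {
        CodeSource source = c.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // Classes loaded by the bootstrap loader usually report no code source.
            return "no code source (typically the JDK's built-in copy)";
        }
        return source.getLocation().toString();
    }

    /** Returns the Apache Xerces version string, or null if no Xerces jar is visible. */
    public static String xercesVersion() {
        try {
            // Reflection keeps this class compilable even when Xerces is absent.
            Class<?> version = Class.forName("org.apache.xerces.impl.Version");
            return (String) version.getMethod("getVersion").invoke(null);
        } catch (ReflectiveOperationException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Class<?> dom = DocumentBuilderFactory.newInstance().getClass();
        Class<?> sax = SAXParserFactory.newInstance().getClass();

        System.out.println("DOM factory: " + dom.getName());
        System.out.println("  loaded from: " + locationOf(dom));
        System.out.println("SAX factory: " + sax.getName());
        System.out.println("  loaded from: " + locationOf(sax));

        String version = xercesVersion();
        System.out.println("Apache Xerces: "
                + (version == null ? "not found on the classpath" : version));
    }
}
```

If the factory class names begin with org.apache.xerces, a standalone Xerces jar is winning; the location line then shows whether it came from the classpath or from an endorsed directory.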

The Ten Commandments of Unicode

Friday, March 7th, 2008

1. I am Unicode, thy character set. Thou shalt have no other character sets before me.

2. Thou shalt carefully specify the character encoding and the character set whenever reading a text file.

3. Thou shalt not refer to any 8-bit character set as “ASCII”.

4. Thou shalt ensure that all string handling functions fully support characters from beyond the Basic Multilingual Plane. Thou shalt not refer to Unicode as a two-byte character set.

5. Thou shalt plan for additions of future characters to Unicode.

6. Thou shalt count and index Unicode characters, not UTF-16 code units.

7. Thou shalt use UTF-8 as the preferred encoding wherever possible.

8. Thou shalt generate all text in Normalization Form C whenever possible.

9. Thou shalt avoid deprecated characters.

10. Thou shalt not enter the private use area.
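
A few of these commandments translate directly into code. The following is a minimal Java sketch, with a made-up class name (UnicodeBasics) and made-up helper methods, showing one way to honor commandments 2, 6, 7, and 8 using only standard library calls: counting code points rather than UTF-16 code units, normalizing to Form C, and naming UTF-8 explicitly instead of relying on the platform default.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Hypothetical demo class; the names are illustrative only.
public class UnicodeBasics {

    /** Counts Unicode code points, not UTF-16 code units (commandment 6). */
    public static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Converts text to Normalization Form C (commandment 8). */
    public static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    /** Encodes with an explicitly named encoding, UTF-8 (commandments 2 and 7). */
    public static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes bytes as UTF-8 rather than trusting the platform default. */
    public static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF lies beyond the Basic Multilingual Plane,
        // so Java stores it as a surrogate pair.
        String clef = "\uD834\uDD1E";
        System.out.println(clef.length());         // 2 (UTF-16 code units)
        System.out.println(codePointCount(clef));  // 1 (code point)
        System.out.println(utf8(clef).length);     // 4 (bytes in UTF-8)

        // "e" followed by a combining acute accent composes to a single character in NFC.
        String decomposed = "e\u0301";
        System.out.println(toNfc(decomposed).equals("\u00E9")); // true

        // A round trip through UTF-8 preserves the text.
        System.out.println(fromUtf8(utf8(clef)).equals(clef));  // true
    }
}
```

The gap between length() and codePointCount() is exactly what commandments 4 and 6 warn about: any code that indexes by char will split the clef in half.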