Comments on: The Ten Commandments of Unicode

By: GZIPInputStream reading line by line - Tutorial Guruji

GZIPInputStream reading line by line - Tutorial Guruji — Mon, 24 May 2021 09:34:16 +0000

[…] to explicitly specify the encoding is against the second commandment. Use the default encoding at your […]

By: What is XML BOM and how do I detect it? – inneka.com

What is XML BOM and how do I detect it? – inneka.com — Sat, 28 Sep 2019 16:04:32 +0000

[…] advocate encoding as Unicode wherever possible (see also the 10 Commandments of Unicode). That said, XML allows the representation of any Unicode character via escape entities (e.g. […]

By: What do I need to know about Unicode? | ASK AND ANSWER

What do I need to know about Unicode? | ASK AND ANSWER — Sat, 19 Dec 2015 09:16:26 +0000

[…] I also like Elliotte Rusty Harold’s Ten Commandments of Unicode. […]

By: Elliotte Rusty Harold

Elliotte Rusty Harold — Sat, 11 May 2013 12:39:44 +0000

Code points are an element of a particular encoding such as UTF-16, not an element of the Unicode character set.

By: McDowell

McDowell — Fri, 19 Apr 2013 15:36:16 +0000

Shouldn’t…

6. Thou shalt count and index Unicode characters, not UTF-16 code points.

…be…

6. Thou shalt count and index Unicode code points, not UTF-16 code units.
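To make the distinction in the proposed wording concrete, here is a minimal Java sketch. The class and method names are made up for illustration; only `String.length()` and `String.codePointCount()` are standard library calls. It shows that a character outside the Basic Multilingual Plane occupies two UTF-16 code units but is a single code point.

```java
public class CodePointCount {
    // Number of UTF-16 code units, which is what String.length() reports.
    public static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a valid surrogate pair counts once.
    public static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF, written as its UTF-16 surrogate pair.
        String clef = "\uD834\uDD1E";
        System.out.println(codeUnits(clef));   // prints 2
        System.out.println(codePoints(clef));  // prints 1
    }
}
```

Indexing has the same pitfall: `charAt(i)` returns a code unit, whereas `codePointAt(i)` together with `offsetByCodePoints` walks the string by code point.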

By: Michael Doran

Michael Doran — Tue, 12 May 2009 20:08:35 +0000

8. Thou shalt generate all text in Normalization Form C whenever possible.

I’ve tended towards Form D (Canonical Decomposition) as being the more desirable Unicode normalization form. I am curious as to the rationale for recommending Form C (Canonical Decomposition, followed by Canonical Composition).
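For readers comparing the two forms, the following is a small Java sketch of what each one produces, using the standard `java.text.Normalizer` class. The class and helper names are hypothetical, and the example takes no position on which form is preferable; it only shows that NFC yields the precomposed character while NFD yields a base letter plus combining mark.

```java
import java.text.Normalizer;

public class NormalizationForms {
    // Canonical decomposition followed by canonical composition (Form C).
    public static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Canonical decomposition only (Form D).
    public static String toNfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301";   // 'e' + U+0301 COMBINING ACUTE ACCENT
        String precomposed = "\u00E9";   // U+00E9 LATIN SMALL LETTER E WITH ACUTE

        System.out.println(toNfc(decomposed).length());   // prints 1
        System.out.println(toNfd(precomposed).length());  // prints 2
        // The two spellings are canonically equivalent, so they compare
        // equal once both are brought to the same form.
        System.out.println(toNfc(decomposed).equals(toNfc(precomposed))); // prints true
    }
}
```

Either form works for comparison as long as both sides use the same one; the difference is mainly in the stored length and in which representation downstream software expects.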

By: Elliotte Rusty Harold

Elliotte Rusty Harold — Thu, 02 Apr 2009 12:14:18 +0000

Not really, I’m afraid. There are a lot of things that could go wrong with that process.