<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Comparing Strings For Equality</title>
	<atom:link href="http://cafe.elharo.com/blogroll/turkish/feed/" rel="self" type="application/rss+xml" />
	<link>http://cafe.elharo.com/blogroll/turkish/</link>
	<description>Longer than a blog; shorter than a book</description>
	<pubDate>Sun, 07 Sep 2008 18:04:47 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6</generator>
		<item>
		<title>By: helle</title>
		<link>http://cafe.elharo.com/blogroll/turkish/#comment-42996</link>
		<dc:creator>helle</dc:creator>
		<pubDate>Thu, 28 Dec 2006 12:45:03 +0000</pubDate>
		<guid isPermaLink="false">http://minicafe.elharo.com/wordpress/java/comparing-strings-for-equality/#comment-42996</guid>
		<description>Marvelous. Thanks, will spread this among my friends!</description>
		<content:encoded><![CDATA[<p>Marvelous. Thanks, will spread this among my friends!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: danforth</title>
		<link>http://cafe.elharo.com/blogroll/turkish/#comment-22035</link>
		<dc:creator>danforth</dc:creator>
		<pubDate>Thu, 12 Oct 2006 20:04:36 +0000</pubDate>
		<guid isPermaLink="false">http://minicafe.elharo.com/wordpress/java/comparing-strings-for-equality/#comment-22035</guid>
		<description>Hey!

Why don't people use the Collator class?!?

http://java.sun.com/j2se/1.4.2/docs/api/java/text/Collator.html

"The Collator class performs locale-sensitive String comparison. You use this class to build searching and sorting routines for natural language text."</description>
		<content:encoded><![CDATA[<p>Hey!</p>
<p>Why don&#8217;t people use the Collator class?!?</p>
<p><a href="http://java.sun.com/j2se/1.4.2/docs/api/java/text/Collator.html" rel="nofollow">http://java.sun.com/j2se/1.4.2/docs/api/java/text/Collator.html</a></p>
<p>&#8220;The Collator class performs locale-sensitive String comparison. You use this class to build searching and sorting routines for natural language text.&#8221;</p>
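<p>A minimal sketch of what a Collator-based equality check might look like. The class name, the helper name <code>sameIgnoringCase</code>, and the choice of <code>Collator.SECONDARY</code> strength are assumptions made for illustration; the right strength and locale depend on the application.</p>

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorEquality {

    // Hypothetical helper: true when the collator for the given locale
    // considers the two strings equal at SECONDARY strength, which ignores
    // case differences but still distinguishes accents.
    static boolean sameIgnoringCase(String a, String b, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.SECONDARY);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        System.out.println(sameIgnoringCase("example.com", "EXAMPLE.COM", Locale.ENGLISH));
        System.out.println(sameIgnoringCase("example.com", "example.org", Locale.ENGLISH));
    }
}
```

<p>Passing the locale explicitly, rather than relying on the default, keeps the result from changing with the machine the code runs on.</p>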
]]></content:encoded>
	</item>
	<item>
		<title>By: John Cowan</title>
		<link>http://cafe.elharo.com/blogroll/turkish/#comment-108</link>
		<dc:creator>John Cowan</dc:creator>
		<pubDate>Sat, 07 Jan 2006 04:39:07 +0000</pubDate>
		<guid isPermaLink="false">http://minicafe.elharo.com/wordpress/java/comparing-strings-for-equality/#comment-108</guid>
		<description>edavies, that was thought about when Unicode was set up and repeatedly since.  The trouble is that there is just too much legacy data (mostly in 8859-9 encoding) that doesn't distinguish between Turkish i and non-Turkish i.  There is simply no hope of getting people to make such a distinction systematically and correctly, so the problem won't go away.

Also, hexagonal French is moving back toward preserving accents in uppercase letters.  They were basically dropped just to accommodate typewriters, which didn't have enough keys to provide them.</description>
		<content:encoded><![CDATA[<p>edavies, that was thought about when Unicode was set up and repeatedly since.  The trouble is that there is just too much legacy data (mostly in 8859-9 encoding) that doesn&#8217;t distinguish between Turkish i and non-Turkish i.  There is simply no hope of getting people to make such a distinction systematically and correctly, so the problem won&#8217;t go away.</p>
<p>Also, hexagonal French is moving back toward preserving accents in uppercase letters.  They were basically dropped just to accommodate typewriters, which didn&#8217;t have enough keys to provide them.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: edavies</title>
		<link>http://cafe.elharo.com/blogroll/turkish/#comment-27</link>
		<dc:creator>edavies</dc:creator>
		<pubDate>Fri, 06 Jan 2006 18:34:09 +0000</pubDate>
		<guid isPermaLink="false">http://minicafe.elharo.com/wordpress/java/comparing-strings-for-equality/#comment-27</guid>
		<description>&lt;h3&gt;Different letters in Unicode&lt;/h3&gt;

This problem also illustrates the uneasy issue of when two symbols are regarded as the "same" in Unicode. For example, the Cyrillic capital letter es (U+0421) looks like (is a homoglyph of) the Latin capital letter c (U+0043) but the two are, quite reasonably, regarded as different. Given the distinction between the dotted and undotted letter i in Turkish maybe even the dotted lower case form and the undotted upper case form should have been regarded as distinct from their Latin homoglyphs. Of course, it's difficult to know where to draw the line - an English letter a is presumably the "same" as a French one, isn't it? However, different case conversion rules seem to me to be enough to trigger a separation. Except then, of course, we need to worry whether accented lower case letters in Canadian French are different from those in European French. Hmm, not easy. Now off to read RFCs 3490, 3491 and 3492 on IDNs to see what all that fuss is about. </description>
		<content:encoded><![CDATA[<h3>Different letters in Unicode</h3>
<p>This problem also illustrates the uneasy issue of when two symbols are regarded as the &#8220;same&#8221; in Unicode. For example, the Cyrillic capital letter es (U+0421) looks like (is a homoglyph of) the Latin capital letter c (U+0043) but the two are, quite reasonably, regarded as different. Given the distinction between the dotted and undotted letter i in Turkish maybe even the dotted lower case form and the undotted upper case form should have been regarded as distinct from their Latin homoglyphs. Of course, it&#8217;s difficult to know where to draw the line - an English letter a is presumably the &#8220;same&#8221; as a French one, isn&#8217;t it? However, different case conversion rules seem to me to be enough to trigger a separation. Except then, of course, we need to worry whether accented lower case letters in Canadian French are different from those in European French. Hmm, not easy. Now off to read RFCs 3490, 3491 and 3492 on IDNs to see what all that fuss is about.</p>
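<p>The homoglyph point can be checked directly. This is a small sketch, with a hypothetical class and helper name, showing that the two look-alike capitals are distinct characters assigned to different scripts:</p>

```java
public class HomoglyphDemo {

    // Hypothetical helper: the Unicode script a character is assigned to.
    static Character.UnicodeScript scriptOf(char c) {
        return Character.UnicodeScript.of(c);
    }

    public static void main(String[] args) {
        char cyrillicEs = '\u0421'; // CYRILLIC CAPITAL LETTER ES
        char latinC = '\u0043';     // LATIN CAPITAL LETTER C

        // The glyphs look alike, but the code points and scripts differ.
        System.out.println(cyrillicEs == latinC);
        System.out.println(scriptOf(cyrillicEs));
        System.out.println(scriptOf(latinC));
    }
}
```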
]]></content:encoded>
	</item>
	<item>
		<title>By: Tim</title>
		<link>http://cafe.elharo.com/blogroll/turkish/#comment-26</link>
		<dc:creator>Tim</dc:creator>
		<pubDate>Fri, 06 Jan 2006 18:33:06 +0000</pubDate>
		<guid isPermaLink="false">http://minicafe.elharo.com/wordpress/java/comparing-strings-for-equality/#comment-26</guid>
		<description>equalsIgnoreCase(String) also suffers from the same problem. I've always used &lt;code&gt;string1.equalsIgnoreCase(string2)&lt;/code&gt; thinking that it would take care of the messiness of case comparison. A quick look at the javadoc and source code suggests that it suffers from the same problem. All it does is compare both the uppercase and lowercase versions of each character. I wonder why they didn't add an &lt;code&gt;equalsIgnoreCase(String, Locale)&lt;/code&gt; method. Also, with regard to the link to Peter Norvig's toLowerCase() implementation: I don't think it solves the Turkish problem, but merely ignores it. In fact, the javadoc of the given source says: &lt;blockquote&gt; Warning: Don't use this method when your default locale is Turkey. &lt;/blockquote&gt;</description>
		<content:encoded><![CDATA[<p>equalsIgnoreCase(String) also suffers from the same problem. I&#8217;ve always used <code>string1.equalsIgnoreCase(string2)</code> thinking that it would take care of the messiness of case comparison. A quick look at the javadoc and source code suggests that it suffers from the same problem. All it does is compare both the uppercase and lowercase versions of each character. I wonder why they didn&#8217;t add an <code>equalsIgnoreCase(String, Locale)</code> method. Also, with regard to the link to Peter Norvig&#8217;s toLowerCase() implementation: I don&#8217;t think it solves the Turkish problem, but merely ignores it. In fact, the javadoc of the given source says:</p>
<blockquote><p>Warning: Don&#8217;t use this method when your default locale is Turkey.</p></blockquote>
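<p>A short sketch contrasting the two approaches, in case it helps. The class and helper names are made up for illustration. <code>equalsIgnoreCase</code> takes no locale and works character by character, while lower-casing with an explicit Turkish locale maps <code>I</code> to the dotless <code>ı</code>:</p>

```java
import java.util.Locale;

public class IgnoreCaseDemo {

    static final Locale TURKISH = Locale.forLanguageTag("tr");

    // Hypothetical helper: compare by lower-casing both sides in a given locale.
    static boolean sameViaLowerCase(String a, String b, Locale locale) {
        return a.toLowerCase(locale).equals(b.toLowerCase(locale));
    }

    public static void main(String[] args) {
        // Locale-independent, per-character comparison.
        System.out.println("TITLE".equalsIgnoreCase("title"));           // true
        // Turkish rules turn "TITLE" into "t\u0131tle", so the strings differ.
        System.out.println(sameViaLowerCase("TITLE", "title", TURKISH)); // false
        // The root locale applies no language-specific case rules.
        System.out.println(sameViaLowerCase("TITLE", "title", Locale.ROOT)); // true
    }
}
```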
]]></content:encoded>
	</item>
	<item>
		<title>By: jdf</title>
		<link>http://cafe.elharo.com/blogroll/turkish/#comment-25</link>
		<dc:creator>jdf</dc:creator>
		<pubDate>Fri, 06 Jan 2006 18:31:43 +0000</pubDate>
		<guid isPermaLink="false">http://minicafe.elharo.com/wordpress/java/comparing-strings-for-equality/#comment-25</guid>
		<description>Don't create new strings just to compare them! You'd be better served by &lt;a href="http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#regionMatches%28boolean,%20int,%20java.lang.String,%20int,%20int%29" rel="nofollow"&gt;String.regionMatches()&lt;/a&gt;.</description>
		<content:encoded><![CDATA[<p>Don&#8217;t create new strings just to compare them! You&#8217;d be better served by <a href="http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#regionMatches%28boolean,%20int,%20java.lang.String,%20int,%20int%29" rel="nofollow">String.regionMatches()</a>.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Elliotte Rusty Harold</title>
		<link>http://cafe.elharo.com/blogroll/turkish/#comment-24</link>
		<dc:creator>Elliotte Rusty Harold</dc:creator>
		<pubDate>Fri, 06 Jan 2006 18:30:21 +0000</pubDate>
		<guid isPermaLink="false">http://minicafe.elharo.com/wordpress/java/comparing-strings-for-equality/#comment-24</guid>
		<description>&lt;h3&gt;Re: Case-insignificant comparison&lt;/h3&gt;

The JavaDoc for &lt;code&gt;compareToIgnoreCase()&lt;/code&gt; says:

&lt;blockquote&gt;    This method returns an integer whose sign is that of calling compareTo with normalized versions of the strings where case differences have been eliminated by calling Character.toLowerCase(Character.toUpperCase(character)) on each character.&lt;/blockquote&gt;

Assuming the algorithm is implemented as specified, (and a quick peek at the source code shows it is, at least in the version of Sun's JDK I have handy) it would have the same issue.</description>
		<content:encoded><![CDATA[<h3>Re: Case-insignificant comparison</h3>
<p>The JavaDoc for <code>compareToIgnoreCase()</code> says:</p>
<blockquote><p>    This method returns an integer whose sign is that of calling compareTo with normalized versions of the strings where case differences have been eliminated by calling Character.toLowerCase(Character.toUpperCase(character)) on each character.</p></blockquote>
<p>Assuming the algorithm is implemented as specified, (and a quick peek at the source code shows it is, at least in the version of Sun&#8217;s JDK I have handy) it would have the same issue.</p>
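<p>To make the quoted normalization concrete, here is a small sketch (the class and the <code>fold</code> helper are hypothetical names) that applies <code>Character.toLowerCase(Character.toUpperCase(c))</code> to the four i letters. No locale is involved, so the Turkish dotted and dotless forms all collapse to a plain <code>i</code>:</p>

```java
public class FoldDemo {

    // Hypothetical helper mirroring the per-character normalization
    // described in the compareToIgnoreCase() JavaDoc.
    static char fold(char c) {
        return Character.toLowerCase(Character.toUpperCase(c));
    }

    public static void main(String[] args) {
        char[] letters = {
            'I',      // LATIN CAPITAL LETTER I
            'i',      // LATIN SMALL LETTER I
            '\u0130', // LATIN CAPITAL LETTER I WITH DOT ABOVE
            '\u0131'  // LATIN SMALL LETTER DOTLESS I
        };
        for (char c : letters) {
            // Print the code point before and after folding.
            System.out.println(Integer.toHexString(c) + " -> " + Integer.toHexString(fold(c)));
        }
    }
}
```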
]]></content:encoded>
	</item>
	<item>
		<title>By: Neil</title>
		<link>http://cafe.elharo.com/blogroll/turkish/#comment-23</link>
		<dc:creator>Neil</dc:creator>
		<pubDate>Fri, 06 Jan 2006 18:29:38 +0000</pubDate>
		<guid isPermaLink="false">http://minicafe.elharo.com/wordpress/java/comparing-strings-for-equality/#comment-23</guid>
		<description>&lt;h3&gt;Case-insignificant comparison&lt;/h3&gt;

Does the &lt;code&gt;compareToIgnoreCase()&lt;/code&gt; method have the same issues, or does that always work?</description>
		<content:encoded><![CDATA[<h3>Case-insignificant comparison</h3>
<p>Does the <code>compareToIgnoreCase()</code> method have the same issues, or does that always work?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: edavies</title>
		<link>http://cafe.elharo.com/blogroll/turkish/#comment-17</link>
		<dc:creator>edavies</dc:creator>
		<pubDate>Fri, 06 Jan 2006 17:58:35 +0000</pubDate>
		<guid isPermaLink="false">http://minicafe.elharo.com/wordpress/java/comparing-strings-for-equality/#comment-17</guid>
		<description>&lt;h3 class="subject"&gt;Trivia&lt;/h3&gt;

Both branches of the &lt;code&gt;if&lt;/code&gt; in your code fragment print "The domains are the same.".</description>
		<content:encoded><![CDATA[<h3 class="subject">Trivia</h3>
<p>Both branches of the <code>if</code> in your code fragment print &#8220;The domains are the same.&#8221;.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
