<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Comparing Strings For Equality</title>
	<atom:link href="http://cafe.elharo.com/blogroll/turkish/feed/" rel="self" type="application/rss+xml" />
	<link>http://cafe.elharo.com/blogroll/turkish/</link>
	<description>Longer than a blog; shorter than a book</description>
	<lastBuildDate>Wed, 08 Feb 2012 21:45:25 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
	<item>
		<title>By: cem</title>
		<link>http://cafe.elharo.com/blogroll/turkish/comment-page-1/#comment-857926</link>
		<dc:creator>cem</dc:creator>
		<pubDate>Thu, 03 Nov 2011 03:24:04 +0000</pubDate>
		<guid isPermaLink="false">http://minicafe.elharo.com/wordpress/java/comparing-strings-for-equality/#comment-857926</guid>
		<description>A true story...
A small dot can change the meaning of certain words a lot in Turkish... From &quot;get bored&quot; to &quot;get f**ked&quot;.

One such incident in the past resulted in two deaths (one homicide: a woman stabbed to death by her husband; and one suicide: the husband killed himself in prison later).

Cause: first, the SMS text was misprinted by the hardware (it turned the letter &quot;dotless-i&quot; into a regular English &quot;i&quot;); then the woman misinterpreted the message as a serious insult against her and her family&#039;s honor in the heat of an ongoing argument with her husband... which led to a dispute in which the woman&#039;s family attacked the husband and stabbed him in the chest with a knife... Then the wounded husband managed to seize the knife and stab his wife, leaving her severely wounded; she eventually died at the hospital.

The SMS message which the husband sent from his phone, which properly supported the Turkish character set, was...
&quot;sıkışınca konuyu değiştiriyorsun&quot;
with the proper &quot;dotless-i&quot; in the word &quot;sıkışınca&quot;; meaning &quot;you change the topic when you are cornered&quot;

The SMS message which the wife received on her sub-par phone, which misinterpreted or printed dotless-i&#039;s as regular English &quot;i&quot;s, gave the words a whole different meaning...
&quot;s*kisinca konuyu degistiriyorsun&quot; meaning &quot;you change the topic when you are f**ked [with someone else]&quot;

The original Turkish news link is (http://www.hurriyet.com.tr/gundem/8748359.asp?top=1).
PS: I tried to translate the page with Google Translate (to English) first... but ended up rolling on the floor laughing. I wonder when Google Translate will stop translating Turkish to gibberish (or vice versa)</description>
		<content:encoded><![CDATA[<p>A true story&#8230;<br />
A small dot can change the meaning of certain words a lot in Turkish&#8230; From &#8220;get bored&#8221; to &#8220;get f**ked&#8221;.</p>
<p>One such incident in the past resulted in two deaths (one homicide: a woman stabbed to death by her husband; and one suicide: the husband killed himself in prison later).</p>
<p>Cause: first, the SMS text was misprinted by the hardware (it turned the letter &#8220;dotless-i&#8221; into a regular English &#8220;i&#8221;); then the woman misinterpreted the message as a serious insult against her and her family&#8217;s honor in the heat of an ongoing argument with her husband&#8230; which led to a dispute in which the woman&#8217;s family attacked the husband and stabbed him in the chest with a knife&#8230; Then the wounded husband managed to seize the knife and stab his wife, leaving her severely wounded; she eventually died at the hospital.</p>
<p>The SMS message which the husband sent from his phone, which properly supported the Turkish character set, was&#8230;<br />
&#8220;sıkışınca konuyu değiştiriyorsun&#8221;<br />
with the proper &#8220;dotless-i&#8221; in the word &#8220;sıkışınca&#8221;; meaning &#8220;you change the topic when you are cornered&#8221;</p>
<p>The SMS message which the wife received on her sub-par phone, which misinterpreted or printed dotless-i&#8217;s as regular English &#8220;i&#8221;s, gave the words a whole different meaning&#8230;<br />
&#8220;s*kisinca konuyu degistiriyorsun&#8221; meaning &#8220;you change the topic when you are f**ked [with someone else]&#8221;</p>
<p>The original Turkish news link is (<a href="http://www.hurriyet.com.tr/gundem/8748359.asp?top=1" rel="nofollow">http://www.hurriyet.com.tr/gundem/8748359.asp?top=1</a>).<br />
PS: I tried to translate the page with Google Translate (to English) first&#8230; but ended up rolling on the floor laughing. I wonder when Google Translate will stop translating Turkish to gibberish (or vice versa)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Doug Held</title>
		<link>http://cafe.elharo.com/blogroll/turkish/comment-page-1/#comment-601345</link>
		<dc:creator>Doug Held</dc:creator>
		<pubDate>Fri, 18 Feb 2011 22:33:55 +0000</pubDate>
		<guid isPermaLink="false">http://minicafe.elharo.com/wordpress/java/comparing-strings-for-equality/#comment-601345</guid>
		<description>I recently received a lecture over dinner about how the Turkish i problem will never be solved. I suggested the same solution as above, &quot;Different letters in Unicode&quot; by edavies, but was reminded that the non-Turkish, Latin i is also in use in Turkey, for example in product names and borrowed European names.

When the European i is borrowed, Turkish users case it according to the Latin rules: i-&gt;I.

My only suggestion is for Turks to remove ı and i from their keyboards, and just use the pipe character :-(</description>
		<content:encoded><![CDATA[<p>I recently received a lecture over dinner about how the Turkish i problem will never be solved. I suggested the same solution as above, &#8220;Different letters in Unicode&#8221; by edavies, but was reminded that the non-Turkish, Latin i is also in use in Turkey, for example in product names and borrowed European names.</p>
<p>When the European i is borrowed, Turkish users case it according to the Latin rules: i-&gt;I.</p>
<p>My only suggestion is for Turks to remove ı and i from their keyboards, and just use the pipe character <img src='http://cafe.elharo.com/wp-includes/images/smilies/icon_sad.gif' alt=':-(' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: helle</title>
		<link>http://cafe.elharo.com/blogroll/turkish/comment-page-1/#comment-42996</link>
		<dc:creator>helle</dc:creator>
		<pubDate>Thu, 28 Dec 2006 12:45:03 +0000</pubDate>
		<guid isPermaLink="false">http://minicafe.elharo.com/wordpress/java/comparing-strings-for-equality/#comment-42996</guid>
		<description>Marvelous. Thanks, will spread this among my friends!</description>
		<content:encoded><![CDATA[<p>Marvelous. Thanks, will spread this among my friends!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: danforth</title>
		<link>http://cafe.elharo.com/blogroll/turkish/comment-page-1/#comment-22035</link>
		<dc:creator>danforth</dc:creator>
		<pubDate>Thu, 12 Oct 2006 20:04:36 +0000</pubDate>
		<guid isPermaLink="false">http://minicafe.elharo.com/wordpress/java/comparing-strings-for-equality/#comment-22035</guid>
		<description>Hey!

Why don&#039;t people use the Collator class?!?

http://java.sun.com/j2se/1.4.2/docs/api/java/text/Collator.html

&quot;The Collator class performs locale-sensitive String comparison. You use this class to build searching and sorting routines for natural language text.&quot;</description>
		<content:encoded><![CDATA[<p>Hey!</p>
<p>Why don&#8217;t people use the Collator class?!?</p>
<p><a href="http://java.sun.com/j2se/1.4.2/docs/api/java/text/Collator.html" rel="nofollow">http://java.sun.com/j2se/1.4.2/docs/api/java/text/Collator.html</a></p>
<p>&#8220;The Collator class performs locale-sensitive String comparison. You use this class to build searching and sorting routines for natural language text.&#8221;</p>
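<p>A minimal sketch of what a Collator-based comparison might look like. The class and method names (<code>CollatorEquality</code>, <code>sameIgnoringCase</code>) are made up for illustration, and the choice of <code>SECONDARY</code> strength is an assumption about what "ignoring case" should mean here. How Turkish i and ı compare depends on the collation rules the runtime ships for that locale, so the sketch only exercises English.</p>

```java
import java.text.Collator;
import java.util.Locale;

class CollatorEquality {
    // Hypothetical helper: true when the two strings differ at most by case
    // under the collation rules of the given locale.
    static boolean sameIgnoringCase(String a, String b, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        // SECONDARY strength considers base letters and accents, but not case.
        collator.setStrength(Collator.SECONDARY);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        System.out.println(sameIgnoringCase("hello", "HELLO", Locale.ENGLISH));
        System.out.println(sameIgnoringCase("hello", "help", Locale.ENGLISH));
    }
}
```

<p>A Collator is comparatively expensive to create, so code that compares many strings would probably want to build one per locale and reuse it.</p>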
]]></content:encoded>
	</item>
	<item>
		<title>By: John Cowan</title>
		<link>http://cafe.elharo.com/blogroll/turkish/comment-page-1/#comment-108</link>
		<dc:creator>John Cowan</dc:creator>
		<pubDate>Sat, 07 Jan 2006 04:39:07 +0000</pubDate>
		<guid isPermaLink="false">http://minicafe.elharo.com/wordpress/java/comparing-strings-for-equality/#comment-108</guid>
		<description>edavies, that was thought about when Unicode was set up and repeatedly since.  The trouble is that there is just too much legacy data (mostly in 8859-9 encoding) that doesn&#039;t distinguish between Turkish i and non-Turkish i.  There is simply no hope of getting people to make such a distinction systematically and correctly, so the problem won&#039;t go away.

Also, hexagonal French is moving back toward preserving accents in uppercase letters.  They were basically dropped just to accommodate typewriters, which didn&#039;t have enough keys to provide them.</description>
		<content:encoded><![CDATA[<p>edavies, that was thought about when Unicode was set up and repeatedly since.  The trouble is that there is just too much legacy data (mostly in 8859-9 encoding) that doesn&#8217;t distinguish between Turkish i and non-Turkish i.  There is simply no hope of getting people to make such a distinction systematically and correctly, so the problem won&#8217;t go away.</p>
<p>Also, hexagonal French is moving back toward preserving accents in uppercase letters.  They were basically dropped just to accommodate typewriters, which didn&#8217;t have enough keys to provide them.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: edavies</title>
		<link>http://cafe.elharo.com/blogroll/turkish/comment-page-1/#comment-27</link>
		<dc:creator>edavies</dc:creator>
		<pubDate>Fri, 06 Jan 2006 18:34:09 +0000</pubDate>
		<guid isPermaLink="false">http://minicafe.elharo.com/wordpress/java/comparing-strings-for-equality/#comment-27</guid>
		<description>&lt;h3&gt;Different letters in Unicode&lt;/h3&gt;

This problem also illustrates the uneasy issue of when two symbols are regarded as the &quot;same&quot; in Unicode. For example, the Cyrillic capital letter es (U+0421) looks like (is a homoglyph of) the Latin capital letter c (U+0043) but the two are, quite reasonably, regarded as different. Given the distinction between the dotted and undotted letter i in Turkish maybe even the dotted lower case form and the undotted upper case form should have been regarded as distinct from their Latin homoglyphs. Of course, it&#039;s difficult to know where to draw the line - an English letter a is presumably the &quot;same&quot; as a French one, isn&#039;t it? However, different case conversion rules seem to me to be enough to trigger a separation. Except then, of course, we need to worry whether accented lower case letters in Canadian French are different from those in European French. Hmm, not easy. Now off to read RFCs 3490, 3491 and 3492 on IDNs to see what all that fuss is about. </description>
		<content:encoded><![CDATA[<h3>Different letters in Unicode</h3>
<p>This problem also illustrates the uneasy issue of when two symbols are regarded as the &#8220;same&#8221; in Unicode. For example, the Cyrillic capital letter es (U+0421) looks like (is a homoglyph of) the Latin capital letter c (U+0043) but the two are, quite reasonably, regarded as different. Given the distinction between the dotted and undotted letter i in Turkish maybe even the dotted lower case form and the undotted upper case form should have been regarded as distinct from their Latin homoglyphs. Of course, it&#8217;s difficult to know where to draw the line &#8211; an English letter a is presumably the &#8220;same&#8221; as a French one, isn&#8217;t it? However, different case conversion rules seem to me to be enough to trigger a separation. Except then, of course, we need to worry whether accented lower case letters in Canadian French are different from those in European French. Hmm, not easy. Now off to read RFCs 3490, 3491 and 3492 on IDNs to see what all that fuss is about.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tim</title>
		<link>http://cafe.elharo.com/blogroll/turkish/comment-page-1/#comment-26</link>
		<dc:creator>Tim</dc:creator>
		<pubDate>Fri, 06 Jan 2006 18:33:06 +0000</pubDate>
		<guid isPermaLink="false">http://minicafe.elharo.com/wordpress/java/comparing-strings-for-equality/#comment-26</guid>
		<description>equalsIgnoreCase(String) also suffers from the same problem. I&#039;ve always used &lt;code&gt;string1.equalsIgnoreCase(string2)&lt;/code&gt; thinking that it would take care of the messiness of case comparison. A quick look at the javadoc and source code suggests that it suffers from the same problem. All it does is compare both the uppercase and lowercase version of each character. I wonder why they didn&#039;t add an &lt;code&gt;equalsIgnoreCase(String, Locale)&lt;/code&gt; method? Also, with regard to the link to Peter Norvig&#039;s toLowerCase() implementation: I don&#039;t think it solves the Turkish problem, but merely ignores it. In fact the javadoc of the given source says: &lt;blockquote&gt; Warning: Don&#039;t use this method when your default locale is Turkey. &lt;/blockquote&gt;</description>
		<content:encoded><![CDATA[<p>equalsIgnoreCase(String) also suffers from the same problem. I&#8217;ve always used <code>string1.equalsIgnoreCase(string2)</code> thinking that it would take care of the messiness of case comparison. A quick look at the javadoc and source code suggests that it suffers from the same problem. All it does is compare both the uppercase and lowercase version of each character. I wonder why they didn&#8217;t add an <code>equalsIgnoreCase(String, Locale)</code> method? Also, with regard to the link to Peter Norvig&#8217;s toLowerCase() implementation: I don&#8217;t think it solves the Turkish problem, but merely ignores it. In fact the javadoc of the given source says:</p>
<blockquote><p>Warning: Don&#8217;t use this method when your default locale is Turkey.</p></blockquote>
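<p>For illustration only, a locale-aware comparison along those lines could be sketched as follows. No such method exists in the JDK; the class <code>LocaleEquals</code> and its static <code>equalsIgnoreCase(String, String, Locale)</code> are hypothetical, and folding through upper case and then lower case is just one possible definition. Unlike <code>String.equalsIgnoreCase</code>, it allocates new strings.</p>

```java
import java.util.Locale;

class LocaleEquals {
    // Hypothetical helper: fold both strings with the locale's case rules
    // (upper case first, then lower case) and compare the results.
    static boolean equalsIgnoreCase(String a, String b, Locale locale) {
        String foldedA = a.toUpperCase(locale).toLowerCase(locale);
        String foldedB = b.toUpperCase(locale).toLowerCase(locale);
        return foldedA.equals(foldedB);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        // Locale-neutral rules: i and I are the same letter.
        System.out.println(equalsIgnoreCase("title", "TITLE", Locale.ROOT));
        // Turkish rules: the upper case of i is U+0130, not I.
        System.out.println(equalsIgnoreCase("title", "TITLE", turkish));
        System.out.println(equalsIgnoreCase("title", "T\u0130TLE", turkish));
    }
}
```

<p>Passing the locale explicitly, rather than relying on the default, keeps the result the same no matter where the program runs.</p>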
]]></content:encoded>
	</item>
	<item>
		<title>By: jdf</title>
		<link>http://cafe.elharo.com/blogroll/turkish/comment-page-1/#comment-25</link>
		<dc:creator>jdf</dc:creator>
		<pubDate>Fri, 06 Jan 2006 18:31:43 +0000</pubDate>
		<guid isPermaLink="false">http://minicafe.elharo.com/wordpress/java/comparing-strings-for-equality/#comment-25</guid>
		<description>Don&#039;t create new strings just to compare them! You&#039;d be better served by &lt;a href=&quot;http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#regionMatches%28boolean,%20int,%20java.lang.String,%20int,%20int%29&quot; rel=&quot;nofollow&quot;&gt;String.regionMatches()&lt;/a&gt;.</description>
		<content:encoded><![CDATA[<p>Don&#8217;t create new strings just to compare them! You&#8217;d be better served by <a href="http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#regionMatches%28boolean,%20int,%20java.lang.String,%20int,%20int%29" rel="nofollow">String.regionMatches()</a>.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Elliotte Rusty Harold</title>
		<link>http://cafe.elharo.com/blogroll/turkish/comment-page-1/#comment-24</link>
		<dc:creator>Elliotte Rusty Harold</dc:creator>
		<pubDate>Fri, 06 Jan 2006 18:30:21 +0000</pubDate>
		<guid isPermaLink="false">http://minicafe.elharo.com/wordpress/java/comparing-strings-for-equality/#comment-24</guid>
		<description>&lt;h3&gt;Re: Case-insignificant comparison&lt;/h3&gt;

The JavaDoc for &lt;code&gt;compareToIgnoreCase()&lt;/code&gt; says:

&lt;blockquote&gt;    This method returns an integer whose sign is that of calling compareTo with normalized versions of the strings where case differences have been eliminated by calling Character.toLowerCase(Character.toUpperCase(character)) on each character.&lt;/blockquote&gt;

Assuming the algorithm is implemented as specified (and a quick peek at the source code shows it is, at least in the version of Sun&#039;s JDK I have handy), it would have the same issue.</description>
		<content:encoded><![CDATA[<h3>Re: Case-insignificant comparison</h3>
<p>The JavaDoc for <code>compareToIgnoreCase()</code> says:</p>
<blockquote><p>    This method returns an integer whose sign is that of calling compareTo with normalized versions of the strings where case differences have been eliminated by calling Character.toLowerCase(Character.toUpperCase(character)) on each character.</p></blockquote>
<p>Assuming the algorithm is implemented as specified (and a quick peek at the source code shows it is, at least in the version of Sun&#8217;s JDK I have handy), it would have the same issue.</p>
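<p>The per-character normalization quoted above can be tried directly. This is only a sketch; the class <code>CaseFold</code> and the method <code>normalize</code> are names invented here, not part of the JDK. It applies the quoted expression to the four i letters and prints what each becomes.</p>

```java
class CaseFold {
    // The normalization described in the compareToIgnoreCase() JavaDoc.
    static char normalize(char c) {
        return Character.toLowerCase(Character.toUpperCase(c));
    }

    public static void main(String[] args) {
        // i, I, dotless i (U+0131), and capital I with dot above (U+0130)
        char[] letters = {'i', 'I', '\u0131', '\u0130'};
        for (char c : letters) {
            System.out.printf("U+%04X -> U+%04X%n", (int) c, (int) normalize(c));
        }
    }
}
```

<p>All four letters come out as plain i (U+0069), so a comparison built on this normalization cannot tell the dotted and dotless forms apart.</p>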
]]></content:encoded>
	</item>
</channel>
</rss>

