<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="http://feedproxy.google.com/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feedproxy.google.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" xml:lang="en" xml:base="http://cafe.elharo.com/wp-atom.php">
	<title type="text">The Cafes</title>
	<subtitle type="text">Longer than a blog; shorter than a book</subtitle>

	<updated>2009-01-01T22:22:22Z</updated>
	<generator uri="http://wordpress.org/" version="2.6.3">WordPress</generator>

	<link rel="alternate" type="text/html" href="http://cafe.elharo.com" />
	<id>http://cafe.elharo.com/feed/atom/</id>
	

			<link rel="self" href="http://feedproxy.google.com/TheCafes" type="application/atom+xml" /><entry>
		<author>
			<name>Elliotte Rusty Harold</name>
						<uri>http://www.elharo.com/</uri>
					</author>
		<title type="html">Prefer Multiline if</title>
		<link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheCafes/~3/321UTUyhW2I/" />
		<id>http://cafe.elharo.com/?p=367</id>
		<updated>2009-01-01T22:22:22Z</updated>
		<published>2009-01-01T19:07:00Z</published>
		<category scheme="http://cafe.elharo.com" term="Programming" /><category scheme="http://cafe.elharo.com" term="C" /><category scheme="http://cafe.elharo.com" term="Java" /><category scheme="http://cafe.elharo.com" term="Objective C" /><category scheme="http://cafe.elharo.com" term="style" />		<summary type="html"><![CDATA[C-family languages including Java, C#, and C++ do not require braces around single line blocks. For example, this is a legal loop:
for (int i=0; i &#60; args.length; i++) process(args[i]);
So&#8217;s this:
for (int i=0; i &#60; args.length; i++)
    process(args[i]);
However both of these are very bad form, and lead to buggy code. All blocks in [...]]]></summary>
		<content type="html" xml:base="http://cafe.elharo.com/programming/prefer-multiline-if/">&lt;p&gt;C-family languages including Java, C#, and C++ do not require braces around single line blocks. For example, this is a legal loop:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;for (int i=0; i &amp;lt; args.length; i++) process(args[i]);&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;So&amp;#8217;s this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;for (int i=0; i &amp;lt; args.length; i++)
    process(args[i]);&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;However both of these are very bad form, and lead to buggy code. All blocks in C-like languages should be explicitly delimited by braces across multiple lines in all cases. Here&amp;#8217;s why:&lt;br /&gt;
&lt;span id="more-367"&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;The most dangerous form is a multiline block that doesn&amp;#8217;t use braces. The problem is that when you start with this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;if (p.needsGiftWrapping())
    wrap(p);&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;sooner or later some programmer&amp;#8211;perhaps you, perhaps someone else&amp;#8211;is going to discover a need to add a second line. For example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;if (p.needsGiftWrapping())
    wrap(p);
   ribbon(p);&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Bang! That code is now buggy. It looks like the if block applies to both statements, but in fact it only applies to one. The indentation is lying about the intent of the code. By contrast, this form is less dangerous:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;if (p.needsGiftWrapping()) wrap(p);&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;In this example, it&amp;#8217;s obvious to any programmer who comes along and adds a line, that they need to add braces too. It&amp;#8217;s less likely to cause bugs down the road, but it should still be avoided, and here&amp;#8217;s why.&lt;/p&gt;
&lt;p&gt;The statement&lt;/p&gt;
&lt;p&gt;&lt;code&gt;if (p.needsGiftWrapping()) wrap(p);&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;is really two statements: one that calls &lt;code&gt;needsGiftWrapping()&lt;/code&gt; and one that calls &lt;code&gt;wrap(p)&lt;/code&gt;. These are independent statements, and may need to be treated separately. In particular, you may want to set a breakpoint on one and not the other. For instance, I sometimes like to put a breakpoint in the body of an if or for or while, just to make sure that this code is really being executed when I think it is. If the program doesn&amp;#8217;t stop, then the code isn&amp;#8217;t being executed; and I have a big clue where the bug is. &lt;/p&gt;
&lt;p&gt;This is also important for code coverage tools. Most tools such as Cobertura measure the coverage of lines of code, not statements. Even if they measure lines of code coverage and statement coverage separately (and some tools do this) they still display the coverage with lines of code. This statement can be marked covered even if the body of the block is never entered:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;if (p.needsGiftWrapping()) wrap(p);&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;However, if the block is rewritten like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;if (p.needsGiftWrapping()) {
    wrap(p);
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;it now becomes obvious if the tests are never testing the case where &lt;code&gt;p.needsGiftWrapping()&lt;/code&gt; returns true.  &lt;/p&gt;
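&lt;p&gt;To make the earlier indentation bug concrete, here is a minimal runnable sketch. The &lt;code&gt;Parcel&lt;/code&gt; and &lt;code&gt;GiftWrapDemo&lt;/code&gt; classes and their fields are hypothetical stand-ins invented for illustration; only the names &lt;code&gt;needsGiftWrapping&lt;/code&gt;, &lt;code&gt;wrap&lt;/code&gt;, and &lt;code&gt;ribbon&lt;/code&gt; come from the examples above.&lt;/p&gt;

```java
// Hypothetical stand-in for the object called p in the article.
final class Parcel {
    final boolean needsGiftWrapping;
    boolean wrapped;
    boolean ribboned;

    Parcel(boolean needsGiftWrapping) {
        this.needsGiftWrapping = needsGiftWrapping;
    }
}

final class GiftWrapDemo {
    static void wrap(Parcel p) {
        p.wrapped = true;
    }

    static void ribbon(Parcel p) {
        p.ribboned = true;
    }

    // The indentation suggests both calls are guarded by the if,
    // but the compiler attaches only wrap(p) to it.
    static void packWithoutBraces(Parcel p) {
        if (p.needsGiftWrapping)
            wrap(p);
            ribbon(p); // runs for every parcel
    }

    // With braces, the block says exactly what the indentation says.
    static void packWithBraces(Parcel p) {
        if (p.needsGiftWrapping) {
            wrap(p);
            ribbon(p);
        }
    }

    public static void main(String[] args) {
        Parcel plain = new Parcel(false);
        packWithoutBraces(plain);
        System.out.println("without braces, ribboned: " + plain.ribboned);

        Parcel alsoPlain = new Parcel(false);
        packWithBraces(alsoPlain);
        System.out.println("with braces, ribboned: " + alsoPlain.ribboned);
    }
}
```

&lt;p&gt;A parcel that needs no gift wrapping still gets a ribbon in the first method, and gets nothing in the second.&lt;/p&gt;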
&lt;p&gt;This applies equally to all block statements: &lt;code&gt;if&lt;/code&gt;, &lt;code&gt;for&lt;/code&gt;, &lt;code&gt;while&lt;/code&gt;, &lt;code&gt;do while&lt;/code&gt;, and any others you may encounter. I&amp;#8217;m beginning to believe this is actually a special case of a general principle for C-like languages, and perhaps others: &lt;/p&gt;
&lt;h3&gt;Each line of source should contain exactly one statement&lt;/h3&gt;
&lt;p&gt;That is, avoid lines like this:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;int i = 7, j  = 18;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;or &lt;/p&gt;
&lt;p&gt;&lt;code&gt;int i = j  = 18;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Similarly, avoid the &lt;code&gt;?:&lt;/code&gt; operator. &lt;/p&gt;
&lt;pre&gt;&lt;code&gt;if (a &amp;gt; b) {
  max = a;
}
else {
  max = b;
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt; is easier to read than&lt;/p&gt;
&lt;p&gt;&lt;code&gt;max = a &amp;lt; b ? a : b;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;For instance, did you even notice the bug in the above line? If the verbosity bothers you, try something like this instead:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;max = Math.max(a, b);&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;The problem may reflect a misdesign in C-family languages. The compiler only pays attention to the semicolons and braces while ignoring the line breaks and indentation, but humans usually only pay attention to the line breaks and indentation while ignoring the semicolons and braces. This gives the code the opportunity to lie about what it&amp;#8217;s really doing. Consequently we need to take extra care when writing in C, Java, C++, C#, etc. not to lie to ourselves. If you place exactly one statement on each source line, you can be reasonably confident the code isn&amp;#8217;t lying to you, and you&amp;#8217;ll have a much easier time debugging.&lt;/p&gt;
&lt;p&gt;Compact code is fun, but it&amp;#8217;s not maximally readable, and more lines don&amp;#8217;t really cost you anything after the compiler is finished anyway.  Save yourself the hassle and for 2009 resolve to put one statement on each line. &lt;/p&gt;
&lt;h3&gt;P.S.&lt;/h3&gt;
&lt;p&gt;If you want to comment on this article, remember that the &lt;code&gt;&amp;lt;pre&gt;&lt;/code&gt; tag is allowed in comments. Otherwise your point may get lost along with your indentation. &lt;img src='http://cafe.elharo.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /&gt; &lt;/p&gt;

&lt;p&gt;&lt;a href="http://feedads.googleadservices.com/~a/xihDOUY6Vx3nfnqJAFqkQ9qVCRY/a"&gt;&lt;img src="http://feedads.googleadservices.com/~a/xihDOUY6Vx3nfnqJAFqkQ9qVCRY/i" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feedproxy.google.com/~r/TheCafes/~4/321UTUyhW2I" height="1" width="1"/&gt;</content>
		<link rel="replies" type="text/html" href="http://cafe.elharo.com/programming/prefer-multiline-if/#comments" thr:count="18" />
		<link rel="replies" type="application/atom+xml" href="http://cafe.elharo.com/programming/prefer-multiline-if/feed/atom/" thr:count="18" />
		<thr:total>18</thr:total>
	<feedburner:origLink>http://cafe.elharo.com/programming/prefer-multiline-if/</feedburner:origLink></entry>
		<entry>
		<author>
			<name>Elliotte Rusty Harold</name>
						<uri>http://www.elharo.com/</uri>
					</author>
		<title type="html">Monopoly Incompetence</title>
		<link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheCafes/~3/kv9W6UWtv_k/" />
		<id>http://cafe.elharo.com/?p=351</id>
		<updated>2008-12-21T09:22:17Z</updated>
		<published>2008-12-21T09:22:17Z</published>
		<category scheme="http://cafe.elharo.com" term="User Interface" />		<summary type="html"><![CDATA[Need more proof that monopolies are bad business? Just try to pay a utility bill online sometime. I have just gotten through attempting to pay my cable, gas, and electric bills online. Exactly none of them offered what I would consider a minimally competent site. The exact problems varied, but there was one that was [...]]]></summary>
		<content type="html" xml:base="http://cafe.elharo.com/ui/monopoly-incompetence/">&lt;p&gt;Need more proof that monopolies are bad business? Just try to pay a utility bill online sometime. I have just gotten through attempting to pay my cable, gas, and electric bills online. Exactly none of them offered what I would consider a minimally competent site. The exact problems varied, but there was one that was common across the three. Every single one required registration before they&amp;#8217;d take my money:&lt;/p&gt;
&lt;p&gt;&lt;img src="http://www.elharo.com/blog/wp-content/uploads/2008/12/paybill.png" alt="  Welcome to My Account Online services! Please enter your user ID and password to sign in. New User? Register now. " title="paybill" width="823" height="623" class="size-full wp-image-1001778" /&gt;&lt;/p&gt;
&lt;p&gt;By contrast, non-monopoly sites like Office Depot have long since learned that registration is an &lt;em&gt;optional&lt;/em&gt; step they shouldn&amp;#8217;t let get in the way of completing a sale. But the utility companies? Either they hire developers who are distinctly behind the state of the art, or they just don&amp;#8217;t care because you have to pay them, or both.&lt;br /&gt;
&lt;span id="more-351"&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;How easy should it be to pay a utility bill online? It&amp;#8217;s a form with about four or five fields, no more:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Account number&lt;/li&gt;
&lt;li&gt;Credit Card Number&lt;/li&gt;
&lt;li&gt;Expiration Date&lt;/li&gt;
&lt;li&gt;Name on Credit Card&lt;/li&gt;
&lt;li&gt;Amount to pay (this one could even be autofilled based on the account number.)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Unlike an online store, they shouldn&amp;#8217;t need to ask for a shipping address. Maybe they need a billing address (though that should default to the service address) or the CVV2 code, but utility bills are unlikely to attract the same sort of fraud that online stores do, so I&amp;#8217;m not sure even that&amp;#8217;s necessary. &lt;/p&gt;
&lt;p&gt;They absolutely don&amp;#8217;t need:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Your e-mail address&lt;/li&gt;
&lt;li&gt;A username&lt;/li&gt;
&lt;li&gt;A password&lt;/li&gt;
&lt;li&gt;Service address&lt;/li&gt;
&lt;li&gt;CAPTCHA (They really think someone&amp;#8217;s going to set up a bot to autopay utility bills?)&lt;/li&gt;
&lt;li&gt;The city where your mother was born. (No I&amp;#8217;m not making that up. Cox really wanted that piece of information.) &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Account number, minimal credit card (or debit card) info, amount to pay. That&amp;#8217;s all.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://feedads.googleadservices.com/~a/n5fMy2pWHsvEqS2xQ03ycomxX6o/a"&gt;&lt;img src="http://feedads.googleadservices.com/~a/n5fMy2pWHsvEqS2xQ03ycomxX6o/i" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feedproxy.google.com/~r/TheCafes/~4/kv9W6UWtv_k" height="1" width="1"/&gt;</content>
		<link rel="replies" type="text/html" href="http://cafe.elharo.com/ui/monopoly-incompetence/#comments" thr:count="14" />
		<link rel="replies" type="application/atom+xml" href="http://cafe.elharo.com/ui/monopoly-incompetence/feed/atom/" thr:count="14" />
		<thr:total>14</thr:total>
	<feedburner:origLink>http://cafe.elharo.com/ui/monopoly-incompetence/</feedburner:origLink></entry>
		<entry>
		<author>
			<name>Elliotte Rusty Harold</name>
						<uri>http://www.elharo.com/</uri>
					</author>
		<title type="html">Java is Dead! Long Live Python!</title>
		<link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheCafes/~3/ujr9-nLvV8k/" />
		<id>http://cafe.elharo.com/?p=326</id>
		<updated>2008-12-09T14:27:01Z</updated>
		<published>2008-12-09T14:27:01Z</published>
		<category scheme="http://cafe.elharo.com" term="Programming" />		<summary type="html"><![CDATA[Version 3.0 of Python has been released. Notably Python has again done something Java has long resisted: it has broken backwards compatibility with Python 2.x. Notable fixes include a much saner string processing model based on Unicode. I am told by my Pythonista colleagues that a lot of other weirdnesses such as the print operator [...]]]></summary>
		<content type="html" xml:base="http://cafe.elharo.com/programming/java-is-dead-long-live-python/">&lt;p&gt;&lt;a href="http://www.python.org/download/releases/3.0/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.python.org');"&gt;Version 3.0 of Python has been released&lt;/a&gt;. Notably Python has again done something Java has long resisted: it has broken backwards compatibility with Python 2.x. Notable fixes include a much saner string processing model based on Unicode. I am told by my Pythonista colleagues that a lot of other weirdnesses such as the print operator and the meaning of parentheses in &lt;code&gt;except&lt;/code&gt; clauses have been cleaned up as well. Though I don&amp;#8217;t expect all Python programmers to upgrade immediately (and version 2.x will be maintained for some years to come) version 3.0 is clearly a simpler, better, saner language than version 2.x that will enhance productivity and make programmers&amp;#8217; jobs more fun. Bravo for Python. This is clearly a living, evolving language. &lt;/p&gt;
&lt;p&gt;Java by contrast, is dead. It has at least as much brain damage and misdesign as Python 2.x did, probably more; yet Sun has resisted tooth and nail all efforts to fix the known problems. Instead they keep applying ever more lipstick to this pig without ever cleaning off all the filth and mud it&amp;#8217;s been rolling in for the last 12 years. They keep applying more perfume when what it really needs is a bath.&lt;br /&gt;
&lt;span id="more-326"&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Backwards compatibility was maintainable and useful through about version 1.4 of Java, but it completely broke down in Java 5 when autoboxing and generics moved the core language beyond any hope of comprehensibility. Autoboxing was a misguided effort to paper over Java&amp;#8217;s early decision to have a segregated type system for primitives and objects. It was Java&amp;#8217;s Plessy v. Ferguson decision that pretended primitives and objects were separate but equal; but the claim was no more true in Java than it was in American jurisprudence. A separate primitive type system may have made sense in 1995 when CPUs were slower and virtual machine technology was not as advanced. Today primitive types just complexify the language to no particular benefit. Autoboxing would not have been necessary or even considered were backwards compatibility not worshipped beyond all other gods. &lt;/p&gt;
&lt;p&gt;Generics are another case where backwards compatibility took a good idea and warped it into something horrible. Multiple design compromises were made to enable genericized code to run in older VMs, most notably type erasure.  Then, in the end, binary compatibility was broken anyway. However no one went back to the drawing board and considered how much simpler &lt;em&gt;and&lt;/em&gt; more powerful generics could be if they redesigned without worrying about backwards binary compatibility.&lt;/p&gt;
&lt;p&gt;Closures, if added now, would only make the situation worse. Closures might be a nice addition to the language if &lt;em&gt;and only if&lt;/em&gt; Java simultaneously removed inner classes and made all other syntactic changes necessary to support true closures. Otherwise closures will just be generics squared.  New features simply cannot be added on top of the current weak foundation unless we&amp;#8217;re willing to go back to the drawing board and take things out as well. &lt;/p&gt;
&lt;p&gt;I can&amp;#8217;t think of another major language as old as Java that still attempts to maintain compatibility with version 1.0 of itself. In fact, I can think of only one language that attempts that (C#), and that one&amp;#8217;s half Java&amp;#8217;s age. Unless we&amp;#8217;re willing to make the hard choices and abandon the legacy as Python has, Java is doomed to the fate of C++ and Cobol: a tool for programmers with long white beards who grew up with the language and have learned all its arcana by gradual accretion and who spend their lives maintaining code written a decade or more ago. Meanwhile a new generation of programmers will abandon Java in favor of more nimble modern languages like Python just as we abandoned C++ in our youth in favor of Java. (Seriously: is anyone under the age of 30 actually reading this site any more?)&lt;/p&gt;
&lt;p&gt;Admitting that you have a problem is the first step to recovery. Java has not yet admitted that it has a problem. The language is too big, too complex, and too baroque. Trade-offs that made sense in the era of single-core Pentium II&amp;#8217;s, 100 MHz processors, and 32 megabyte memory spaces no longer apply. Backwards compatibility has become a millstone around Java&amp;#8217;s neck. We&amp;#8217;re deep and sinking fast. Until this millstone is cast off, and we correct the mistakes of the past, no further progress can be made. &lt;/p&gt;
&lt;p&gt;It&amp;#8217;s hard to believe that I first started saying this &lt;a href="http://www.onjava.com/pub/a/onjava/2002/07/31/java3.html" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.onjava.com');"&gt;over five years ago&lt;/a&gt;, and absolutely no progress has been made in that entire time. In fact, matters have gotten worse.  Maybe Java is a lost cause, and it&amp;#8217;s time to fork and replace the language. If nothing else, Java proved that&amp;#8217;s possible. Just look what it did to C++. Perhaps it&amp;#8217;s time to repeat the experience. &lt;/p&gt;

&lt;p&gt;&lt;a href="http://feedads.googleadservices.com/~a/xHiY6P5MX9ls-cLfjk9elvoRMvs/a"&gt;&lt;img src="http://feedads.googleadservices.com/~a/xHiY6P5MX9ls-cLfjk9elvoRMvs/i" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feedproxy.google.com/~r/TheCafes/~4/ujr9-nLvV8k" height="1" width="1"/&gt;</content>
		<link rel="replies" type="text/html" href="http://cafe.elharo.com/programming/java-is-dead-long-live-python/#comments" thr:count="48" />
		<link rel="replies" type="application/atom+xml" href="http://cafe.elharo.com/programming/java-is-dead-long-live-python/feed/atom/" thr:count="48" />
		<thr:total>48</thr:total>
	<feedburner:origLink>http://cafe.elharo.com/programming/java-is-dead-long-live-python/</feedburner:origLink></entry>
		<entry>
		<author>
			<name>Elliotte Rusty Harold</name>
						<uri>http://www.elharo.com/</uri>
					</author>
		<title type="html">Keep Your Methods Private and your APIs Minimal</title>
		<link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheCafes/~3/JvEXaKEN5jQ/" />
		<id>http://cafe.elharo.com/?p=323</id>
		<updated>2008-11-11T15:53:11Z</updated>
		<published>2008-11-11T15:53:11Z</published>
		<category scheme="http://cafe.elharo.com" term="Programming" />		<summary type="html"><![CDATA[I was reminded once more today just how important it is to write minimal APIs that don&#8217;t expose more than they have to. Briefly I had code like this:
 private boolean flag;

 public boolean getFlag() {
   return this.flag;
 }

  public void setFlag(boolean value) {
    this.flag = value;
  }
Pretty boilerplate [...]]]></summary>
		<content type="html" xml:base="http://cafe.elharo.com/programming/keep-your-methods-private-and-your-apis-minimal/">&lt;p&gt;I was reminded once more today just how important it is to write minimal APIs that don&amp;#8217;t expose more than they have to. Briefly I had code like this:&lt;/p&gt;
&lt;pre&gt; private boolean flag;

 public boolean getFlag() {
   return this.flag;
 }

  public void setFlag(boolean value) {
    this.flag = value;
  }&lt;/pre&gt;
&lt;p&gt;Pretty boilerplate stuff, I think you&amp;#8217;ll agree.&lt;/p&gt;
&lt;p&gt;However, I noticed that after some refactoring that merged a couple of classes, I was now only calling &lt;code&gt;getFlag()&lt;/code&gt; from within the same class (or at least I thought I was), so I marked it private. Eclipse promptly warned me that the method was unused, so I deleted it. Then Eclipse warned me the field was unread. That seemed wrong, so I looked closer and yep, it was a bug. The feature the flag was supposed to control was always on. During the refactoring I had failed to move the use of the &lt;code&gt;flag&lt;/code&gt; field into the new class. I added a test to catch this, and fixed the problem.&lt;/p&gt;
&lt;p&gt;What&amp;#8217;s interesting about this example is that I found the bug only because I was aggressively minimizing the non-private parts of my API. The less public API a class has, the fewer places there are for bugs to hide. The less public API there is, the easier it is for analyzers&amp;#8211;static, dynamic, and human&amp;#8211;to detect problems.&lt;br /&gt;
&lt;span id="more-323"&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Many programmers subscribe to a cult of extensibility: Extensibility is always good. Opening up the API makes it more testable. Compile time static type checking really doesn&amp;#8217;t help anyway. You might need it someday so put it in now. Maybe you don&amp;#8217;t need it, but someone else might.&lt;/p&gt;
&lt;p&gt;This way lies madness. Most extensibility points are never used. How many getters and setters are actually invoked? Not as many as you&amp;#8217;d expect, and even fewer if test classes are not considered. How many non-final classes in your code actually have subclasses? How many of these classes have actually been designed and documented for extensibility? How many non-final fields actually are mutated after construction? Certainly some are. Not all methods and fields can or should be final. Some classes are designed to be subclassed. Some fields do need getters or less frequently setters. But none of these features should be added to your classes out of habit. &lt;/p&gt;
&lt;p&gt;For maximum safety, remove as much as you can and lock down what you can&amp;#8217;t. Make classes (and/or methods) final by default. Make fields final and immutable. Don&amp;#8217;t mark a method public if it can be package-private instead. Don&amp;#8217;t routinely add getters and setters for each field; and if you do need a getter or setter, just add the one you need, not both. In this case, I needed a setter, but the getter was gratuitous and removing it revealed a real bug. &lt;/p&gt;
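&lt;p&gt;As a rough sketch of what that locked-down shape can look like, assume a hypothetical &lt;code&gt;FeatureSwitch&lt;/code&gt; class (the class name and the &lt;code&gt;render&lt;/code&gt; method are invented for illustration; only the &lt;code&gt;flag&lt;/code&gt; field and &lt;code&gt;setFlag&lt;/code&gt; come from the code above). The class is final, the setter is not public, and there is no getter at all.&lt;/p&gt;

```java
// Hypothetical example class; final because it is not designed for subclassing.
final class FeatureSwitch {
    // Written only through setFlag and read only inside this class.
    private boolean flag;

    // The one access point callers need. It is package-private rather
    // than public, and there is no matching getter.
    void setFlag(boolean value) {
        this.flag = value;
    }

    // The only place the flag is read. If this use were lost in a
    // refactoring, the field would have no readers left and an IDE
    // warning about an unread field could point at the bug.
    String render(String text) {
        if (this.flag) {
            return "[" + text + "]";
        }
        return text;
    }

    public static void main(String[] args) {
        FeatureSwitch s = new FeatureSwitch();
        System.out.println(s.render("off by default"));
        s.setFlag(true);
        System.out.println(s.render("now on"));
    }
}
```

&lt;p&gt;Because nothing outside the class can read the field, every use of it is visible in one file, which is what makes the unread-field warning meaningful.&lt;/p&gt;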
&lt;p&gt;Follow the YAGNI principle: You Ain&amp;#8217;t Gonna Need It. Never add an extension or access point&amp;#8211;be it a method, a field, a non-final class, etc.&amp;#8211;until you know you need it. This will improve your code&amp;#8217;s robustness, thread safety, security, speed, and more.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://feedads.googleadservices.com/~a/-f1XNeJ2Bm_A7K14HWOoRd766-4/a"&gt;&lt;img src="http://feedads.googleadservices.com/~a/-f1XNeJ2Bm_A7K14HWOoRd766-4/i" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feedproxy.google.com/~r/TheCafes/~4/JvEXaKEN5jQ" height="1" width="1"/&gt;</content>
		<link rel="replies" type="text/html" href="http://cafe.elharo.com/programming/keep-your-methods-private-and-your-apis-minimal/#comments" thr:count="10" />
		<link rel="replies" type="application/atom+xml" href="http://cafe.elharo.com/programming/keep-your-methods-private-and-your-apis-minimal/feed/atom/" thr:count="10" />
		<thr:total>10</thr:total>
	<feedburner:origLink>http://cafe.elharo.com/programming/keep-your-methods-private-and-your-apis-minimal/</feedburner:origLink></entry>
		<entry>
		<author>
			<name>Elliotte Rusty Harold</name>
						<uri>http://www.elharo.com/</uri>
					</author>
		<title type="html">Harold’s Corollary to Knuth’s Law</title>
		<link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheCafes/~3/cQWDIvvgaUk/" />
		<id>http://cafe.elharo.com/?p=246</id>
		<updated>2008-12-09T14:28:09Z</updated>
		<published>2008-08-05T15:12:02Z</published>
		<category scheme="http://cafe.elharo.com" term="Testing" />		<summary type="html"><![CDATA[Lately I&#8217;ve found myself arguing about the proper design of unit tests. On my side I&#8217;m claiming:

Unit tests should only touch the public API. 
Code coverage should be as near 100% as possible.
It&#8217;s better to test the real thing than mock objects.

The goal is to make sure that the tests are as close to actual [...]]]></summary>
		<content type="html" xml:base="http://cafe.elharo.com/testing/harolds-corollary-to-knuths-law/">&lt;p&gt;Lately I&amp;#8217;ve found myself arguing about the proper design of unit tests. On my side I&amp;#8217;m claiming:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Unit tests should only touch the public API. &lt;/li&gt;
&lt;li&gt;Code coverage should be as near 100% as possible.&lt;/li&gt;
&lt;li&gt;It&amp;#8217;s better to test the real thing than mock objects.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The goal is to make sure that the tests are as close to actual usage as possible. This means that problems are more likely to be detected and false positives are less likely. Furthermore, the discipline of testing through the public API when attempting to achieve 100% code coverage tends to reveal a lot about how the code really works. It routinely highlights dead code that can be eliminated. It reveals paths of optimization. It teaches me things about my own code I didn&amp;#8217;t know. It shows patterns in the entire system that makes up my product.&lt;/p&gt;
&lt;p&gt;By contrast some programmers advocate that tests should be method-limited. Each test should call the method as directly as possible, perhaps even making it public or non-private and violating encapsulation to enable this. Any external resources that are necessary to run the method such as databases or web servers should be mocked out. At the extreme, even other classes a test touches should be replaced by mock implementations.&lt;/p&gt;
&lt;p&gt;This approach may sometimes let the tests be written faster; but not always. There&amp;#8217;s a non-trivial cost to designing mock objects to replace the real thing; and sometimes that takes longer. This approach will still tend to find most bugs in the method being tested. However it stops there. It will not find code in the method that should be eliminated because it&amp;#8217;s unreachable from the public API. Thus code tested with this approach is likely to be larger, more complex, and slower since it has to handle conditions that can&amp;#8217;t happen through the public API. More importantly, such a test starts and stops with that one method. It reveals nothing about the interaction of the different parts of the system. It teaches nothing about how the code really operates in the more complex environment of the full system. It misses bugs that can emerge out of the mixture of multiple different methods and classes even when each method is behaving correctly in isolation according to its spec. That is, it often fails to find flaws in the specifications of the individual methods. Why then are so many programmers so adamant about breaking access protection and every other rule of good design as soon as they start testing?&lt;/p&gt;
&lt;p&gt;Would you believe performance?&lt;br /&gt;
&lt;span id="more-246"&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;For instance consider &lt;a href="http://www.artima.com/weblogs/viewpost.jsp?thread=126923" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.artima.com');"&gt;this proposal&lt;/a&gt; from Michael Feathers:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A test is not a unit test if:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;i&gt;It talks to the database&lt;br /&gt;
&lt;/i&gt;&lt;/li&gt;
&lt;li&gt;&lt;i&gt;It communicates across the network&lt;br /&gt;
&lt;/i&gt;&lt;/li&gt;
&lt;li&gt;&lt;i&gt;It touches the file system&lt;br /&gt;
&lt;/i&gt;&lt;/li&gt;
&lt;li&gt;&lt;i&gt;It can&amp;#8217;t run at the same time as any of your other unit tests&lt;br /&gt;
&lt;/i&gt;&lt;/li&gt;
&lt;li&gt;&lt;i&gt;You have to do special things to your environment (such as editing&lt;br /&gt;
config files) to run it.&lt;/i&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Tests that do these things aren&amp;#8217;t bad. Often they are worth writing, and they can be written in a unit test harness. However, it is important to be able to separate them from true unit tests so that we can keep a set of tests that we can run fast whenever we make our changes.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;More than 30 years ago Donald Knuth first published what would come to be called Knuth&amp;#8217;s law: &amp;#8220;premature optimization is the root of all evil in programming.&amp;#8221; &lt;cite&gt;(Knuth, Donald. Structured Programming with go to Statements, ACM Journal Computing Surveys, Vol 6, No. 4, Dec. 1974. p.268.)&lt;/cite&gt; But some developers still haven&amp;#8217;t gotten the message. &lt;/p&gt;
&lt;p&gt;Are there some tests that are so slow they contribute to not running the test suite? Yes. We&amp;#8217;ve all seen them, but there&amp;#8217;s no way to tell which tests they are in advance. In my test suite for &lt;a href="http://www.xom.nu/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.xom.nu');"&gt;XOM&lt;/a&gt;, I have numerous tests that communicate across the network,  touch the filesystem, and access third party libraries. However, almost all these tests run like a bat out of hell, and take no noticeable time. The &lt;a href="https://xom.dev.java.net/source/browse/xom/src/nu/xom/tests/EncodingTest.java?rev=1.27&amp;amp;view=log" onclick="javascript:pageTracker._trackPageview('/outbound/article/xom.dev.java.net');"&gt;slowest test in the suite&lt;/a&gt;? It&amp;#8217;s one that operates completely in memory on byte array streams with no network access, does not touch the file system, uses no APIs beyond what&amp;#8217;s in Java 1.2 and XOM itself, and there&amp;#8217;s no database anywhere in sight. I do omit that test from my standard suite because it takes too long to run. I&amp;#8217;ll run it explicitly once or twice before releasing a new version, but not every time I make a change. &lt;/p&gt;
&lt;p&gt;I am now proposing Harold&amp;#8217;s corollary to Knuth&amp;#8217;s law: &lt;em&gt;premature optimization is the root of all evil in testing&lt;/em&gt;. It is absolutely essential to make sure that your test suite runs fast enough to run after every change to the code and before every check in. I&amp;#8217;m even willing to put a number on &amp;#8220;fast enough&amp;#8221;, and that number is &lt;a href="http://cafe.elharo.com/ui/the-90-second-rule/" &gt;90 seconds&lt;/a&gt;. However, you simply cannot tell which tests are likely to be too slow to run routinely in advance of actual measurement. Castrating and contorting your tests to fit some imagined idea of what will and will not be slow limits their usefulness. &lt;/p&gt;
&lt;p&gt;Tests should be designed for the ideal scenario: a computer that is infinitely fast with infinite memory and a network with zero latency and infinite bandwidth. Of course, that ideal computer doesn&amp;#8217;t exist, and you&amp;#8217;ll have to profile, optimize, and as a last resort cut back on your tests. However, I&amp;#8217;ve never yet met a programmer who could reliably tell which tests (or other code) would and would not be fast enough in advance of actual measurements. Blanket rules that unit tests should not do X or talk to Y because it&amp;#8217;s likely to be slow needlessly limit what we can learn from unit tests.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://feedads.googleadservices.com/~a/QLGzhh8QEhJwXA8dT4yLKuzMfFw/a"&gt;&lt;img src="http://feedads.googleadservices.com/~a/QLGzhh8QEhJwXA8dT4yLKuzMfFw/i" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feedproxy.google.com/~r/TheCafes/~4/cQWDIvvgaUk" height="1" width="1"/&gt;</content>
		<link rel="replies" type="text/html" href="http://cafe.elharo.com/testing/harolds-corollary-to-knuths-law/#comments" thr:count="15" />
		<link rel="replies" type="application/atom+xml" href="http://cafe.elharo.com/testing/harolds-corollary-to-knuths-law/feed/atom/" thr:count="15" />
		<thr:total>15</thr:total>
	<feedburner:origLink>http://cafe.elharo.com/testing/harolds-corollary-to-knuths-law/</feedburner:origLink></entry>
		<entry>
		<author>
			<name>Elliotte Rusty Harold</name>
						<uri>http://www.elharo.com/</uri>
					</author>
		<title type="html">10 Things to Know Before You Go To Beijing</title>
		<link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheCafes/~3/8BNq1N8MVMI/" />
		<id>http://cafe.elharo.com/?p=295</id>
		<updated>2008-08-04T14:13:08Z</updated>
		<published>2008-08-04T14:13:08Z</published>
		<category scheme="http://cafe.elharo.com" term="Travel" />		<summary type="html"><![CDATA[1. Learn Mandarin. 
Even a little will go a long way. English is very uncommon here. All those tourist phrase books and Berlitz courses that did you absolutely no good in Europe because everyone spoke English? They actually help here. The most important phrase to know is &#8220;Boo-yao&#8221; which loosely translates as &#8220;No, I don&#8217;t [...]]]></summary>
		<content type="html" xml:base="http://cafe.elharo.com/travel/10-things-to-know-before-you-go-to-beijing/">&lt;h3&gt;1. Learn Mandarin. &lt;/h3&gt;
&lt;p&gt;Even a little will go a long way. English is very uncommon here. All those tourist phrase books and Berlitz courses that did you absolutely no good in Europe because everyone spoke English? They actually help here. The most important phrase to know is &amp;#8220;Boo-yao&amp;#8221; which loosely translates as &amp;#8220;No, I don&amp;#8217;t want that cheap plastic souvenir/guide book/Rolex/Gucci bag you&amp;#8217;re trying to sell me, and I really mean it.&amp;#8221;&lt;br /&gt;
&lt;span id="more-295"&gt;&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;2. Money&lt;/h3&gt;
&lt;p&gt;China is a cash society. Credit cards are rarely accepted, not even in restaurants, large department stores, or major tourist destinations. &lt;/p&gt;
&lt;p&gt;Change your money at the airport. It&amp;#8217;s relatively hard to do afterwards. &lt;/p&gt;
&lt;h3&gt;3. Where to Stay&lt;/h3&gt;
&lt;p&gt;Stay within the 2nd Ring Road unless you have a specific reason to be elsewhere. (The Olympics are mostly outside the Fourth Ring Road.) Beijing is a large city with a lot of traffic. It can take a while to get around.&lt;/p&gt;
&lt;p&gt;Chinese hotels (that is, ones that cater to natives instead of foreign tourists and businesspeople) are not the same as North American hotels. Worse in some ways, better in others; but most U.S. travelers will be uncomfortable.&lt;/p&gt;
&lt;h3&gt;4. Food&lt;/h3&gt;
&lt;p&gt;Learn to eat with chopsticks; or, if you must, bring plastic forks. &lt;/p&gt;
&lt;p&gt;I got conflicting advice about whether to drink the water, but I didn&amp;#8217;t risk it. Bottled water is cheap and easy to find.&lt;/p&gt;
&lt;p&gt;If you do drink the water, there are public toilets everywhere. Learn to squat. Bring toilet paper because most bathrooms don&amp;#8217;t have any. &lt;/p&gt;
&lt;h3&gt;5. Transport&lt;/h3&gt;
&lt;p&gt;Cabs are plentiful and cheap, but the drivers don&amp;#8217;t speak English. Write down your destination in Chinese characters, or ask someone at the hotel to do it for you. Many guide books list addresses in both English and Chinese characters so if necessary you can get by with pointing. &lt;/p&gt;
&lt;p&gt;Make sure to take one of the yellow cabs (really yellow and brown, yellow and green, or yellow and red). These are official cabs with meters. Other &amp;#8220;cabs&amp;#8221; are usually private cars out to make a few bucks from the tourists and will cost you a lot more (though still not as much as an equivalent ride in New York or London, I&amp;#8217;m compelled to note).&lt;/p&gt;
&lt;p&gt;The subway is easy to navigate and by far the fastest way to get around town. Signs and announcements are in both Chinese and English. Choose a hotel within easy walking distance of a subway station.&lt;/p&gt;
&lt;p&gt;At rush hour, if you&amp;#8217;re not near a subway, it may well be faster to take a cab to the nearest subway station, take the subway across town, and then take another cab from there. &lt;/p&gt;
&lt;p&gt;Buses are cheap, crowded, and Chinese. The driver does not speak English.&lt;/p&gt;
&lt;p&gt;Parks and major destinations like the Forbidden City and the Temple of Heaven are walled. You must enter them at exactly the right place or you&amp;#8217;ll have a long walk around.&lt;/p&gt;
&lt;h3&gt;6. Crime&lt;/h3&gt;
&lt;p&gt;There isn&amp;#8217;t any, at least not the kind of low-level street crime that would inconvenience tourists. Hutongs are safe, even if they look otherwise.&lt;/p&gt;
&lt;h3&gt;7. Making Hello&lt;/h3&gt;
&lt;p&gt;The Asian girl approaching you with a camera wants to take her picture with you. They call this &amp;#8220;making hello&amp;#8221;. This is not a scam. Caucasians and other non-Orientals are still uncommon enough in most Beijing neighborhoods (with the possible exception of Sanlitun) that the locals are curious. &lt;/p&gt;
&lt;h3&gt;8. Nature&lt;/h3&gt;
&lt;p&gt;There&amp;#8217;s very little green space within Beijing or close to it, compared to New York or many other Western cities. However, you can find some birds, trees, and animals at The Temple of Heaven Park, the Old and New Summer Palaces, and the Beijing Zoo. &lt;/p&gt;
&lt;p&gt;The most common birds in the city are Rock Pigeon, European Tree Sparrow (one of the infamous 4 pests&amp;#8211;they&amp;#8217;re still here. Mao is gone.) and Black-billed Magpie.&lt;/p&gt;
&lt;h3&gt;9. Shopping&lt;/h3&gt;
&lt;p&gt;Everything is cheap, cheap, cheap except for electronics, which cost about the same in China as they do in the U.S. Clothing and food are especially cheap. Prices are usually well marked, and if they&amp;#8217;re not, chances are pretty damn good you can afford it anyway. Outside the tourist shops, merchants seem honest and no one is trying to rip you off. Inside the tourist shops/areas, it&amp;#8217;s a different story; but prices here are very much negotiable, even on small items like a soda. &lt;/p&gt;
&lt;p&gt;The malls and a few tourist areas like the Great Wall can be somewhat pressured. Just keep saying Boo-yao, smile, and walk on. However, the smaller shops in the less touristy areas are genuinely interested in seeing you and welcome any opportunity to interact, language difficulties notwithstanding. If necessary, draw pictures and point. &lt;/p&gt;
&lt;h3&gt;10. Leaving town&lt;/h3&gt;
&lt;p&gt;Do not take &lt;em&gt;any&lt;/em&gt; liquids, gels, pastes, waxes, soap, or makeup of any kind in your carry-on luggage when leaving China. Not even in 3 oz bottles in a quart sized ziploc bag. They will be confiscated.&lt;/p&gt;
&lt;p&gt;Don&amp;#8217;t get to the airport more than 3.5 hours before your flight leaves. You can&amp;#8217;t check in. &lt;/p&gt;
&lt;p&gt;Airport prices return to U.S. levels. This can be a bit of a shock after enjoying the bargain that is most of Beijing. &lt;/p&gt;

&lt;p&gt;&lt;a href="http://feedads.googleadservices.com/~a/v-8Zqu5ZouzYqnQ6Y6rpt7zrkvU/a"&gt;&lt;img src="http://feedads.googleadservices.com/~a/v-8Zqu5ZouzYqnQ6Y6rpt7zrkvU/i" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feedproxy.google.com/~r/TheCafes/~4/8BNq1N8MVMI" height="1" width="1"/&gt;</content>
		<link rel="replies" type="text/html" href="http://cafe.elharo.com/travel/10-things-to-know-before-you-go-to-beijing/#comments" thr:count="6" />
		<link rel="replies" type="application/atom+xml" href="http://cafe.elharo.com/travel/10-things-to-know-before-you-go-to-beijing/feed/atom/" thr:count="6" />
		<thr:total>6</thr:total>
	<feedburner:origLink>http://cafe.elharo.com/travel/10-things-to-know-before-you-go-to-beijing/</feedburner:origLink></entry>
		<entry>
		<author>
			<name>Elliotte Rusty Harold</name>
						<uri>http://www.elharo.com/</uri>
					</author>
		<title type="html">Upgraded to Wordpress 2.6</title>
		<link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheCafes/~3/eoCdGqXhHKk/" />
		<id>http://cafe.elharo.com/?p=292</id>
		<updated>2008-08-03T02:02:39Z</updated>
		<published>2008-08-03T02:01:32Z</published>
		<category scheme="http://cafe.elharo.com" term="Web Development" />		<summary type="html"><![CDATA[I&#8217;ve upgraded this site to WordPress 2.6. Please holler if you notice any problems. Thanks.
]]></summary>
		<content type="html" xml:base="http://cafe.elharo.com/web/upgraded-to-wordpress-26/">&lt;p&gt;I&amp;#8217;ve upgraded this site to WordPress 2.6. Please holler if you notice any problems. Thanks.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://feedads.googleadservices.com/~a/nmViLndw6GP9JWsSw19pcJvCfkY/a"&gt;&lt;img src="http://feedads.googleadservices.com/~a/nmViLndw6GP9JWsSw19pcJvCfkY/i" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feedproxy.google.com/~r/TheCafes/~4/eoCdGqXhHKk" height="1" width="1"/&gt;</content>
		<link rel="replies" type="text/html" href="http://cafe.elharo.com/web/upgraded-to-wordpress-26/#comments" thr:count="0" />
		<link rel="replies" type="application/atom+xml" href="http://cafe.elharo.com/web/upgraded-to-wordpress-26/feed/atom/" thr:count="0" />
		<thr:total>0</thr:total>
	<feedburner:origLink>http://cafe.elharo.com/web/upgraded-to-wordpress-26/</feedburner:origLink></entry>
		<entry>
		<author>
			<name>Elliotte Rusty Harold</name>
						<uri>http://www.elharo.com/</uri>
					</author>
		<title type="html">Chapter 3: Well-formedness</title>
		<link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheCafes/~3/1pb5w-5x-CA/" />
		<id>http://cafe.elharo.com/web/refactoring-html/chapter-3-well-formedness/</id>
		<updated>2008-07-14T13:36:50Z</updated>
		<published>2008-07-14T13:36:50Z</published>
		<category scheme="http://cafe.elharo.com" term="Refactoring HTML" />		<summary type="html"><![CDATA[Here&#8217;s part 15 of the ongoing serialization of Refactoring HTML, also available from Amazon  and Safari.
The very first step in moving markup into modern form is to make it well-formed. Well-formedness is the basis of the huge and incredibly powerful XML tool chain. Well-formedness guarantees a single unique tree structure for the document that [...]]]></summary>
		<content type="html" xml:base="http://cafe.elharo.com/web/refactoring-html/chapter-3-well-formedness/">&lt;p&gt;&lt;i&gt;Here&amp;#8217;s part 15 of the ongoing serialization of &lt;cite&gt;Refactoring HTML&lt;/cite&gt;, also available from &lt;a href="http://www.amazon.com/exec/obidos/ISBN=0321503635/ref=nosim/cafeaulaitA" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.amazon.com');"&gt;Amazon&lt;/a&gt;  and &lt;a href="http://safari.oreilly.com/9780321552044" onclick="javascript:pageTracker._trackPageview('/outbound/article/safari.oreilly.com');"&gt;Safari&lt;/a&gt;.&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;The very first step in moving markup into modern form is to make it well-formed. Well-formedness is the basis of the huge and incredibly powerful XML tool chain. Well-formedness guarantees a single unique tree structure for the document that can be operated on by the DOM, thus making it the basis of reliable, cross-browser JavaScript. The very first thing you need to do is make your pages well-formed.&lt;/p&gt;
&lt;p&gt;Validity, although important, is not nearly as crucial as well-formedness. There are often good reasons to compromise on validity. In fact, I often deliberately publish invalid pages. If I need an element the DTD doesn’t allow, I put it in. It won’t hurt anything because browsers ignore elements they don’t understand. If I have a &lt;code&gt;blockquote&lt;/code&gt; that contains raw text but no elements, no great harm is done. If I use an HTML 5 element such as &lt;code&gt;m&lt;/code&gt; that Opera recognizes and other browsers don’t, those other browsers will just ignore it. However, if the page is malformed, the consequences are much more severe.&lt;/p&gt;
&lt;p&gt;First, I won’t be able to use any XML tools, such as XSLT or SAX, to process the page. Indeed, almost the only thing I can do with it is view it in a browser. It is very hard to do any reliable automated processing or testing with a malformed page.&lt;/p&gt;
&lt;p&gt;Second, browser display becomes much more unpredictable. Different browsers fill in the missing pieces and correct the mistakes of malformed pages in different ways. Writing cross-platform JavaScript or CSS is hard enough without worrying about what tree each browser will construct from ambiguous HTML. Making the page well-formed makes it a lot more likely that I can make it behave as I like across a wide range of browsers.&lt;br /&gt;
&lt;span id="more-289"&gt;&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;What Is Well-formedness?&lt;/h3&gt;
&lt;p&gt;Well-formedness is a concept that comes from XML. Technically, it means that a document adheres to certain rigid constraints: for example, every start-tag has a matching end-tag, every element begins and ends in the same parent element, and every entity reference is defined.&lt;/p&gt;
&lt;p&gt;Classic HTML is based on SGML, which allows a lot more leeway than does XML. For example, in HTML and SGML, it’s perfectly OK to have a &amp;lt;br&gt; or &amp;lt;li&gt; tag with no corresponding &amp;lt;/br&gt; and &amp;lt;/li&gt; tags. However, this is no longer allowed in a well-formed document.&lt;/p&gt;
&lt;p&gt;Well-formedness ensures that every conforming processor treats the document in the same way at a low level. For example, consider this malformed fragment:&lt;/p&gt;
&lt;pre&gt;
&amp;lt;p&gt;The quick &amp;lt;strong&gt;brown fox&amp;lt;/p&gt;
jumped over the
&amp;lt;p&gt;lazy&amp;lt;/strong&gt; dog.&amp;lt;/p&gt;
&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;strong&lt;/code&gt; element begins in one paragraph and ends in the next. Different browsers can and do build different internal representations of this text. For example, Firefox and Safari fill in the missing start- and end-tags (including those between the paragraphs). In essence, they treat the preceding fragment as equivalent to this markup:&lt;/p&gt;
&lt;pre&gt;
&amp;lt;p&gt;The quick &amp;lt;strong&gt;brown fox&amp;lt;/strong&gt;&amp;lt;/p&gt;
&amp;lt;strong&gt;jumped over the &amp;lt;/strong&gt;
&amp;lt;p&gt;&amp;lt;strong&gt;lazy&amp;lt;/strong&gt; dog.&amp;lt;/p&gt;
&lt;/pre&gt;
&lt;p&gt;This creates the tree shown in Figure 3.1.&lt;/p&gt;
&lt;p&gt;&lt;img src='http://cafe.elharo.com/wp-content/uploads/2008/07/03fig01.png' alt='03fig01.png' /&gt;&lt;/p&gt;
&lt;p&gt;Figure 3.1: An overlapping tree as interpreted by Firefox and Safari&lt;/p&gt;
&lt;p&gt;By contrast, Opera places the second &lt;code&gt;p&lt;/code&gt; element inside the &lt;code&gt;strong&lt;/code&gt; element, which is inside the first &lt;code&gt;p&lt;/code&gt; element. In essence, the Opera DOM treats the fragment as equivalent to this markup:&lt;/p&gt;
&lt;pre&gt;
&amp;lt;p&gt;The quick
&amp;lt;strong&gt;brown fox jumped over the
&amp;lt;p&gt;lazy dog.&amp;lt;/p&gt;
&amp;lt;/strong&gt;
&amp;lt;/p&gt;
&lt;/pre&gt;
&lt;p&gt;This builds the tree shown in Figure 3.2.&lt;/p&gt;
&lt;p&gt;&lt;img src='http://cafe.elharo.com/wp-content/uploads/2008/07/03fig02.png' alt='03fig02.png' /&gt;&lt;/p&gt;
&lt;p&gt;Figure 3.2: An overlapping tree as interpreted by Opera&lt;/p&gt;
&lt;p&gt;If you’ve ever struggled with writing JavaScript code that works the same across browsers, you know how annoying these cross-browser idiosyncrasies can be.&lt;/p&gt;
&lt;p&gt;By contrast, a well-formed document removes the ambiguity by requiring all the end-tags to be filled in and all the elements to have a single unique parent. Here is the well-formed markup corresponding to the preceding code:&lt;/p&gt;
&lt;pre&gt;
&amp;lt;p&gt;…foo&amp;lt;strong&gt;…&amp;lt;/strong&gt;&amp;lt;/p&gt; &amp;lt;p&gt;&amp;lt;strong&gt;…bar&amp;lt;/strong&gt; &amp;lt;/p&gt;
&lt;/pre&gt;
&lt;p&gt;This leaves no room for browser interpretation. All modern browsers build the same tree structure from this well-formed markup. They may still differ in which methods they provide in their respective DOMs, and in other aspects of behavior, but at least they can agree on what’s in the HTML document. That’s a huge step forward.&lt;/p&gt;
&lt;p&gt;Anything that operates on an HTML document, be it a browser, a CSS stylesheet, an XSL transformation, a JavaScript program, or something else, will have an easier time working with a well-formed document than the malformed alternative. For many use cases such as XSLT, this may be critical. An XSLT processor will simply refuse to operate on malformed input. You must make the document well-formed before you can apply an XSLT stylesheet to it.&lt;/p&gt;
&lt;p&gt;Most web sites will need to make at least some and possibly all of the following fixes to become well-formed.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Every start-tag must have a matching end-tag.&lt;/li&gt;
&lt;li&gt;Empty elements should use the empty-element tag syntax.&lt;/li&gt;
&lt;li&gt;Every attribute must have a value.&lt;/li&gt;
&lt;li&gt;Every attribute value must be quoted.&lt;/li&gt;
&lt;li&gt;Every raw ampersand must be escaped as &amp;amp;amp;.&lt;/li&gt;
&lt;li&gt;Every raw less-than sign must be escaped as &amp;amp;lt;.&lt;/li&gt;
&lt;li&gt;There must be a single root element.&lt;/li&gt;
&lt;li&gt;Every nonpredefined entity reference must be declared in the DTD.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In addition, namespace well-formedness requires that you add an &lt;code&gt;xmlns="http://www.w3.org/1999/xhtml"&lt;/code&gt; attribute to the root &lt;code&gt;html&lt;/code&gt; element.&lt;/p&gt;
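&lt;p&gt;As a rough sketch of several of these fixes applied at once, consider the following fragment. It is invented for illustration (the file name &lt;code&gt;logo.gif&lt;/code&gt; and the text are hypothetical), not taken from a real site. In classic HTML it might look like this:&lt;/p&gt;
&lt;pre&gt;
&amp;lt;html&gt;
&amp;lt;body&gt;
&amp;lt;table&gt;
&amp;lt;tr&gt;&amp;lt;td nowrap&gt;Questions &amp;amp; answers&amp;lt;br&gt;
&amp;lt;img src=logo.gif alt=Logo&gt;
&amp;lt;/table&gt;
&amp;lt;/body&gt;
&amp;lt;/html&gt;
&lt;/pre&gt;
&lt;p&gt;A well-formed equivalent could be written like so:&lt;/p&gt;
&lt;pre&gt;
&amp;lt;!-- The root element declares the XHTML namespace --&gt;
&amp;lt;html xmlns="http://www.w3.org/1999/xhtml"&gt;
&amp;lt;body&gt;
&amp;lt;table&gt;
&amp;lt;!-- nowrap gets a value, the ampersand is escaped, br is an empty-element tag --&gt;
&amp;lt;tr&gt;&amp;lt;td nowrap="nowrap"&gt;Questions &amp;amp;amp; answers&amp;lt;br /&gt;
&amp;lt;!-- attribute values are quoted; td and tr get their end-tags --&gt;
&amp;lt;img src="logo.gif" alt="Logo" /&gt;&amp;lt;/td&gt;&amp;lt;/tr&gt;
&amp;lt;/table&gt;
&amp;lt;/body&gt;
&amp;lt;/html&gt;
&lt;/pre&gt;
&lt;p&gt;This second version is well-formed but not necessarily valid. For instance, it still lacks the &lt;code&gt;head&lt;/code&gt; and &lt;code&gt;title&lt;/code&gt; elements that the XHTML DTDs require.&lt;/p&gt;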
&lt;p&gt;Although it’s easy to find and fix some of these problems manually, you’re unlikely to catch all of them without help. As discussed in the preceding chapter, you can use xmllint or other validators to check for well-formedness. For example:&lt;/p&gt;
&lt;pre&gt;
$ xmllint --noout --loaddtd http://www.aw.com
http://www.aw-bc.com/:118: parser error : Specification
mandate value for attribute nowrap
&amp;lt;TD class="headerBg" bgcolor="#004F99" nowrap align="left"&gt;
^
http://www.aw-bc.com/:118: parser error : attributes construct error
&amp;lt;TD class="headerBg" bgcolor="#004F99" nowrap align="left"&gt;
^
http://www.aw-bc.com/:118: parser error : Couldn't find end
of Start-tag TD line 118
&amp;lt;TD class="headerBg" bgcolor="#004F99" nowrap align="left"&gt;
^
…
&lt;/pre&gt;
&lt;p&gt;TagSoup or Tidy can handle many of the necessary fixes automatically. However, they don’t always guess right, so it pays to at least spot-check some of the problems manually before fixing them. Usually it’s simplest to fix as many broad classes of errors as possible. Then run xmllint again to see what you’ve missed.&lt;/p&gt;
&lt;p&gt;The following sections discuss the mechanics and trade-offs of each of these changes, as they usually apply in HTML.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://feedads.googleadservices.com/~a/7CZmgEIQjHjJBG6vjd-bEXOvx0E/a"&gt;&lt;img src="http://feedads.googleadservices.com/~a/7CZmgEIQjHjJBG6vjd-bEXOvx0E/i" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feedproxy.google.com/~r/TheCafes/~4/1pb5w-5x-CA" height="1" width="1"/&gt;</content>
		<link rel="replies" type="text/html" href="http://cafe.elharo.com/web/refactoring-html/chapter-3-well-formedness/#comments" thr:count="0" />
		<link rel="replies" type="application/atom+xml" href="http://cafe.elharo.com/web/refactoring-html/chapter-3-well-formedness/feed/atom/" thr:count="0" />
		<thr:total>0</thr:total>
	<feedburner:origLink>http://cafe.elharo.com/web/refactoring-html/chapter-3-well-formedness/</feedburner:origLink></entry>
		<entry>
		<author>
			<name>Elliotte Rusty Harold</name>
						<uri>http://www.elharo.com/</uri>
					</author>
		<title type="html">XSLT</title>
		<link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheCafes/~3/NlCxXDEQg3k/" />
		<id>http://cafe.elharo.com/web/refactoring-html/xslt/</id>
		<updated>2008-07-14T13:38:14Z</updated>
		<published>2008-07-03T12:42:57Z</published>
		<category scheme="http://cafe.elharo.com" term="Refactoring HTML" />		<summary type="html"><![CDATA[Here&#8217;s part 14 of the ongoing serialization of Refactoring HTML, also available from Amazon  and Safari.
XSLT (Extensible Stylesheet Language Transformations) is one of many XML tools that work well on HTML documents once they have first been converted into well-formed XHTML. In fact, it is one of my favorite such tools, and the first [...]]]></summary>
		<content type="html" xml:base="http://cafe.elharo.com/web/refactoring-html/xslt/">&lt;p&gt;&lt;i&gt;Here&amp;#8217;s part 14 of the ongoing serialization of &lt;cite&gt;Refactoring HTML&lt;/cite&gt;, also available from &lt;a href="http://www.amazon.com/exec/obidos/ISBN=0321503635/ref=nosim/cafeaulaitA" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.amazon.com');"&gt;Amazon&lt;/a&gt;  and &lt;a href="http://safari.oreilly.com/9780321552044" onclick="javascript:pageTracker._trackPageview('/outbound/article/safari.oreilly.com');"&gt;Safari&lt;/a&gt;.&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;XSLT (Extensible Stylesheet Language Transformations) is one of many XML tools that work well on HTML documents once they have first been converted into well-formed XHTML. In fact, it is one of my favorite such tools, and the first thing I turn to for many tasks. For instance, I use it to automatically generate a lot of content, such as RSS and Atom feeds, by screen-scraping my HTML pages. Indeed, the possibility of using XSLT on my documents is one of my main reasons for refactoring documents into well-formed XHTML. XSLT can query documents for things you need to fix and automate some of the fixes.&lt;br /&gt;
&lt;span id="more-287"&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;When refactoring XHTML with XSLT, you usually leave more alone than you change. Thus, most refactoring stylesheets start with the identity transformation shown in Listing 2.9.&lt;/p&gt;
&lt;p&gt;Listing 2.9: The Identity Transformation in XSLT&lt;/p&gt;
&lt;pre&gt;
&lt;code&gt;&amp;lt;xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
  version='1.0'
  xmlns:html='http://www.w3.org/1999/xhtml'
  xmlns='http://www.w3.org/1999/xhtml'
  exclude-result-prefixes='html'&gt;

  &amp;lt;xsl:template match="@*|node()"&gt;
    &amp;lt;xsl:copy&gt;
      &amp;lt;xsl:apply-templates select="@*|node()"/&gt;
    &amp;lt;/xsl:copy&gt;
  &amp;lt;/xsl:template&gt;

&amp;lt;/xsl:stylesheet&gt;&lt;/code&gt;
&lt;/pre&gt;
&lt;p&gt;This merely copies the entire document from the input to the output. You then modify this basic stylesheet with a few extra rules to make the changes you desire. For example, suppose you want to change all the deprecated &lt;code&gt;&amp;lt;i&gt;&lt;/code&gt; elements to &lt;code&gt;&amp;lt;em&gt;&lt;/code&gt; elements. You would add this rule to the stylesheet:&lt;/p&gt;
&lt;pre&gt;
&lt;code&gt;&amp;lt;xsl:template match='html:i'&gt;
  &amp;lt;em&gt;
    &amp;lt;xsl:apply-templates select="@*|node()"/&gt;
  &amp;lt;/em&gt;
&amp;lt;/xsl:template&gt;&lt;/code&gt;
&lt;/pre&gt;
&lt;p&gt;Notice that the XPath expression in the match attribute must use a namespace prefix, even though the element it’s matching uses the default namespace. This is a common source of confusion when transforming XHTML documents. You always have to assign the XHTML namespace a prefix when you’re using it in an XPath expression.&lt;/p&gt;
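&lt;p&gt;A minimal sketch of the difference, assuming the &lt;code&gt;html&lt;/code&gt; prefix is bound to the XHTML namespace as in Listing 2.9. The &lt;code&gt;b&lt;/code&gt;-to-&lt;code&gt;strong&lt;/code&gt; rule is chosen only as an illustration:&lt;/p&gt;
&lt;pre&gt;
&lt;code&gt;&amp;lt;!-- In XSLT 1.0 this never fires on an XHTML document: an
     unprefixed name in a match pattern means "no namespace" --&gt;
&amp;lt;xsl:template match='b'&gt;
  &amp;lt;strong&gt;
    &amp;lt;xsl:apply-templates select="@*|node()"/&gt;
  &amp;lt;/strong&gt;
&amp;lt;/xsl:template&gt;

&amp;lt;!-- This one matches, because html:b names the b element
     in the http://www.w3.org/1999/xhtml namespace --&gt;
&amp;lt;xsl:template match='html:b'&gt;
  &amp;lt;strong&gt;
    &amp;lt;xsl:apply-templates select="@*|node()"/&gt;
  &amp;lt;/strong&gt;
&amp;lt;/xsl:template&gt;&lt;/code&gt;
&lt;/pre&gt;
&lt;p&gt;With only the first rule in place, the identity transformation would copy every &lt;code&gt;b&lt;/code&gt; element through unchanged, which is usually the first symptom of a missing prefix.&lt;/p&gt;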
&lt;h3&gt;Note&lt;/h3&gt;
&lt;p&gt;Several good introductions to XSLT are available in print and on the Web. First, I’ll recommend two I’ve written myself. Chapter 15 of &lt;cite&gt;&lt;a href="http://www.amazon.com/exec/obidos/ISBN=0764549863/ref=nosim/cafeaulaitA" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.amazon.com');"&gt;The XML 1.1 Bible&lt;/a&gt;&lt;/cite&gt; (Wiley, 2003) covers XSLT in depth, and is available on the Web at &lt;a href="http://www.cafeconleche.org/books/bible3/chapters/ch15.html" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.cafeconleche.org');"&gt;http://www.cafeconleche.org/books/bible3/chapters/ch15.html&lt;/a&gt;. &lt;cite&gt;&lt;a href="http://www.amazon.com/exec/obidos/ISBN=0596007647/ref=nosim/cafeaulaitA" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.amazon.com');"&gt;XML in a Nutshell, 3rd Edition&lt;/a&gt;&lt;/cite&gt;, by Elliotte Harold and W. Scott Means (O’Reilly, 2004), provides a somewhat more concise introduction. Finally, if you want the most comprehensive coverage available, I recommend Michael Kay’s &lt;cite&gt;&lt;a href="http://www.amazon.com/exec/obidos/ISBN=0764543814/ref=nosim/cafeaulaitA" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.amazon.com');"&gt;XSLT: Programmer’s Reference&lt;/a&gt;&lt;/cite&gt; (Wrox, 2001) and &lt;cite&gt;&lt;a href="http://www.amazon.com/exec/obidos/ISBN=0764569090/ref=nosim/cafeaulaitA" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.amazon.com');"&gt;XSLT 2.0: Programmer’s Reference&lt;/a&gt;&lt;/cite&gt; (Wrox, 2004).&lt;/p&gt;
&lt;p&gt;&lt;i&gt;This concludes Chapter 2. I&amp;#8217;ll probably post a couple more sections from Chapter 3. Then if you want to see what comes next, you&amp;#8217;ll have to &lt;a href="http://www.amazon.com/exec/obidos/ISBN=0321503635/ref=nosim/cafeaulaitA" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.amazon.com');"&gt;buy the book&lt;/a&gt;. &lt;img src='http://cafe.elharo.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /&gt; &lt;/i&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="http://feedads.googleadservices.com/~a/ODPnn8k6P2RjRbrQ7g-3GtltmM8/a"&gt;&lt;img src="http://feedads.googleadservices.com/~a/ODPnn8k6P2RjRbrQ7g-3GtltmM8/i" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feedproxy.google.com/~r/TheCafes/~4/NlCxXDEQg3k" height="1" width="1"/&gt;</content>
		<link rel="replies" type="text/html" href="http://cafe.elharo.com/web/refactoring-html/xslt/#comments" thr:count="3" />
		<link rel="replies" type="application/atom+xml" href="http://cafe.elharo.com/web/refactoring-html/xslt/feed/atom/" thr:count="3" />
		<thr:total>3</thr:total>
	<feedburner:origLink>http://cafe.elharo.com/web/refactoring-html/xslt/</feedburner:origLink></entry>
		<entry>
		<author>
			<name>Elliotte Rusty Harold</name>
						<uri>http://www.elharo.com/</uri>
					</author>
		<title type="html">TagSoup</title>
		<link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/TheCafes/~3/af40bu2Fmuw/" />
		<id>http://cafe.elharo.com/uncategorized/tagsoup/</id>
		<updated>2008-06-27T13:19:17Z</updated>
		<published>2008-06-27T13:19:17Z</published>
		<category scheme="http://cafe.elharo.com" term="Refactoring HTML" /><category scheme="http://cafe.elharo.com" term="Uncategorized" />		<summary type="html"><![CDATA[Here&#8217;s part 13 of the ongoing serialization of Refactoring HTML, also available from Amazon  and Safari.
John Cowan’s TagSoup (http://home.ccil.org/~cowan/XML/tagsoup/) is an open source HTML parser written in Java that implements the Simple API for XML, or SAX. Cowan describes TagSoup as “a SAX-compliant parser written in Java that, instead of parsing well-formed or valid [...]]]></summary>
		<content type="html" xml:base="http://cafe.elharo.com/uncategorized/tagsoup/">&lt;p&gt;&lt;i&gt;Here&amp;#8217;s part 13 of the ongoing serialization of &lt;cite&gt;Refactoring HTML&lt;/cite&gt;, also available from &lt;a href="http://www.amazon.com/exec/obidos/ISBN=0321503635/ref=nosim/cafeaulaitA" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.amazon.com');"&gt;Amazon&lt;/a&gt;  and &lt;a href="http://safari.oreilly.com/9780321552044" onclick="javascript:pageTracker._trackPageview('/outbound/article/safari.oreilly.com');"&gt;Safari&lt;/a&gt;.&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;John Cowan’s TagSoup (&lt;a href="http://home.ccil.org/~cowan/XML/tagsoup/" onclick="javascript:pageTracker._trackPageview('/outbound/article/home.ccil.org');"&gt;http://home.ccil.org/~cowan/XML/tagsoup/&lt;/a&gt;) is an open source HTML parser written in Java that implements the Simple API for XML, or SAX. Cowan describes TagSoup as “a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. By providing a SAX interface, it allows standard XML tools to be applied to even the worst HTML.”&lt;br /&gt;
&lt;span id="more-286"&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;
TagSoup is not intended as an end-user tool, but it does have a basic command-line interface. It’s also straightforward to hook it up to any number of XML tools that accept input from SAX. Once you’ve done that, feed in HTML, and out will come well-formed XHTML. For example:
&lt;/p&gt;
&lt;pre&gt;$ java -jar tagsoup.jar index.html
&amp;lt;?xml version="1.0" standalone="yes"?&gt;
&amp;lt;html lang="en-US" xmlns="http://www.w3.org/1999/xhtml"&gt;&amp;lt;head&gt;&amp;lt;title&gt;Java Virtual Machines&amp;lt;/title&gt;&amp;lt;meta name="description" content="A Growing
list of Java virtual machines and their capabilities"&gt;
&amp;lt;/meta&gt;&amp;lt;/head&gt;&amp;lt;body bgcolor="#ffffff" text="#000000"&gt;

&amp;lt;h1 align="center"&gt;Java Virtual Machines&amp;lt;/h1&gt;
…
&lt;/pre&gt;
&lt;p&gt;You can improve its output a little bit by adding the &lt;code&gt;--omit-xml-declaration&lt;/code&gt; and &lt;code&gt;--nodefaults&lt;/code&gt; command-line options:&lt;/p&gt;
&lt;pre&gt;$ java -jar tagsoup.jar --omit-xml-declaration --nodefaults index.html
&amp;lt;html lang="en-US" xmlns="http://www.w3.org/1999/xhtml"&gt;&amp;lt;head&gt;&amp;lt;title&gt;Java Virtual Machines&amp;lt;/title&gt;&amp;lt;meta name="description" content="A Growing
list of Java virtual machines and their capabilities"&gt;&amp;lt;/meta&gt;
&amp;lt;/head&gt;&amp;lt;body bgcolor="#ffffff" text="#000000"&gt;

&amp;lt;h1 align="center"&gt;Java Virtual Machines&amp;lt;/h1&gt;
…
&lt;/pre&gt;
&lt;p&gt;This will remove a few pieces that are likely to confuse one browser or another.&lt;/p&gt;
&lt;p&gt;You can use the &lt;code&gt;--encoding&lt;/code&gt; option to specify the character encoding of the input document. For example, if you know the document is written in Latin-1, ISO 8859-1, you could run it like so:&lt;/p&gt;
&lt;p&gt;&lt;samp&gt;$ java -jar tagsoup.jar --encoding=ISO-8859-1 index.html&lt;br /&gt;
&lt;/samp&gt;&lt;/p&gt;
&lt;p&gt;TagSoup’s output is always UTF-8.&lt;/p&gt;
&lt;p&gt;Finally, you can use the &lt;code&gt;--files&lt;/code&gt; option to write new copies of the input files with the extension .xhtml. Otherwise, TagSoup prints the output on stdout, from where you can redirect it to any convenient location. Unlike Tidy, TagSoup cannot change a file in place.&lt;/p&gt;
&lt;p&gt;However, TagSoup is primarily designed for use as a library. Its output from command-line mode leaves something to be desired compared to Tidy. In particular:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It does not convert presentational markup to CSS.&lt;/li&gt;
&lt;li&gt;It does not include a DOCTYPE declaration, which is needed before some browsers will recognize XHTML.&lt;/li&gt;
&lt;li&gt;By default, it includes an XML declaration, which needlessly confuses older browsers.&lt;/li&gt;
&lt;li&gt;It uses start-tag and end-tag pairs for empty elements such as br and hr, which may confuse some older browsers.&lt;/li&gt;
&lt;/ul&gt;
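&lt;p&gt;Some of these gaps can be closed when TagSoup is driven as a library instead. The following is only a sketch, not TagSoup’s own code: the class name &lt;code&gt;SaxPipeline&lt;/code&gt; and the method &lt;code&gt;serialize&lt;/code&gt; are hypothetical, and the choice of the XHTML 1.0 Transitional DOCTYPE is an assumption. It feeds any SAX &lt;code&gt;XMLReader&lt;/code&gt; into the JDK’s identity transformer, which is asked to omit the XML declaration and write a DOCTYPE. TagSoup’s &lt;code&gt;org.ccil.cowan.tagsoup.Parser&lt;/code&gt; class implements &lt;code&gt;XMLReader&lt;/code&gt;, so it can be passed in when tagsoup.jar is on the classpath.&lt;/p&gt;

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper class; the name SaxPipeline is not part of TagSoup.
class SaxPipeline {

    /**
     * Parses markup with the given SAX reader and serializes the resulting
     * event stream through the JDK's identity transformer.
     *
     * To parse real-world HTML, pass a TagSoup reader (needs tagsoup.jar):
     *     XMLReader reader = new org.ccil.cowan.tagsoup.Parser();
     */
    static String serialize(XMLReader reader, String markup) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();

        // Leave out the XML declaration that confuses older browsers.
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        // Assumption: XHTML 1.0 Transitional suits pages that still carry
        // presentational attributes; swap in another DOCTYPE if it does not.
        identity.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC,
                "-//W3C//DTD XHTML 1.0 Transitional//EN");
        identity.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM,
                "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd");

        StringWriter out = new StringWriter();
        InputSource in = new InputSource(new StringReader(markup));
        identity.transform(new SAXSource(reader, in), new StreamResult(out));
        return out.toString();
    }
}
```

&lt;p&gt;Because the helper depends only on the &lt;code&gt;XMLReader&lt;/code&gt; interface, the same code works with a regular XML parser for well-formed input and with TagSoup for messy HTML. It does nothing about presentational markup, which still has to be converted to CSS by other means.&lt;/p&gt;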
&lt;p&gt;TagSoup does not guarantee absolutely valid XHTML (though it does guarantee well-formedness). There are a few things it cannot handle. Most important, XHTML requires all img elements to have an alt attribute. If the alt attribute is empty, the image is purely presentational and should be ignored by screen readers. If the attribute is not empty, it is used in place of the image by screen readers. TagSoup has no way of knowing whether any given img with an omitted alt attribute is presentational or not, so it does not insert any such attributes. Similarly, TagSoup does not add summaries to tables. You’ll have to do that by hand, and you’ll want to validate after using TagSoup to make sure you catch all these instances.&lt;/p&gt;
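&lt;p&gt;A full validator is the right tool for that final check, but a small SAX handler can give a quick count of the two problems named above. This is a minimal sketch under stated assumptions: the class &lt;code&gt;AccessibilityAudit&lt;/code&gt; and its fields are hypothetical names, and it only looks for &lt;code&gt;img&lt;/code&gt; elements with no &lt;code&gt;alt&lt;/code&gt; attribute and &lt;code&gt;table&lt;/code&gt; elements with no &lt;code&gt;summary&lt;/code&gt; attribute.&lt;/p&gt;

```java
import java.io.StringReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler; not part of TagSoup or of any validator.
class AccessibilityAudit extends DefaultHandler {
    int imagesMissingAlt = 0;
    int tablesMissingSummary = 0;

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        // Parsers that are not namespace-aware report an empty local name,
        // so fall back to the qualified name in that case.
        String name = localName.isEmpty() ? qName : localName;
        if (name.equals("img")) {
            // An empty alt="" is legitimate (a purely presentational image);
            // only a completely absent attribute is counted.
            if (atts.getValue("alt") == null) {
                imagesMissingAlt++;
            }
        } else if (name.equals("table")) {
            if (atts.getValue("summary") == null) {
                tablesMissingSummary++;
            }
        }
    }

    /**
     * Runs the audit over markup using any SAX reader. For real-world HTML,
     * a TagSoup reader can be supplied (needs tagsoup.jar):
     *     XMLReader reader = new org.ccil.cowan.tagsoup.Parser();
     */
    static AccessibilityAudit audit(XMLReader reader, String markup)
            throws Exception {
        AccessibilityAudit handler = new AccessibilityAudit();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(markup)));
        return handler;
    }
}
```

&lt;p&gt;The counts only say how much hand work remains. Deciding what each &lt;code&gt;alt&lt;/code&gt; or &lt;code&gt;summary&lt;/code&gt; should contain is still a job for a person.&lt;/p&gt;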
&lt;p&gt;However, despite these limits, TagSoup does do a huge amount of work for you at very little cost.&lt;/p&gt;

</content>
		<link rel="replies" type="text/html" href="http://cafe.elharo.com/uncategorized/tagsoup/#comments" thr:count="4" />
		<link rel="replies" type="application/atom+xml" href="http://cafe.elharo.com/uncategorized/tagsoup/feed/atom/" thr:count="4" />
		<thr:total>4</thr:total>
	<feedburner:origLink>http://cafe.elharo.com/uncategorized/tagsoup/</feedburner:origLink></entry>
	</feed>
