<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Cafes &#187; Programming</title>
	<atom:link href="http://cafe.elharo.com/category/programming/feed/" rel="self" type="application/rss+xml" />
	<link>http://cafe.elharo.com</link>
	<description>Longer than a blog; shorter than a book</description>
	<lastBuildDate>Sat, 30 Mar 2013 11:51:03 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Why java.util.Arrays uses Two Sorting Algorithms</title>
		<link>http://cafe.elharo.com/programming/java-programming/why-java-util-arrays-uses-two-sorting-algorithms/</link>
		<comments>http://cafe.elharo.com/programming/java-programming/why-java-util-arrays-uses-two-sorting-algorithms/#comments</comments>
		<pubDate>Sat, 30 Mar 2013 11:51:03 +0000</pubDate>
		<dc:creator>Elliotte Rusty Harold</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[algorithms]]></category>
		<category><![CDATA[mergesort]]></category>
		<category><![CDATA[quicksort]]></category>
		<category><![CDATA[sorting]]></category>

		<guid isPermaLink="false">http://cafe.elharo.com/?p=1045</guid>
		<description><![CDATA[java.util.Arrays uses quicksort (actually dual pivot quicksort in the most recent version) for primitive types such as int and mergesort for objects that implement Comparable or use a Comparator. Why the difference? Why not pick one and use it for all cases? Robert Sedgewick suggests that &#8220;the designer&#8217;s assessment of the idea that if a [...]]]></description>
				<content:encoded><![CDATA[<p><code>java.util.Arrays</code> uses quicksort (actually dual pivot quicksort in the most recent version) for primitive types such as <code>int</code> and mergesort for objects that implement <code>Comparable</code> or use a <code>Comparator</code>. Why the difference? Why not pick one and use it for all cases? <a href="https://class.coursera.org/algs4partI-002/lecture/38">Robert Sedgewick suggests</a> that &#8220;the designer&#8217;s assessment of the idea that if a programmer&#8217;s using objects maybe space is not a critically important consideration and so the extra space used by mergesort maybe&#8217;s not a problem and if the programmer&#8217;s using primitive types  maybe performance is the most important thing so we use the quicksort&#8221;, but I think there&#8217;s a much more obvious reason.<br />
<span id="more-1045"></span></p>
<p>Quicksort is faster in both cases. Mergesort is stable in both cases. But for primitive types quicksort is stable too! That&#8217;s because primitive types in Java are like <a href="https://en.wikipedia.org/wiki/Identical_particles">elementary particles in quantum mechanics</a>. You can&#8217;t tell the difference between one 7 and another 7. Their value is all that defines them. Sort an array such as [7, 6, 6, 7, 6, 5, 4, 6, 0] into [0, 4, 5, 6, 6, 6, 6, 7, 7]. Not only do you not care which 6 ended up in which position; it&#8217;s a meaningless question. The array positions don&#8217;t hold pointers to the objects. They hold the actual values of the objects. We might as well say that all the original values were thrown away and replaced with new ones. Or not. It just doesn&#8217;t matter at all. There is no possible way you can tell the difference between the output of a stable and unstable sorting algorithm when all that&#8217;s sorted are primitive types. Stability is irrelevant with primitive types in Java.</p>
<p>By contrast when sorting objects, including sorting objects by a key of primitive type, you&#8217;re sorting pointers. The objects themselves do have an independent nature separate from their key values. Sometimes this may not matter all that much&#8211;e.g. if you&#8217;re sorting <code>java.lang.Strings</code>&#8211;but sometimes it matters a great deal. To borrow an example from Sedgewick&#8217;s Algorithms I class, suppose you&#8217;re sorting student records by section:</p>
<pre><code>public class Student {

  String lastName;
  String firstName;
  int section;   // the key used for the second sort below

}</code></pre>
<p>Suppose you start with a list sorted by last name and then first name:</p>
<table>
<tr>
<td>John</td>
<td>Alisson</td>
<td>2</td>
</tr>
<tr>
<td>Nabeel</td>
<td>Aronowitz</td>
<td>3</td>
</tr>
<tr>
<td>Joe</td>
<td>Jones</td>
<td>2</td>
</tr>
<tr>
<td>James</td>
<td>Ledbetter</td>
<td>2</td>
</tr>
<tr>
<td>Ilya</td>
<td>Lessing</td>
<td>1</td>
</tr>
<tr>
<td>Betty</td>
<td>Lipschitz</td>
<td>2</td>
</tr>
<tr>
<td>Betty</td>
<td>Neubacher</td>
<td>2</td>
</tr>
<tr>
<td>John</td>
<td>Neubacher</td>
<td>3</td>
</tr>
<tr>
<td>Katie</td>
<td>Senya</td>
<td>1</td>
</tr>
<tr>
<td>Jim</td>
<td>Smith</td>
<td>3</td>
</tr>
<tr>
<td>Ping</td>
<td>Yi</td>
<td>1</td>
</tr>
</table>
<p>When you sort this again by section, if the sort is stable then it will still be sorted by last name and first name within each section:</p>
<table>
<tr>
<td>Ilya</td>
<td>Lessing</td>
<td>1</td>
</tr>
<tr>
<td>Katie</td>
<td>Senya</td>
<td>1</td>
</tr>
<tr>
<td>Ping</td>
<td>Yi</td>
<td>1</td>
</tr>
<tr>
<td>John</td>
<td>Alisson</td>
<td>2</td>
</tr>
<tr>
<td>Joe</td>
<td>Jones</td>
<td>2</td>
</tr>
<tr>
<td>James</td>
<td>Ledbetter</td>
<td>2</td>
</tr>
<tr>
<td>Betty</td>
<td>Lipschitz</td>
<td>2</td>
</tr>
<tr>
<td>Betty</td>
<td>Neubacher</td>
<td>2</td>
</tr>
<tr>
<td>Nabeel</td>
<td>Aronowitz</td>
<td>3</td>
</tr>
<tr>
<td>John</td>
<td>Neubacher</td>
<td>3</td>
</tr>
<tr>
<td>Jim</td>
<td>Smith</td>
<td>3</td>
</tr>
</table>
<p>However, if you use quicksort, you&#8217;ll end up with something like this and have to re-sort each section by name to restore the ordering by name:</p>
<table>
<tr>
<td>Ilya</td>
<td>Lessing</td>
<td>1</td>
</tr>
<tr>
<td>Katie</td>
<td>Senya</td>
<td>1</td>
</tr>
<tr>
<td>Ping</td>
<td>Yi</td>
<td>1</td>
</tr>
<tr>
<td>Betty</td>
<td>Lipschitz</td>
<td>2</td>
</tr>
<tr>
<td>Betty</td>
<td>Neubacher</td>
<td>2</td>
</tr>
<tr>
<td>John</td>
<td>Alisson</td>
<td>2</td>
</tr>
<tr>
<td>Joe</td>
<td>Jones</td>
<td>2</td>
</tr>
<tr>
<td>James</td>
<td>Ledbetter</td>
<td>2</td>
</tr>
<tr>
<td>Jim</td>
<td>Smith</td>
<td>3</td>
</tr>
<tr>
<td>John</td>
<td>Neubacher</td>
<td>3</td>
</tr>
<tr>
<td>Nabeel</td>
<td>Aronowitz</td>
<td>3</td>
</tr>
</table>
<p>That&#8217;s why stable sorts make sense for object types, especially mutable object types and object types with more data than just the sort key, and mergesort is such a sort. But for primitive types stability is not only irrelevant. It&#8217;s meaningless. </p>
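<p>Here is a minimal sketch of the student example in code. The class and method names (<code>StableSortDemo</code>, <code>sortBySection</code>) are made up for illustration, and only a few of the rows from the table are used. It relies on the documented guarantee that <code>Arrays.sort</code> on an object array is stable:</p>

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical demo class; the names here are illustrative only.
class StableSortDemo {

    static final class Student {
        final String firstName;
        final String lastName;
        final int section;

        Student(String firstName, String lastName, int section) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.section = section;
        }

        @Override
        public String toString() {
            return firstName + " " + lastName + " " + section;
        }
    }

    // Sorts a copy by section only. Students in the same section keep
    // their incoming relative order because the object sort is stable.
    static Student[] sortBySection(Student[] students) {
        Student[] copy = students.clone();
        Arrays.sort(copy, Comparator.comparingInt((Student s) -> s.section));
        return copy;
    }

    public static void main(String[] args) {
        // Already sorted by last name, then first name.
        Student[] byName = {
            new Student("John", "Alisson", 2),
            new Student("Nabeel", "Aronowitz", 3),
            new Student("Joe", "Jones", 2),
            new Student("Ilya", "Lessing", 1),
            new Student("Katie", "Senya", 1),
        };
        for (Student s : sortBySection(byName)) {
            System.out.println(s);
        }
    }
}
```

<p>Within each section the students should come out in the same name order they went in, with no second sort by name.</p>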
]]></content:encoded>
			<wfw:commentRss>http://cafe.elharo.com/programming/java-programming/why-java-util-arrays-uses-two-sorting-algorithms/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Why Functional Programming in Java is Dangerous</title>
		<link>http://cafe.elharo.com/programming/java-programming/why-functional-programming-in-java-is-dangerous/</link>
		<comments>http://cafe.elharo.com/programming/java-programming/why-functional-programming-in-java-is-dangerous/#comments</comments>
		<pubDate>Sun, 20 Jan 2013 11:47:14 +0000</pubDate>
		<dc:creator>Elliotte Rusty Harold</dc:creator>
				<category><![CDATA[Java]]></category>

		<guid isPermaLink="false">http://cafe.elharo.com/?p=1031</guid>
		<description><![CDATA[In my day job I work with a lot of very smart developers who graduated from top university CS programs such as MIT, CMU, and Chicago. They cut their teeth on languages like Haskell, Scheme, and Lisp. They find functional programming to be a natural, intuitive, beautiful, and efficient style of programming. They&#8217;re only wrong [...]]]></description>
				<content:encoded><![CDATA[<p>In my day job I work with a lot of very smart developers who graduated from top university CS programs such as MIT, CMU, and Chicago. They cut their teeth on languages like Haskell, Scheme, and Lisp. They find functional programming to be a natural, intuitive, beautiful, and efficient style of programming. They&#8217;re only wrong about one of those.<br />
<span id="more-1031"></span></p>
<p>The problem is that my colleagues and I are not writing code in Haskell, Scheme, Lisp, Clojure, Scala, or even Ruby or Python. We are writing code in Java, and in Java functional programming is dangerously inefficient. Every few months I find myself debugging a production problem that ultimately traces back to a misuse of functional ideas and algorithms in a language and more importantly a virtual machine that just wasn&#8217;t built for this style of programming. </p>
<p>Recently Bob Martin came up with a <a href="http://pragprog.com/magazines/2013-01/functional-programming-basics">really good example</a> that shows why. Here&#8217;s a bit of Clojure (a real functional language) that returns a list of the squares of the first 25 integers:</p>
<p><code style="white-space: pre-wrap;">(take 25 (squares-of (integers)))</code></p>
<p>This code runs, and it runs reasonably quickly. The output is:</p>
<p>(1 4 9 16 25 36 49 64 &#8230; 576 625)</p>
<p>Now suppose we want to reproduce this in Java. If we write Java the way Gosling et al intended Java to be written, then the code is simple, fast, and obvious:</p>
<pre><code>for (int i = 1; i &lt;= 25; i++) {
    System.out.println(i * i);
}</code></pre>
<p>But now suppose we do it functionally! In particular suppose we naively reproduce the Clojure style above: </p>
<pre><code>import java.util.ArrayList;
import java.util.List;

public class Take25 {

    public static void main(String[] args) {    
        for (Object o : take(25, squaresOf(integers()))) {
            System.out.println(o);
        }
    }
    
    public static List&lt;?> take(int n, List&lt;?> list) {
        return list.subList(0, n);
    }
    
    public static List&lt;Integer> squaresOf(List&lt;Integer> list) {
        List&lt;Integer> result = new ArrayList&lt;Integer>();
        for (Integer number : list) {
            result.add(number.intValue() * number.intValue());
        }
        return result;
    }
    
    public static List&lt;Integer> integers() {
        List&lt;Integer> result = new ArrayList&lt;Integer>();
        for (int i = 1; i &lt;= Integer.MAX_VALUE; i++) {
            result.add(i);
        }
        return result;
    }
    
}
</code></pre>
<p>Try to run that. Go ahead. I dare you....OK, recovered from the heap dump yet?</p>
<pre><samp>Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Arrays.java:2760)
	at java.util.Arrays.copyOf(Arrays.java:2734)
	at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
	at java.util.ArrayList.add(ArrayList.java:351)
	at Take25.integers(Take25.java:30)
	at Take25.main(Take25.java:9)
</samp></pre>
<p>How did Clojure handle a function that returns every single int, while Java crapped out? The answer is that Clojure, like pretty much all true functional languages (and unlike Java), does lazy evaluation. It doesn't compute values it doesn't use. And it can get away with this because Clojure, unlike Java, is really and truly functional. It can assume that variables aren't mutated, that the order of evaluation doesn't matter, and thus that it can perform optimizations that a Java compiler can't. And this is why functional programming in Java is dangerous. Because Java isn't a true functional language, the JIT and javac can't optimize functional constructs as aggressively and efficiently as they can in a real functional language. Standard functional operations like returning infinite lists are death for a Java program.</p>
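<p>To make concrete what lazy evaluation means here, this is a rough sketch of the same pipeline with the laziness written out by hand using <code>java.util.Iterator</code>. The class name <code>LazyTake25</code> is invented for this illustration. Nothing in Java does this for you; every stage has to be coded explicitly to produce values only on demand:</p>

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: laziness spelled out manually with iterators.
class LazyTake25 {

    // An unbounded source: 1, 2, 3, ... Nothing is computed until next()
    // is called. (The counter wraps around after Integer.MAX_VALUE.)
    static Iterator<Integer> integers() {
        return new Iterator<Integer>() {
            private int next = 1;
            @Override public boolean hasNext() { return true; }
            @Override public Integer next() { return next++; }
        };
    }

    // Wraps another iterator and squares each element as it is requested.
    static Iterator<Integer> squaresOf(final Iterator<Integer> source) {
        return new Iterator<Integer>() {
            @Override public boolean hasNext() { return source.hasNext(); }
            @Override public Integer next() {
                int n = source.next();
                return n * n;
            }
        };
    }

    // Pulls at most n elements; this is the only place that forces any work.
    static List<Integer> take(int n, Iterator<Integer> source) {
        List<Integer> result = new ArrayList<Integer>(n);
        while (result.size() < n && source.hasNext()) {
            result.add(source.next());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(take(25, squaresOf(integers())));
    }
}
```

<p>Only 25 integers are ever generated and squared, because <code>take</code> stops asking after 25. The price is that the programmer, not the compiler or the VM, is responsible for keeping every stage lazy.</p>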
<p>You may object that I've set up a straw man here. OK, you can't return a list of all the integers (or even all the ints) in Java; but surely no one would really do that. Let's look at a more realistic approach. Here I use recursion rather than a loop to compute the squares:</p>
<pre><code>public class Squares {
    
    public static void main(String args[]) {
        squareAndPrint(1, Integer.parseInt(args[0]));
    }
    
    public static void squareAndPrint(int n, int max) {
        System.out.println(n * n);
        if (max > n) {
            squareAndPrint(n + 1, max);
        }
    }
    
}</code></pre>
<p>That will run. But now suppose I don't want the first 25 squares but the first 25,000:</p>
<pre><samp></samp></pre>
<p>Ooops. Stack overflow. This is why in <a href="http://xom.nu/">XOM</a> I was very careful to use loops rather than recursion, even in places where recursion was much clearer. Otherwise a carefully crafted XML document could have caused a XOM-using program to dump core. Avoiding arbitrarily large recursion in non-functional languages like Java and C isn't just a performance requirement, it's a security requirement too!</p>
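<p>For comparison, here is a sketch of the loop-based equivalent. The class name <code>IterativeSquares</code> and the extra <code>PrintStream</code> parameter are additions for this illustration. It runs in a single stack frame no matter how large <code>max</code> is:</p>

```java
import java.io.PrintStream;

// Hypothetical loop-based counterpart to the recursive Squares class.
class IterativeSquares {

    // Prints the squares of 1..max, one per line, using constant stack space.
    static void squareAndPrint(int max, PrintStream out) {
        for (int n = 1; n <= max; n++) {
            out.println(n * n);
        }
    }

    public static void main(String[] args) {
        squareAndPrint(Integer.parseInt(args[0]), System.out);
    }
}
```

<p>Asking this version for the first 25,000 squares is no different from asking for the first 25; the stack depth does not depend on the input.</p>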
<p>Now before the flames begin, let me be clear about what I am not saying. I am not saying that functional programming is a bad idea. I am not saying that functional programming is inefficient. I actually love functional programming. Like my colleagues I find functional programming to be a natural, intuitive, and beautiful style of programming, but only when it's done in a language that was designed for it from the beginning, like Haskell. Functional idioms in Java are performance bugs waiting to bite you. </p>
]]></content:encoded>
			<wfw:commentRss>http://cafe.elharo.com/programming/java-programming/why-functional-programming-in-java-is-dangerous/feed/</wfw:commentRss>
		<slash:comments>62</slash:comments>
		</item>
		<item>
		<title>1% Problems</title>
		<link>http://cafe.elharo.com/programming/1-problems/</link>
		<comments>http://cafe.elharo.com/programming/1-problems/#comments</comments>
		<pubDate>Sun, 22 Jul 2012 15:58:29 +0000</pubDate>
		<dc:creator>Elliotte Rusty Harold</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://cafe.elharo.com/?p=871</guid>
		<description><![CDATA[I hate 1% problems. No this isn&#8217;t an OWS slogan. I&#8217;m thinking of those code issues that really aren&#8217;t a problem 99% of the time, but when they bite, they&#8217;re really hard to debug and they cause real pain. Several common cases in Java: Using java.util.Date or java.util.Calendar instead of JodaTime. Not specifying a Locale [...]]]></description>
				<content:encoded><![CDATA[<p>I hate 1% problems. No this isn&#8217;t an OWS slogan. I&#8217;m thinking of those code issues that really aren&#8217;t a problem 99% of the time, but when they bite, they&#8217;re really hard to debug and they cause real pain. Several common cases in Java:</p>
<ol>
<li> Using <code>java.util.Date</code> or <code>java.util.Calendar</code> instead of JodaTime.</li>
<li> Not specifying a <code>Locale</code> when doing language sensitive operations such as <code>toLowerCase()</code> and <code>toUpperCase()</code>.</li>
<li> Not escaping strings passed to SQL, XML, HTML or other external formats.</li>
</ol>
<p>What I hate most is that it&#8217;s really, really hard to convince other developers that these are problems they should take seriously.<span id="more-871"></span> The excuses are common:</p>
<p>&#8220;No, I don&#8217;t have to specify a locale here because the strings are ASCII.&#8221;</p>
<p>&#8220;I&#8217;m only getting a timestamp; I don&#8217;t need a proper timezone.&#8221;</p>
<p>&#8220;The data we&#8217;re encoding is coming from a web service we control, and we know it&#8217;s not going to send us any formfeeds or null characters.&#8221;</p>
<p>&#8220;This string is a constant so we clearly don&#8217;t need to escape it&#8221;, and so on.</p>
<p>All these answers reduce to, &#8220;yes, there&#8217;s sort of a theoretical problem here; and maybe  <a href="http://findbugs.sourceforge.net/">FindBugs</a> is complaining; but it doesn&#8217;t really matter in this case, and I&#8217;ve got more important things to spend my time on.&#8221;</p>
<p>And you know what? The naysayers are right, 99% of the time. The problem is that every one of these issues can bite badly that 1% of the time, and it&#8217;s usually not obvious when you&#8217;re in a 1% case. For instance, just because the string being used to construct an HTML attribute value today is a literal doesn&#8217;t mean it won&#8217;t be refactored into a variable next year, and then a variable built from user input a year later. Suddenly there&#8217;s an <a href="http://en.wikipedia.org/wiki/Cross-site_scripting">XSS</a> vulnerability in your code that two years ago everyone agreed clearly couldn&#8217;t happen, and thus no effort was put into preventing it.</p>
<p>Worse yet, although these problems are very easy to spot at the source code level&#8211;indeed can often be detected algorithmically by tools such as <a href="http://pmd.sourceforge.net/pmd-5.0.0/">PMD</a> or <a href="http://findbugs.sourceforge.net/">FindBugs</a>&#8211;it&#8217;s usually not obvious what the cause of the problem is once it does manifest itself. For instance, out of all the myriad reasons a SOAP call might be consistently failing, is the possibility that the data contains an invisible form feed character the first thing that comes to mind? </p>
<p>I have seen major production problems caused by every one of these (#2 just this past week, and #3 the week before), and every one of them more than once. In the case of the failure to properly escape web service input before generating XML, the bug had lived in the code for years before an errant form feed showed up in the data stream and cost several engineer-days to understand and fix. </p>
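<p>As an illustration of how cheap the check is, here is a small sketch that finds characters XML 1.0 does not allow in a document, a form feed among them. The class and method names are hypothetical; the ranges are those of the <code>Char</code> production in the XML 1.0 specification:</p>

```java
// Hypothetical helper; not part of any library mentioned in this post.
class XmlCharCheck {

    // True if the code point is allowed by the Char production of XML 1.0.
    static boolean isLegalXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the index of the first illegal character, or -1 if there is none.
    static int indexOfIllegalChar(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isLegalXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfIllegalChar("plain text"));    // prints -1
        System.out.println(indexOfIllegalChar("page1\fpage2"));  // prints 5
    }
}
```

<p>What to do with an illegal character (reject the input, strip it, or replace it) is an application decision that this sketch does not make.</p>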
<p>These aren&#8217;t hard or costly problems to prevent or fix, if we just develop good coding habits. Anytime you see a SQL statement built by string concatenation, alarm bells ought to be sounding. Anytime you see <code>getBytes()</code> invoked on a string without specifying a character set, you shouldn&#8217;t have to think twice about changing it to <code>getBytes(Charsets.UTF_8)</code>. Anytime you see <code>java.util.Date</code> or <code>java.util.Calendar</code> in code, you should know that something is likely to go wrong, and probably at the worst possible time. </p>
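<p>A short sketch of two of these habits follows. It assumes JDK 7 or later and uses <code>java.nio.charset.StandardCharsets</code> rather than Guava&#8217;s <code>Charsets</code>; the class and method names are invented for the example. The Turkish locale is the classic case where &#8220;the strings are ASCII&#8221; does not save you:</p>

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper class illustrating explicit locales and charsets.
class OnePercentHabits {

    // Lowercases identifiers, protocol keywords, and other non-linguistic
    // text the same way on every machine, whatever the default locale is.
    static String lowerForProtocol(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Encodes with an explicit charset instead of the platform default.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        // In Turkish, capital I lowercases to a dotless i (U+0131).
        System.out.println("TITLE".toLowerCase(turkish).equals("title")); // prints false
        System.out.println(lowerForProtocol("TITLE"));                    // prints title
        System.out.println(utf8("caf\u00e9").length);                     // prints 5
    }
}
```

<p>Calling <code>toLowerCase()</code> with no argument uses the default locale, so the first comparison is what a user in Turkey would hit with code that works everywhere else.</p>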
<p>It&#8217;s like seeing a <a href="http://www.yelp.com/biz_photos/94wE8kePeN94qj550R16eQ?select=IcFNCUL8Bn83biZlDBkFjw#IcFNCUL8Bn83biZlDBkFjw">large stack of heavy boxes piled in front of an emergency exit</a>. You don&#8217;t have to think about it, estimate the risk of fixing it compared to the risk of leaving it as is, file bug reports, or prioritize it compared to everything else you have to do. You just fix it as quickly as you can. These are dangerous situations; they&#8217;re easy to spot; and as professionals we have a duty to fix them when we find them and not to cause them in the first place.</p>
]]></content:encoded>
			<wfw:commentRss>http://cafe.elharo.com/programming/1-problems/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Don&#8217;t Design for Reuse</title>
		<link>http://cafe.elharo.com/programming/dont-design-for-reuse/</link>
		<comments>http://cafe.elharo.com/programming/dont-design-for-reuse/#comments</comments>
		<pubDate>Sat, 14 Jul 2012 21:17:06 +0000</pubDate>
		<dc:creator>Elliotte Rusty Harold</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://cafe.elharo.com/?p=862</guid>
		<description><![CDATA[Last week one of my colleagues hit me with an idea that was so obvious when he pointed it out I wondered why I hadn&#8217;t realized it before: If you&#8217;re designing for reuse, you&#8217;re doing it wrong. In 2012 the only code you should be writing is what&#8217;s needed for the immediate task at hand. [...]]]></description>
				<content:encoded><![CDATA[<p>Last week one of my colleagues hit me with an idea that was so obvious when he pointed it out I wondered why I hadn&#8217;t realized it before: </p>
<p>If you&#8217;re designing for reuse, you&#8217;re doing it wrong. </p>
<p>In 2012 the only code you should be writing is what&#8217;s needed for the immediate task at hand. Don&#8217;t design for reuse. Don&#8217;t consider reuse. Don&#8217;t waste one minute of your day making code reusable.<br />
<span id="more-862"></span></p>
<p>The fact is any reusable code you need already exists. Want to connect to an HTTP server with full support for authentication and cookies? That sounds like something a lot of projects could use, so you should wrap it up in a nice HTTP class or library, right? Wrong. You should use <a href="http://hc.apache.org/httpcomponents-client-ga/index.html">Apache HttpClient</a> instead. </p>
<p>Do you need to solve some initial value problems with a shooting method? Don&#8217;t crack open your <a href="http://www.amazon.com/exec/obidos/ISBN=0538733519/ref=nosim/cafeaulaitA/">numerical analysis textbook</a> and start coding. Just download <a href="http://www.ee.ucl.ac.uk/~mflanaga/java/RungeKutta.html">Flanagan&#8217;s Java Scientific Library</a> or buy a <a href="http://www.nag.co.uk/doc/techrep/html/Tr2_09/Tr2_09.asp">NAG license</a> instead. Need a date chooser widget and want to share it with your colleagues? Just tell them about <a href="http://www.toedter.com/en/jcalendar/index.html">JCalendar</a> instead. Maybe that doesn&#8217;t have exactly the look and feel you were aiming for? Fair enough. Write your own component or fork the existing one, but realize that your very specific look isn&#8217;t likely to fit other people&#8217;s apps any better than JCalendar does, so don&#8217;t waste any time making yours reusable. </p>
<p>These examples are for Java, but the same is true in any major language including Perl, Python, Ruby, C++, C#, and Scala. In fact, if the language doesn&#8217;t have a library that solves the reusable parts of your problem, you shouldn&#8217;t be using that language for that problem. </p>
<p>Are there exceptions to this rule? I can think of two (and so far this feels like an exhaustive list). </p>
<p>The first exception is when you&#8217;re writing code for something so new that libraries don&#8217;t exist yet. If you&#8217;re the first one out of the gate, then make the code reusable. For instance, when I first wrote my <a href="http://xincluder.sourceforge.net/">XIncluder libraries</a> the XInclude spec was still in development, and there really weren&#8217;t any alternatives in Java. These libraries became part of the proof of implementability that allowed the spec to advance to full recommendation status some years later. (That effort very nearly got me condemned to invited expert status on the working group, though fortunately saner heads prevailed.) Writing my own XInclude library made sense ten years ago, but I certainly wouldn&#8217;t repeat it today.</p>
<p>The second exception is for experts only, and I&#8217;m not even sure about this one. If you really are an expert in the field that the reusable code addresses, and if you have made a careful survey of the existing options and concluded that they are inadequate and you see how to do a better job, then, and only then, might you consider writing your own reusable code. This is what I did with <a href="http://www.xom.nu/">XOM</a>. Only after I had written a <a href="http://www.cafeconleche.org/books/xmljava/">several hundred page book</a> exhaustively documenting all the then current APIs for processing XML with Java and their strengths and weaknesses, did I sit down to design an API that improved on them. And although I do think I came up with the best such API yet designed, I&#8217;m still not sure that was the best use of my time. XOM is, IMHO, superior to what came before it; but it hasn&#8217;t been superior enough to really replace those other libraries in many applications. The need just wasn&#8217;t that great. As time passes, the code already available for reuse approaches &#8220;good enough&#8221; and the cost/benefit ratio of improving on it goes way up. </p>
<p>Are there other exceptions? Other times you really should write reusable code? I can&#8217;t think of any. Too many developers have spent too much time exploring the problem space, and made their work available for free at sites like Sourceforge and Github. While there will always be new problems to be solved, there&#8217;s just not a lot of benefit to be gained by solving the old problems one more time. The next time you find yourself designing for reuse, stop and ask yourself whether you should be reusing someone else&#8217;s code instead. </p>
]]></content:encoded>
			<wfw:commentRss>http://cafe.elharo.com/programming/dont-design-for-reuse/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
		<item>
		<title>My New Year&#8217;s Resolution</title>
		<link>http://cafe.elharo.com/programming/my-new-years-resolution/</link>
		<comments>http://cafe.elharo.com/programming/my-new-years-resolution/#comments</comments>
		<pubDate>Sat, 01 Jan 2011 14:41:36 +0000</pubDate>
		<dc:creator>Elliotte Rusty Harold</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://cafe.elharo.com/?p=676</guid>
		<description><![CDATA[In 2011 my New Year&#8217;s resolution is to do more things the easy way, even if it takes longer the first time. I am going to stop using brute force to solve problems. In particular: I am finally going to memorize how one redirects both stderr and stdout to the same stream. (2&#62;&#38;1 &#124;) I [...]]]></description>
				<content:encoded><![CDATA[<p>In 2011 my New Year&#8217;s resolution is to do more things the easy way, even if it takes longer the first time. I am going to stop using brute force to solve problems. In particular:</p>
<ul>
<li>I am finally going to memorize how one redirects both stderr and stdout to the same stream. (<code><a href="http://stackoverflow.com/questions/818255/in-the-bash-shell-what-is-21">2&gt;&amp;1</a> |</code>)</li>
<li>I am going to learn the sed? trick my advisor showed me 20 years ago for repeating a command from the shell history while substituting one word for another, instead of just using the arrow key to back up to and erase the string. (<code>^string1^string2^</code> or <code>!!:s/string1/string2/</code> or, for global substitution rather than just the first occurrence, <code>!!:gs/string1/string2/</code>) </li>
<li>I am going to increase my regex fu and use regular expressions consistently instead of just editing 20 lines of copy and paste code. (This would be easier if every editor didn&#8217;t have subtly different syntax.)</li>
<li>I am going to use Python to automate repetitive tasks.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://cafe.elharo.com/programming/my-new-years-resolution/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Could not load a dependent class com/jcraft/jsch/Logger</title>
		<link>http://cafe.elharo.com/programming/java-programming/could-not-load-a-dependent-class-comjcraftjschlogger/</link>
		<comments>http://cafe.elharo.com/programming/java-programming/could-not-load-a-dependent-class-comjcraftjschlogger/#comments</comments>
		<pubDate>Fri, 25 Jun 2010 10:40:59 +0000</pubDate>
		<dc:creator>Elliotte Rusty Harold</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[ant]]></category>
		<category><![CDATA[jsch]]></category>

		<guid isPermaLink="false">http://cafe.elharo.com/?p=639</guid>
		<description><![CDATA[Have you ever seen an Ant error message like this? BUILD FAILED /Users/elharo/Projects/XOM/build.xml:545: Problem: failed to create task or type scp Cause: Could not load a dependent class com/jcraft/jsch/Logger It is not enough to have Ant's optional JARs you need the JAR files that the optional tasks depend upon. Ant's optional task dependencies are listed [...]]]></description>
				<content:encoded><![CDATA[<p>Have you ever seen an Ant error message like this? </p>
<pre>BUILD FAILED
/Users/elharo/Projects/XOM/build.xml:545: Problem: failed to create task or type scp
Cause: Could not load a dependent class com/jcraft/jsch/Logger
       It is not enough to have Ant's optional JARs
       you need the JAR files that the optional tasks depend upon.
       Ant's optional task dependencies are listed in the manual.
Action: Determine what extra JAR files are needed, and place them in one of:
        -/opt/ant/lib
        -/Users/elharo/.ant/lib
        -a directory added on the command line with the -lib argument

Do not panic, this is a common problem.
The commonest cause is a missing JAR.

This is not a bug; it is a configuration problem</pre>
<p>As usual, the ant error message is completely unhelpful, though for once it&#8217;s at least technically correct. (Most of the time when ant says, &#8220;This is not a bug; it is a configuration problem&#8221;, it is in fact a bug and not a configuration problem.) Here&#8217;s what&#8217;s really happening.<br />
<span id="more-639"></span></p>
<p>The jsch jar file distributed from http://www.jcraft.com/jsch/ is corrupt. Either they uploaded it wrong or they misconfigured their web server or both. (Update: it looks like a misconfigured server/poorly designed web page. The so-called download link isn&#8217;t that at all. You can follow it but not save it.) The jar file with the relevant classes is there, but it&#8217;s no good. You can check your local copy by trying to list its contents with <samp>jar tvf</samp>:</p>
<pre>$ jar tvf /usr/share/ant/lib/jsch*jar
java.util.zip.ZipException: error in opening zip file
	at java.util.zip.ZipFile.open(Native Method)
	at java.util.zip.ZipFile.&lt;init>(ZipFile.java:114)
	at java.util.zip.ZipFile.&lt;init>(ZipFile.java:75)
	at sun.tools.jar.Main.list(Main.java:979)
	at sun.tools.jar.Main.run(Main.java:224)
	at sun.tools.jar.Main.main(Main.java:1149)</pre>
<p>If you see anything but a list of classes, the problem is that the JSCH jar is no good. To fix it, use <a href="http://downloads.sourceforge.net/project/jsch/jsch.jar/0.1.42/jsch-0.1.42.jar?use_mirror=softlayer&amp;ts=1278672723">this link to sourceforge</a> instead.</p>
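<p>The same check can be done from Java. This is only a sketch with a made-up class name, <code>JarCheck</code>; it relies on <code>java.util.zip.ZipFile</code> throwing a <code>ZipException</code> (an <code>IOException</code>) for a file that is not a valid archive, which is the failure shown in the stack trace above:</p>

```java
import java.io.File;
import java.io.IOException;
import java.util.zip.ZipFile;

// Hypothetical utility; the name and output text are illustrative.
class JarCheck {

    // Returns true if the file can be opened as a zip archive,
    // which is the first thing "jar tvf" has to do.
    static boolean opensAsZip(File file) {
        try (ZipFile zip = new ZipFile(file)) {
            return true;
        } catch (IOException e) {
            // ZipException is a subclass of IOException.
            return false;
        }
    }

    public static void main(String[] args) {
        for (String name : args) {
            String verdict = opensAsZip(new File(name)) ? "ok" : "corrupt or unreadable";
            System.out.println(name + ": " + verdict);
        }
    }
}
```

<p>Passing the paths of the jars in Ant&#8217;s <code>lib</code> directory as arguments should flag a bad download before the build does.</p>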
<p>Once again I am reminded of the perils of depending on external libraries, especially ones you don&#8217;t build or distribute with your own product. In 2010 ssh and scp are mandatory features of any build and deployment system. Secure communications are too important to be left to random third party web sites. </p>
]]></content:encoded>
			<wfw:commentRss>http://cafe.elharo.com/programming/java-programming/could-not-load-a-dependent-class-comjcraftjschlogger/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Dn&#8217;t Abbrvt</title>
		<link>http://cafe.elharo.com/programming/dnt-abbrvt/</link>
		<comments>http://cafe.elharo.com/programming/dnt-abbrvt/#comments</comments>
		<pubDate>Wed, 28 Apr 2010 10:32:43 +0000</pubDate>
		<dc:creator>Elliotte Rusty Harold</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://cafe.elharo.com/?p=515</guid>
		<description><![CDATA[Is req a request or a requisition? Is res a response, a reservation, a resume, or a result? Is def a default or a definition? Is rng a range or a random number generator? Is v1 version 1 or value 1? Is e an event, an entity, or an exception? Is f a file or [...]]]></description>
				<content:encoded><![CDATA[<p>Is <code>req</code> a request or a requisition?</p>
<p>Is <code>res</code> a response, a reservation, a resume, or a result?</p>
<p>Is <code>def</code> a default or a definition?</p>
<p>Is <code>rng</code> a range or a random number generator?</p>
<p>Is <code>v1</code> version 1 or value 1?</p>
<p>Is <code>e</code> an event, an entity, or an exception? </p>
<p>Is <code>f</code> a file or a float? </p>
<p>Is <code>lst</code> a list or the least value? </p>
<p>Is <code>temp</code> a  temporary variable or a temperature reading?</p>
<p>Is <code>rep</code> a  representation, a representative, a repetition, or a reputation?</p>
<p>Is <code>tm</code> a time or a trademark? Or even another temporary variable? And if it is a time, is it a timestamp, a time of day, or a duration? (These are three very different things.)</p>
<p>Is <code>admin</code> an administrator, an administrative assistant, or a system administrator?</p>
<p>In context, you can usually figure these things out, but you have to think about them. That&#8217;s inefficient. Far better to just spell out what you mean from the get go.<br />
<span id="more-515"></span></p>
<p>There are a few abbreviations that are so well known and understood that they&#8217;re acceptable:</p>
<ul>
<li><code>max</code> for maximum</li>
<li><code>min</code> for minimum</li>
<li><code>in</code> for <code>InputStream</code></li>
<li><code>out</code> for <code>OutputStream</code></li>
<li><code>e</code> or <code>ex</code> for an exception in a <code>catch</code> clause (but nowhere else).</li>
<li><code>num</code> for number, though only when used as a prefix, as in <code>numTokens</code> or <code>numHits</code>.</li>
</ul>
<p>You can use single letter variable names for the occasional quantity that has no meaning other than its type. For example, a string variable can be named <code>s</code>, an int variable <code>i</code>, or a double variable <code>x</code>. However, this should only be used when the program really doesn&#8217;t know anything about the nature of the variable other than its type. For example, a method that calculates the cube root of a double may name its argument <code>x</code>; but a method that converts temperature from degrees Fahrenheit to degrees Celsius should name its argument <code>degrees</code>, <code>degreesFahrenheit</code>, or perhaps <code>temperatureFahrenheit</code>.</p>
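<p>A minimal sketch of that distinction follows. The class and method names are hypothetical, chosen only to illustrate the guideline: the first method knows nothing about its argument beyond its type, while the second knows exactly what the number means.</p>

```java
// Hypothetical names, used only to illustrate the naming guideline.
final class NamingExamples {

  // Nothing is known about the argument beyond its type, so x is fine.
  static double cubeRoot(double x) {
    return Math.cbrt(x);
  }

  // The argument has a meaning, so the name spells it out, units included.
  static double toCelsius(double degreesFahrenheit) {
    return (degreesFahrenheit - 32.0) * 5.0 / 9.0;
  }
}
```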
<p>In addition, there&#8217;s nothing at all wrong with using common acronyms that are more recognized than what they stand for: URL, HTML, XML, XSL, etc. However, these are the exceptions, not the rules; and I would still be careful with common abbreviations that mean something else at first glance. EmployeeBO is almost certainly a Business Object, but that wasn&#8217;t what you read it as first, was it? </p>
<p>Code should be optimized for reading and comprehension, not for marginally faster typing. You should no more abbreviate names in your code than you do words in your sentences. No1 wnts 2 rd txt wrttn lk ths. <img src='http://cafe.elharo.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://cafe.elharo.com/programming/dnt-abbrvt/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Bruce Eckel is Wrong</title>
		<link>http://cafe.elharo.com/programming/bruce-eckel-is-wrong/</link>
		<comments>http://cafe.elharo.com/programming/bruce-eckel-is-wrong/#comments</comments>
		<pubDate>Tue, 20 Apr 2010 10:59:14 +0000</pubDate>
		<dc:creator>Elliotte Rusty Harold</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Bruce Eckel]]></category>
		<category><![CDATA[exceptions]]></category>

		<guid isPermaLink="false">http://cafe.elharo.com/?p=517</guid>
		<description><![CDATA[Every time the subject of checked versus runtime exceptions comes up, someone cites Bruce Eckel as an argument by authority. This is unfortunate, because, as much as I like and respect Bruce, he is out to sea on this one. Nor is it merely a matter of opinion. In this case, Bruce is factually incorrect. [...]]]></description>
				<content:encoded><![CDATA[<p>Every time the subject of checked versus runtime exceptions comes up, someone cites Bruce Eckel as an argument by authority. This is unfortunate, because, as much as I like and respect Bruce, he is out to sea on this one. Nor is it merely a matter of opinion. In this case, Bruce is factually incorrect. He believes things about checked exceptions that just aren&#8217;t true; and I think it&#8217;s time to lay his misconceptions to rest once and for all.</p>
<p>Let&#8217;s see exactly what Bruce&#8217;s mistake is. The following is an extended selection from <cite>Thinking in Java, 4th edition, pp. 490-491</cite>:</p>
<blockquote><p>An exception-handling system is a trapdoor that allows your program to abandon execution of the normal sequence of statements. The trapdoors used when an &#8220;exceptional condition&#8221; occurs, such that normal execution is no longer possible or desirable. Exceptions represent conditions that the current method is unable to handle. The reason exception-handling systems were developed is because the approach of dealing with each possible error condition produced by each function call was too onerous, and programmers simply weren&#8217;t doing it. As a result, they were ignoring the errors. It&#8217;s worth observing that the issue of programmer convenience in handling errors was a prime motivation for exceptions in the first place.</p>
<p>One of the important guidelines in exception handling is &#8220;Don&#8217;t catch an exception unless you know what to do with it.&#8221; In fact, one of the important goals of exception handling is to move the error-handling code away from the point where the errors occur. This allows you to focus on what you want to accomplish in one section of your code, and how you&#8217;re going to deal with problems in a distinct separate section of your code. As a result, your mainline code is not cluttered with error-handling logic, and it&#8217;s much easier to understand and maintain. Exception handling also tends to reduce the amount of error-handling code, by allowing one handler to deal with many error sites.</p>
<p>Checked exceptions complicate the scenario a bit, because they force you to add <strong>catch</strong> clauses in places where you may not be ready to handle an error. This results in the &#8220;harmful if swallowed&#8221; problem:</p>
<p><code>try {<br />
 // ... to do something useful<br />
} catch (ObligatoryException e) {} // Gulp!</code>
</p></blockquote>
<p>Do you see the mistake? It&#8217;s a common one.<span id="more-517"></span> Let me repeat it, so it&#8217;s really obvious [emphasis mine]:</p>
<blockquote><p>Checked exceptions complicate the scenario a bit, because <em>they force you to add <strong>catch</strong> clauses in places where you may not be ready to handle an error</em>.</p></blockquote>
<p>This is false. You are never forced to add <code>catch</code> clauses where you are not ready to handle an error. This isn&#8217;t a matter of opinion. It&#8217;s a matter of fact. Checked exceptions do not require <code>catch</code> blocks. They require a <code>catch</code> block OR a <code>throws</code> declaration. Eckel&#8217;s entire argument is based on ignoring the possibility of a <code>throws</code> declaration. </p>
<p>While Bruce is absolutely right that you should not catch an exception unless you know what to do with it, this in no way means that you should insert a <code>catch</code> block everywhere a checked exception may be thrown. If you aren&#8217;t ready to handle an error at one place, let the exception bubble up.   If a checked exception is thrown inside a method where you are not ready to handle it, then the correct response is to add a <code>throws</code> clause to the method indicating that the exception will bubble up from that method. For example,</p>
<pre>public void doSomethingUseful() throws ObligatoryException {
  // ... do something useful that throws an obligatory exception
}</pre>
<p>You do not and should not insert a <code>catch</code> block in a method where you cannot do anything reasonable in the <code>catch</code> block. Checked exceptions never meant that every exception had to be caught as soon as it was thrown. It is perfectly acceptable to declare that a method throws a checked exception. Indeed, this is exactly how exceptions are meant to be used. It warns whoever calls your method that they need to be ready for this exceptional condition, and they either need to catch it and handle it themselves; or, they themselves need to declare that they throw it so that they warn their callers.</p>
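<p>To make the chain concrete, here is a small sketch. <code>ObligatoryException</code> is the placeholder used above; <code>Worker</code>, <code>Application</code>, and the <code>fail</code> flag are hypothetical names invented for the illustration. Two methods declare the exception and let it pass through untouched, and only the top level, which knows what to do about it, catches it.</p>

```java
// ObligatoryException is the placeholder from the article;
// every other name here is hypothetical.
class ObligatoryException extends Exception {
  ObligatoryException(String message) {
    super(message);
  }
}

class Worker {

  // Not ready to handle the problem here, so declare it and let it bubble up.
  void doSomethingUseful(boolean fail) throws ObligatoryException {
    if (fail) {
      throw new ObligatoryException("environment problem");
    }
  }

  // Still not the right place to handle it; just pass the warning along.
  void runJob(boolean fail) throws ObligatoryException {
    doSomethingUseful(fail);
  }
}

class Application {

  // The top level knows what to do: report the failure and carry on.
  static String execute(boolean fail) {
    try {
      new Worker().runJob(fail);
      return "ok";
    } catch (ObligatoryException e) {
      return "recovered from: " + e.getMessage();
    }
  }
}
```

<p>There is exactly one <code>catch</code> block in the whole call chain, and it sits where a meaningful response is possible.</p>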
<p>Yes, if it were true that every checked exception needed to be caught immediately, then checked exceptions would be incredibly inconvenient. However, experienced Java programmers don&#8217;t do this. Catching each and every checked exception at the first opportunity  is a sure mark of a novice Java developer.</p>
<p>Occasionally, you&#8217;ll <a href="http://www.ibm.com/developerworks/java/library/j-ce/index.html">override a method inherited from a superclass or implement a method declared in an interface that does not declare that it throws the checked exception your method throws, and thus can&#8217;t throw the correct exception</a>. In this case alone, it may be acceptable to wrap the exception either in a checked exception that the original declaration declares or in a runtime exception (if the original declaration does not declare any appropriate checked exceptions). For example,</p>
<pre>@Override
public void doSomethingUseful() {
    try {
        // ... do something useful that throws an obligatory exception
    }
    catch (ObligatoryException e) {
        // The overridden method declares no checked exceptions, so wrap it
        throw new ObligatoryRuntimeException(e);
    }
}</pre>
<p>However, you still don&#8217;t need to handle the exception before you&#8217;re ready for it.</p>
<p>I will note that this situation is a failure of design. When you&#8217;re forced to do this, one of two things is broken:</p>
<ol>
<li>The superclass/interface was not designed properly for extension. Specifically it did not take into account the exceptions overriders/implementers might reasonably want to throw. The method likely should have been declared final and probably shouldn&#8217;t be extended at all. </li>
<li> The overriding/implementing method is violating the contract of the method it overrides/implements by doing something it really should not be doing.</li>
</ol>
<p>A good example of the latter would be letting an <code>IOException</code> wrapped in a <code>RuntimeException</code> escape from the <code>run()</code> method of <code>java.lang.Thread</code> or <code>java.lang.Runnable</code>. These methods do not have sufficient context to handle such an exception, but something further down in the call chain does. The exception should be handled before it bubbles all the way up (though not necessarily in the same method where it&#8217;s first thrown). </p>
<p>In a method properly designed for extension, an empty <code>throws</code> clause (or a <code>throws</code> clause that does not match the actual exception) indicates that callers of that method cannot handle and do not expect such an exception. An overriding method that throws a new exception is violating the contract by failing to handle it, the same as it would were it to restrict a precondition or loosen a postcondition. (In essence, this is another form of loosening a postcondition.)</p>
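<p>As a rough sketch of honoring that contract, consider a <code>Runnable</code> that deals with an <code>IOException</code> itself instead of letting it escape from <code>run()</code> inside a <code>RuntimeException</code>. The class name, the <code>diskFull</code> flag, and the retry message are all hypothetical; the only point is where the exception gets handled.</p>

```java
// Hypothetical names; the point is only where the IOException is handled.
class LogFlusher implements Runnable {
  private final boolean diskFull;
  private String status = "not run";

  LogFlusher(boolean diskFull) {
    this.diskFull = diskFull;
  }

  // A helper further down the call chain declares the checked exception.
  private void flush() throws java.io.IOException {
    if (diskFull) {
      throw new java.io.IOException("disk full");
    }
  }

  // run() declares no checked exceptions, so the problem is handled here
  // instead of escaping wrapped in a RuntimeException.
  @Override
  public void run() {
    try {
      flush();
      status = "flushed";
    } catch (java.io.IOException e) {
      status = "will retry: " + e.getMessage();
    }
  }

  String status() {
    return status;
  }
}
```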
<p>Eckel is not the only one to make the mistake of assuming that checked exceptions must be handled immediately, at the first possible opportunity. For example, he cites Barbara Liskov and Alan Snyder explaining their decision not to include checked exceptions in CLU:</p>
<blockquote><p>requiring that the text of a handler be attached to the invocation that raises the exception would lead to unreadable programs in which expressions were broken up with handlers</p></blockquote>
<p>True. Requiring that the text of a handler be attached to the invocation that raises the exception would lead to unreadable programs. Fortunately, neither Java nor checked exceptions require any such thing. When handlers are appropriate, they can usually be moved to the end of a method, far away from the invocation that raises the exception. When handlers are inappropriate, you use a <code>throws</code> clause instead. Immediately following every statement that can throw an exception with a <code>catch</code> block is bad form on a par with <code>goto</code>.</p>
<p>Liskov and Snyder continue:</p>
<blockquote><p>We felt it was unrealistic to require the programmer to provide handlers in situations where no meaningful action can be taken.</p></blockquote>
<p>Also very true, but of course, checked exceptions do not require the programmer to provide handlers in situations where no meaningful action can be taken. When no meaningful action can be taken (out of memory error, stack overflow, class not found, etc.) Java programs throw <code>Error</code>s, not checked exceptions, not even runtime exceptions. Checked exceptions <a href="http://cafe.elharo.com/blogroll/internal-and-external-exceptions/">signal environmental problems that programmers cannot prevent or predict</a>, should test for, and most decidedly can handle. To choose the most extreme example, if a production-worthy database system is writing a file and the disk fills up, it should handle the condition gracefully without corrupting the database. A disk full error is neither unforeseeable nor unmanageable. Most checked exceptions aren&#8217;t even that tricky to respond to. </p>
<p>What checked exceptions actually do require is that any method that can throw a checked exception warn its callers that the exception may be thrown. That&#8217;s all. A checked exception is nothing more and nothing less than part of the return type. Methods may return normally or they may throw exceptions. It makes just as much sense to specify the exceptions that can be thrown by a method as it does to specify that a method returns an <code>int</code> or a <code>String</code>. A method that does not declare the exceptions it throws is incomplete.</p>
<p>A lot of us didn&#8217;t really <em>get</em> checked exceptions when Java was released in the mid-nineties. It was a genuinely new idea that I don&#8217;t think any programming language before had foreshadowed. Liskov and Snyder wrote the paper quoted here in 1979, and their quotes make sense if you assume they simply didn&#8217;t conceive of having different kinds of exceptions in the language. If you only have one kind of exception, then it makes sense for it to be a runtime exception rather than a checked exception.</p>
<p>Personally, I didn&#8217;t really understand how to use exceptions until the first edition of <cite>Effective Java</cite> was published. But it&#8217;s been 15 years. That&#8217;s long enough for the message to get out. In 2010 we know better. Proper error handling requires distinguishing programming errors (runtime exceptions) from environmental problems (checked exceptions). Proper error handling requires correcting programming errors and writing handlers for unpreventable environmental conditions. Proper error handling requires knowing when to catch and knowing when to throw. If you try to work with only one kind of exception, or try to get by with only <code>catch</code> but no <code>throws</code>, then exceptions are going to seem very ugly and inconvenient. But if you use checked <em>and</em> runtime exceptions, and use <code>catch</code> <em>and</em> <code>throws</code>, then your error handling code will be far cleaner, safer, and more maintainable. Error handling without checked exceptions and <code>throws</code> is like arithmetic without <code>*</code> and <code>/</code>. Sure, you can do it, but why would you, when using the features the language offers makes life so much simpler?</p>
]]></content:encoded>
			<wfw:commentRss>http://cafe.elharo.com/programming/bruce-eckel-is-wrong/feed/</wfw:commentRss>
		<slash:comments>58</slash:comments>
		</item>
		<item>
		<title>SourceForge for the 21st Century</title>
		<link>http://cafe.elharo.com/programming/sourceforge-for-the-21st-century/</link>
		<comments>http://cafe.elharo.com/programming/sourceforge-for-the-21st-century/#comments</comments>
		<pubDate>Mon, 21 Dec 2009 11:54:23 +0000</pubDate>
		<dc:creator>Elliotte Rusty Harold</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[CVS]]></category>
		<category><![CDATA[source code control repository]]></category>
		<category><![CDATA[subversion]]></category>
		<category><![CDATA[svn]]></category>

		<guid isPermaLink="false">http://cafe.elharo.com/?p=551</guid>
		<description><![CDATA[Lately I&#8217;ve been thinking a lot about continuous deployment for reasons I&#8217;m not quite yet at liberty to disclose. This has inspired me to improve the XOM release process, to make it more of a one click process, or, to be more accurate, a one ant target process. I can now release a new version [...]]]></description>
				<content:encoded><![CDATA[<p>Lately I&#8217;ve been thinking a lot about continuous deployment for reasons I&#8217;m not quite yet at liberty to disclose. This has inspired me to improve the <a href="http://www.xom.nu/">XOM</a> release process, to make it more of a one click process, or, to be more accurate, a one ant target process. I can now release a new version simply by typing:</p>
<samp>$ ant -Dpassword=secret -Dwebpassword=other_secret release</samp>
<p>This not only builds the entire project; it also tags the release in CVS, uploads the zip and tar.gz files to IBiblio, and uploads the documentation to my web host. It doesn&#8217;t yet file a bug to upload the Maven files, but I&#8217;m working on that.</p>
<p>During the process of setting this up, I realized that my organization is a little backwards. In particular, I&#8217;m pushing  all the artifacts from my local system. Instead, I should merely be committing everything to the source code control repository; tagging a release; and then having the further downstream artifacts like the zip and tar.gz files and documentation pulled from source code control onto the Web servers.</p>
<p>There are some commercial products that are organized like this, including <a href="http://www.thoughtworks-studios.com/cruise-release-management">ThoughtWorks&#8217;s Cruise</a>, but none of the major open source hosting sites such as SourceForge and java.net work like this. Certainly, SourceForge and similar sites have been major contributors to the open source revolution. They have enabled hobbyist developers working in their garages to use tools and techniques of software development that were previously limited to corporations. They have enabled far-flung developers around the world to collaborate with each other far more effectively than they could by e-mailing each other tar files. They have removed the burden of system administration from many programmers, thus enabling them to devote more time to writing code. Make no mistake. SourceForge et al. are a real force for good in the community.</p>
<p>That said, the state of the art in software development has moved forward significantly since these sites were founded. CVS has mostly been replaced by Subversion. On some projects, Subversion has been replaced by distributed version control systems such as git and Mercurial. Unit testing and test driven development have moved from extreme practices to standard operating procedure. Continuous integration using products like Hudson and Cruise Control is routine. Nonetheless, most project hosting sites still offer little beyond a source code repository, a bug tracker, and some webspace. Not that that&#8217;s not important, but we can do so much more.</p>
<p>It&#8217;s time to think about what a modern project hosting site might want to offer and what it might look like.<br />
<span id="more-551"></span></p>
<h3>Continuous Integration</h3>
<p>The first step forward, and possibly the hardest, is to add continuous integration capabilities to the existing project hosting repositories. Every time code is checked into SourceForge or Java.net or code.google.com, the project should be built and the tests should be run. Technically, the hard part is understanding every project&#8217;s unique build infrastructure. Some projects use ant; some use make; some use maven; and some roll their own. Maven is probably the most constrained of the lot. For the others, it will be necessary to ask project owners which targets to run for which tasks. It&#8217;s probably a good idea to auto-generate basic ant or maven or make scripts for new projects.</p>
<p>Beyond merely building the code, running the tests has some very serious security implications. Currently project hosting sites do not run third-party code. They store it; they display it; they make it available and bundle it up as tar and zip files; but they don&#8217;t even compile it, much less run it. Running arbitrary third-party Java and C code submitted by any random <a href="http://www.jwz.org/doc/cadt.html">teenager with attention deficit disorder</a> somewhere on the Internet is begging for trouble. 10 years ago I would&#8217;ve thought this was insane and impossible. But now, just maybe we can do it.</p>
<p>In fact, there are several services on the Internet today that will run arbitrary third-party code for all comers. Amazon&#8217;s EC2 service lets anybody with a credit card run what amounts to a complete rooted Linux box on Amazon&#8217;s network. Google&#8217;s AppEngine lets more or less anyone, credit card or no, run Python and Java code inside Google&#8217;s cloud. And these are hardly the only such services. Advances in virtualization and security sandboxing have made this possible. That said, it certainly helps to have a real user attached to any code that you run so you know who to blame when it starts spamming the world. However, when an application goes rogue, whether through malice or incompetence, it is possible to shut it down quickly. It is possible to limit the resources used by any one test suite, and to limit what else it can see on the same filesystem and the same network.</p>
<p><a href="http://bamboo.ci.codehaus.org/start.action">Codehaus uses Atlassian Bamboo</a> to provide continuous integration, including test running, for their projects. However, they&#8217;re a relatively small site that&#8217;s somewhat picky about the projects they host. They do use a separate server for the continuous integration. I&#8217;m not sure what other, if any, security precautions they put in place. <a href="https://launchpad.net/">Launchpad</a> builds Ubuntu packages, but I&#8217;m not sure if they run tests. <a href="http://www.javaforge.com/project/11">JavaForge</a> builds Java code and runs the unit tests, apparently on top of Amazon EC2. <a href="http://www.assembla.com/">Assembla</a> will build and run tests, and also uses Amazon EC2. Both of these thereby delegate some of the security issues to Amazon&#8217;s virtualized systems.</p>
<h3>Submit Queue</h3>
<p>Once we&#8217;ve solved the problem of running continuous integration servers on project hosting sites, the next step is to flip them around. The usual process is to commit code to the repository and have the continuous integration server pull the code out of the repository. Then, if the build or tests fail, the continuous integration server goes into red mode and sends out alerts. Wouldn&#8217;t it be better if the server never turned red in the first place?</p>
<p>What should happen is that new code gets sent directly to the continuous integration server rather than to the source code repository. The continuous integration server pulls the latest known good build from the repository. Then it patches the new code into the build and runs the tests. If the tests pass, the continuous integration server commits the code to the repository. If the tests fail, the code is never committed at all.</p>
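<p>The flow can be modeled in a few lines. This is only a sketch under heavy assumptions: <code>Repository</code>, <code>TestSuite</code>, and <code>SubmitQueue</code> are hypothetical names, a source tree is reduced to a string, and applying a patch is reduced to appending to it. No real continuous integration server exposes this API.</p>

```java
// All names are hypothetical; this models the flow, not any real CI server.
interface TestSuite {
  boolean passes(String sourceTree);
}

class Repository {
  private String head = "";  // latest known good source tree

  String checkoutHead() {
    return head;
  }

  void commit(String sourceTree) {
    head = sourceTree;
  }
}

class SubmitQueue {
  private final Repository repository;
  private final TestSuite tests;

  SubmitQueue(Repository repository, TestSuite tests) {
    this.repository = repository;
    this.tests = tests;
  }

  // Returns true if the patch was committed, false if it was rejected.
  boolean submit(String patch) {
    // Apply the patch to the last known good build (modeled as appending).
    String candidate = repository.checkoutHead() + patch;
    if (tests.passes(candidate)) {
      repository.commit(candidate);  // only tested code reaches the repository
      return true;
    }
    return false;  // the repository never sees the failing code
  }
}
```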
<p>So far as I know, no current project hosting sites offer this; and it&#8217;s a relatively uncommon feature even among self hosted projects. However, it&#8217;s a critical one, especially when accepting contributions from the wide world of programmers, not all of whom have yet learned the importance of test driven development. I suppose such a site could also perform other checks on the source code. For example, it could verify coding conventions or measure the incremental code coverage before and after the check-in. It could automatically reject any patches that did not meet some predetermined measures of quality. That said, automated checks tend to be better used as additional data for humans to evaluate rather than as hard and fast rules. One way this can happen is by offering code metrics to code reviewers. This brings us to the next improvement in the code hosting ecosystem.</p>
<h3>Code Reviews</h3>
<p>Committing code, even assuming all the tests pass, is still a serious operation. Most open source projects don&#8217;t want to allow just anyone to commit code willy-nilly. Usually there&#8217;s a core group of committers that reviews all incoming patches and decides whether or not to accept them, to reject them, or to send them back for further work. This is somewhat labor-intensive both on the reviewer and the reviewee. </p>
<p>However, if we move to a submit queue-based system, this can become somewhat more straightforward. The continuous integration server can check every incoming patch regardless of the submitter&#8217;s status. If the tests pass, it can send an automatic request for review to a project committer. If the committer approves the change, then the continuous integration server can commit it to the source code control repository.</p>
<p>Indeed, it&#8217;s probably a good idea to require code reviews for all submitted changes, not just those from new users. After all, it&#8217;s not like the project&#8217;s owners are immune from introducing bugs. In fact, they probably introduce more than anybody else, if for no other reason than that they commit more code than anyone else. Code reviews are well known for increasing the quality of a code base and avoiding stupid errors, yet they&#8217;re one of the lesser used software development practices among open-source programmers. It&#8217;s time for that to change. Web-based code review interfaces such as Guido van Rossum&#8217;s <a href="http://code.google.com/appengine/articles/rietveld.html">Rietveld</a> have the potential to really move the community forward here. We should integrate this technology or something equivalent into project hosting sites. code.google.com already offers code review, and a few others like BitBucket do too. The rest should follow.</p>
<h3>One-button deployment</h3>
<p>The final stage of software development is deployment. Eventually the software has to ship to and be installed by its intended users. Here is one area where open source projects have a significantly easier time than a lot of commercial projects, especially enterprise projects. The deployment process for many open source projects consists of little more than uploading a few jar files and some documentation to the right directories on the right Web servers. This should become a one-button operation.</p>
<p>All project owners should have to do to release a new version is choose a version number and push a button. The server should pull all the code, documentation, and configuration information out of the source code repository; build everything; and put all the finished artifacts in the right locations. No further manual work should be required. This does require that absolutely everything needed to release goes into the repository; not only code but also HTML files, images, config files, and more. The only things that don&#8217;t go into the repository are the artifacts that are built from these components: jar files, zip files, Javadoc, etc.</p>
<p>Maven comes close to this, but it still builds and deploys from a local system rather than from the version control repository. This should be turned around. Ideally, <code>mvn deploy</code> should work with nothing more than a pom.xml file. Deploying shouldn&#8217;t need to access the local Maven repository or the local copy of the source code at all.</p>
<h3>Summary</h3>
<p>There might even be a startup idea in here somewhere. Open source projects aren&#8217;t the only ones that would like to offload some of the routine system administration tasks involved in running source code control repositories, continuous integration servers, bug trackers, and deployment pipelines. More likely, what&#8217;s really needed are some tweaks in and additions to the existing project hosting services. Or perhaps we can even take advantage of the advances in virtualization technology to install these services on top of Amazon EC2 and similar platforms.</p>
<p>But one thing is for certain: if open source projects are to keep pace with and surpass closed systems, then their software development practices need to be at least as good as, and probably better than, the state of the art in the overall software development community. In order to do that, it&#8217;s time to upgrade our tools.</p>
]]></content:encoded>
			<wfw:commentRss>http://cafe.elharo.com/programming/sourceforge-for-the-21st-century/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>A Square Is Not a Rectangle</title>
		<link>http://cafe.elharo.com/programming/a-square-is-not-a-rectangle/</link>
		<comments>http://cafe.elharo.com/programming/a-square-is-not-a-rectangle/#comments</comments>
		<pubDate>Fri, 11 Sep 2009 10:44:02 +0000</pubDate>
		<dc:creator>Elliotte Rusty Harold</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[inheritance]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[OOP]]></category>

		<guid isPermaLink="false">http://cafe.elharo.com/?p=469</guid>
		<description><![CDATA[The following example, taken from an introductory text in object oriented programming, demonstrates a common flaw in object oriented design. Can you spot it? public class Rectangle { private double width; private double height; public void setWidth(double width) { this.width = width; } public void setHeight(double height) { this.height = height; } public double getHeight() [...]]]></description>
				<content:encoded><![CDATA[<p>The following example, taken from an introductory text in object oriented programming, demonstrates a common flaw in object oriented design. Can you spot it?</p>
<pre><code>public class Rectangle {

  private double width;
  private double height;

  public void setWidth(double width) {
    this.width = width;
  }

  public void setHeight(double height) {
    this.height = height;
  }

  public double getHeight() {
    return this.height;
  }

  public double getWidth() {
    return this.width;
  }

  public double getPerimeter() {
    return 2*width + 2*height;
  }

  public double getArea() {
    return width * height;
  }

}</code></pre>
<pre><code>public class Square extends Rectangle {

  public void setSide(double size) {
    setWidth(size);
    setHeight(size);
  }

}</code></pre>
<p>(I&#8217;ve changed the language and rewritten the code to protect the guilty.)<br />
<span id="more-469"></span></p>
<p>There are actually several problems here. Thread safety is one, but let&#8217;s assume the class doesn&#8217;t need to be thread-safe. Another is that it&#8217;s possible to give the sides negative lengths. That we could fix with a couple of judiciously thrown IllegalArgumentExceptions. However the problem that most troubles me is demonstrated here:</p>
<pre><code>Square s = new Square();
s.setWidth(5.0);
s.setHeight(10.0);</code></pre>
<p>In object oriented programming, it is necessary that a subclass be able to fulfill the contract of its superclass. In this case, that means the square has to respond to <code>setHeight()</code> and <code>setWidth()</code> calls. However, doing so enables the square to violate the nature of a square: after the calls above, its width is 5 and its height is 10. A square cannot stand in for a rectangle.</p>
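<p>To make the broken invariant concrete, here is a minimal sketch. The classes are trimmed copies of the ones above; the <code>SquareInvariantDemo</code> class and its <code>isSquare()</code> helper are hypothetical names introduced only for this illustration.</p>

```java
// Trimmed copies of the article's classes, just enough for the demo.
class Rectangle {
  private double width;
  private double height;

  public void setWidth(double width) { this.width = width; }
  public void setHeight(double height) { this.height = height; }
  public double getWidth() { return this.width; }
  public double getHeight() { return this.height; }
}

class Square extends Rectangle {
  public void setSide(double size) {
    setWidth(size);
    setHeight(size);
  }
}

// Hypothetical demo class, not part of the original example.
class SquareInvariantDemo {

  // True only when both sides are equal, i.e. the shape really is a square.
  static boolean isSquare(Rectangle r) {
    return r.getWidth() == r.getHeight();
  }

  public static void main(String[] args) {
    Square s = new Square();
    s.setSide(5.0);
    System.out.println(isSquare(s)); // sides are 5 and 5

    // The inherited setter is still public, so nothing stops this call.
    s.setHeight(10.0);
    System.out.println(isSquare(s)); // sides are now 5 and 10
  }
}
```

<p>The compiler accepts every line, because <code>Square</code> inherits the full public interface of <code>Rectangle</code>.</p>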
<p>You can try to work around this by overriding the <code>setHeight()</code> and <code>setWidth()</code> methods. For example, one might call the other:</p>
<pre><code>public class Square extends Rectangle {

  public void setWidth(double width) {
    super.setWidth(width);
    super.setHeight(width);
  }

  public void setHeight(double height) {
    super.setWidth(height);
    super.setHeight(height);
  }

  public void setSide(double size) {
    super.setWidth(size);
    super.setHeight(size);
  }

}</code></pre>
<p>However, this is fundamentally unsatisfying because there is no reasonable expectation that calling <code>setHeight()</code> on a <code>Rectangle</code> object will also invoke the <code>setWidth()</code> method or vice versa. The contract of the <code>Rectangle</code> class is that you can set the width and the height independently, and the <code>Square</code> subclass violates that. Setting width when you&#8217;re setting height is an unexpected side effect. It violates the <a href="http://en.wikipedia.org/wiki/Single_responsibility_principle">single responsibility principle</a><sup><a href="#f1">1</a></sup>. </p>
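<p>A sketch of how that side effect can surprise client code follows. The <code>ResizeDemo</code> class and its <code>resizeAndMeasure()</code> method are hypothetical, written only against the <code>Rectangle</code> contract; the <code>Square</code> here is assumed to keep its sides equal by overriding both setters.</p>

```java
class Rectangle {
  private double width;
  private double height;

  public void setWidth(double width) { this.width = width; }
  public void setHeight(double height) { this.height = height; }
  public double getArea() { return width * height; }
}

// A Square that keeps its sides equal by overriding both setters.
class Square extends Rectangle {
  @Override
  public void setWidth(double width) {
    super.setWidth(width);
    super.setHeight(width);
  }

  @Override
  public void setHeight(double height) {
    super.setWidth(height);
    super.setHeight(height);
  }
}

// Hypothetical client that knows only about Rectangle.
class ResizeDemo {

  // Assumes width and height can be set independently.
  static double resizeAndMeasure(Rectangle r) {
    r.setWidth(5.0);
    r.setHeight(10.0);
    return r.getArea();
  }

  public static void main(String[] args) {
    System.out.println(resizeAndMeasure(new Rectangle())); // 5 x 10
    System.out.println(resizeAndMeasure(new Square()));    // 10 x 10
  }
}
```

<p>The same method returns an area of 50 for a plain <code>Rectangle</code> and 100 for a <code>Square</code>, even though the caller did exactly the same thing in both cases.</p>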
<p>We could instead just forbid <code>setHeight()</code> and <code>setWidth()</code> completely by throwing <code>UnsupportedOperationException</code>:</p>
<pre><code>public class Square extends Rectangle {

  public void setWidth(double width) {
    throw new UnsupportedOperationException();
  }

  public void setHeight(double height) {
    throw new UnsupportedOperationException();
  }

  public void setSide(double size) {
    super.setWidth(size);
    super.setHeight(size);
  }

}</code></pre>
<p>However, this is really just a louder way of warning the client that the <code>Square</code> class does not fulfill the contract of the <code>Rectangle</code> class. It doesn&#8217;t address the fundamental problem that, in object oriented terms, a square <em>is not</em> a rectangle. The geometric nature of a square is incompatible with the object-oriented definition of a rectangle given above. </p>
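<p>As a rough illustration of that louder warning, the sketch below passes both kinds of object to a hypothetical <code>tryResize()</code> method (a name invented for this example) that relies only on the <code>Rectangle</code> interface.</p>

```java
class Rectangle {
  private double width;
  private double height;

  public void setWidth(double width) { this.width = width; }
  public void setHeight(double height) { this.height = height; }
  public double getWidth() { return this.width; }
  public double getHeight() { return this.height; }
}

// A Square that refuses the inherited setters outright.
class Square extends Rectangle {
  @Override
  public void setWidth(double width) {
    throw new UnsupportedOperationException();
  }

  @Override
  public void setHeight(double height) {
    throw new UnsupportedOperationException();
  }

  public void setSide(double size) {
    super.setWidth(size);
    super.setHeight(size);
  }
}

// Hypothetical client that knows only about Rectangle.
class UnsupportedDemo {

  // Returns true if the shape accepted both setter calls.
  static boolean tryResize(Rectangle r) {
    try {
      r.setWidth(5.0);
      r.setHeight(10.0);
      return true;
    } catch (UnsupportedOperationException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(tryResize(new Rectangle())); // setters work
    System.out.println(tryResize(new Square()));    // setters throw
  }
}
```

<p>The failure now happens at runtime rather than silently, but the client still cannot treat every <code>Rectangle</code> reference the same way.</p>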
<p>There is, however, a way out of this conundrum. Our problem only arises because of the setter methods. If a constructor were used instead, and no setters were exposed, then it would be possible to make a square a subclass of rectangle without violating any contracts. For example,</p>
<pre><code>public class Rectangle {

  private double width;
  private double height;

  public Rectangle(double width, double height) {
    this.width = width;
    this.height = height;
  }

  public double getHeight() {
    return this.height;
  }

  public double getWidth() {
    return this.width;
  }

  public double getPerimeter() {
    return 2*width + 2*height;
  }

  public double getArea() {
    return width * height;
  }

}</code></pre>
<pre><code>public class Square extends Rectangle {

  public Square(double size) {
    super(size, size);
  }

}</code></pre>
<p>As long as the <code>Rectangle</code> class is immutable, we can define subclasses that are limited to a particular subset of rectangles, such as squares. This is one more reason to prefer immutability. To the extent possible, define the public interface in terms of what an object is and what it does rather than what you can do to it. However, that&#8217;s not always possible, and in those cases you need to be extremely careful around inheritance. Otherwise constraints can be violated when you least expect it, thus introducing subtle and potentially dangerous bugs in your code. </p>
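<p>Here is one way the immutable version might be used. This is a sketch under a few assumptions that go beyond the code above: the fields are marked <code>final</code>, the constructor rejects negative sides with an <code>IllegalArgumentException</code> as suggested earlier, and the <code>ImmutableShapesDemo</code> class with its <code>totalArea()</code> method is a hypothetical client.</p>

```java
class Rectangle {
  // final fields: an assumption of this sketch, to enforce immutability
  private final double width;
  private final double height;

  public Rectangle(double width, double height) {
    // Validation added for this sketch; the version above omits it.
    if (width < 0 || height < 0) {
      throw new IllegalArgumentException("sides must be non-negative");
    }
    this.width = width;
    this.height = height;
  }

  public double getWidth() { return this.width; }
  public double getHeight() { return this.height; }
  public double getArea() { return width * height; }
}

class Square extends Rectangle {
  public Square(double size) {
    super(size, size);
  }
}

// Hypothetical client that knows only about Rectangle.
class ImmutableShapesDemo {

  // Works for any mix of rectangles and squares; nothing here can
  // change a shape, so a Square always stays square.
  static double totalArea(Rectangle... shapes) {
    double total = 0.0;
    for (Rectangle r : shapes) {
      total += r.getArea();
    }
    return total;
  }

  public static void main(String[] args) {
    // 2 x 5 rectangle plus 3 x 3 square
    System.out.println(totalArea(new Rectangle(2.0, 5.0), new Square(3.0)));
  }
}
```

<p>Because no method can modify a shape after construction, a <code>Square</code> can be passed anywhere a <code>Rectangle</code> is expected without any risk to its invariant.</p>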
<hr />
<p><sup id="f1">1</sup> Actually, the single responsibility principle is usually understood to apply to classes, but it&#8217;s even more critical that it apply to methods. Each method should do exactly one thing and one thing only. Side effects should be avoided.</p>
]]></content:encoded>
			<wfw:commentRss>http://cafe.elharo.com/programming/a-square-is-not-a-rectangle/feed/</wfw:commentRss>
		<slash:comments>45</slash:comments>
		</item>
	</channel>
</rss>
