June 5th, 2009
The fundamental reason to prefer draconian error handling is that it helps find bugs. I was recently reminded of this when Peter Murray-Rust thought he had found a bug in XOM. In brief, it was refusing to parse some files that other tools let slip right through. In fact, XOM’s strict namespace handling had uncovered a cascading series of bugs that had been missed by various other parsers, including Xerces-2j and libxml.
But before I describe what happened, let’s see if you can eyeball this bug. I’ll make it easier by cutting out the irrelevant parts so you know you’re looking right at the bug. Here’s the instance document we start with:
<!DOCTYPE svg SYSTEM
"http://www.w3.org/TR/2000/03/WD-SVG-20000303/DTD/svg-20000303-stylable.dtd">
<svg/>
And the referenced DTD is:
<!ENTITY % StylableSVG "INCLUDE" >
<!ENTITY % ExchangeSVG "IGNORE" >
<!ENTITY % SVGNamespace "http://www.w3.org/2000/svg-20000303-stylable" >
<!ENTITY % Shared PUBLIC "-//W3C//DTD SVG 20000303 Shared//EN" "svg-20000303-shared.dtd" >
%Shared;
Then in svg-20000303-shared.dtd we find this:
<!ATTLIST svg
xmlns CDATA #FIXED "%SVGNamespace;"
%stdAttrs; >
Not obvious, is it? In fact, I looked at this one for quite a while and consulted several spec documents before Tatu Saloranta figured out what was actually wrong here. If it helps, the relevant part of the XML specification is Section 4.4, XML Processor Treatment of Entities and References.
Give up? OK. Here’s what’s happening:
Read the rest of this entry »
Posted in XML | 9 Comments »
May 27th, 2009
A couple of weeks ago I spent a considerable amount of time chasing down bugs involving null in a large code base: null checks after a variable had already been dereferenced, nulls passed to methods that would immediately dereference them, equals() methods that didn’t check for null, and more. Using FindBugs, I identified literally hundreds of bugs involving null handling; and that got me thinking: Could we just eliminate null completely? Should we?
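To make those patterns concrete, here is a minimal sketch of what two of them typically look like in Java, each next to a repaired version. The names `NullPatterns`, `lengthCheckedTooLate`, `lengthCheckedFirst`, `BrokenPoint`, and `Point` are hypothetical and invented for illustration; none of this is taken from the code base described above.

```java
class NullPatterns {

    // Pattern 1: the null check comes after the dereference, so it can never help.
    static int lengthCheckedTooLate(String s) {
        int n = s.length();   // throws NullPointerException when s is null
        if (s == null) {      // never true here: a null s has already thrown
            return 0;
        }
        return n;
    }

    // Repaired: check before dereferencing.
    static int lengthCheckedFirst(String s) {
        if (s == null) {
            return 0;
        }
        return s.length();
    }

    // Pattern 2: an equals() method that dereferences its argument unchecked.
    static final class BrokenPoint {
        final int x;

        BrokenPoint(int x) {
            this.x = x;
        }

        @Override
        public boolean equals(Object o) {
            return ((BrokenPoint) o).x == x;   // NullPointerException when o is null
        }

        @Override
        public int hashCode() {
            return x;
        }
    }

    // Repaired: instanceof is false for null, so it doubles as the null check.
    static final class Point {
        final int x;

        Point(int x) {
            this.x = x;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Point)) {
                return false;
            }
            return ((Point) o).x == x;
        }

        @Override
        public int hashCode() {
            return x;
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthCheckedFirst(null));
        System.out.println(new Point(1).equals(null));
    }
}
```

The contract for `equals()` requires that `x.equals(null)` return false for any non-null `x`, which is why the unchecked cast in `BrokenPoint` counts as a bug and not just a style problem.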
What follows is a thought experiment, not a serious proposal. Still, it might be informative to think about it; and perhaps it will catch the eye of the designer of the next great language.
Read the rest of this entry »
Posted in Programming | 37 Comments »
January 16th, 2009
Sometimes I still feel like we’re in 1982 when it comes to really basic things like turning off a computer. Why do we have to select shutdown from a menu? Why do we have to carefully save each open file? Why don’t programs stop when we tell them to? (Time Machine has now been spinning for hours, and won’t stop even though I’ve told it to.) Why is this so much more complex than it needs to be?
In the future, here’s how shutdown should work:
- You flip the power switch.
That’s it. No shutdown menu item. No wait for the system to hibernate. No opportunity for applications to save data. Nothing.
Read the rest of this entry »
Posted in User Interface | 48 Comments »
January 12th, 2009
I’m doing a bit of work on XOM, trying to optimize and improve some of the Unicode normalization code. A lot of this is autogenerated from the Unicode data files, and I’m actually working on the meta-code that parses those files and then generates the actual shipping code. In this code, I’m setting up a switch statement like this one:
switch(i) {
    case 0:
        return result + "NOT_REORDERED";
    case 1:
        return result + "OVERLAY";
    case 7:
        return result + "NUKTA";
    case 8:
        return result + "KANA_VOICING";
    case 9:
        return result + "VIRAMA";
    case 202:
        return result + "ATTACHED_BELOW";
    case 216:
        return result + "ATTACHED_ABOVE_RIGHT";
    case 218:
        return result + "BELOW_LEFT";
    case 220:
        return result + "BELOW";
    case 222:
        return result + "BELOW_RIGHT";
    case 224:
        return result + "LEFT";
    case 226:
        return result + "RIGHT";
    case 228:
        return result + "ABOVE_LEFT";
    case 230:
        return result + "ABOVE";
    case 232:
        return result + "ABOVE_RIGHT";
    case 233:
        return result + "DOUBLE_BELOW";
    case 234:
        return result + "DOUBLE_ABOVE";
    case 240:
        return result + "IOTA_SUBSCRIPT";
    default:
        return result + "NOT_REORDERED";
}
And then I stop myself. Do you see the bug? Actually it’s a meta bug that leads to the true bug.
Read the rest of this entry »
Posted in Programming | 8 Comments »
January 1st, 2009
C-family languages, including Java, C#, and C++, do not require braces around single-line blocks. For example, this is a legal loop:
for (int i=0; i < args.length; i++) process(args[i]);
So’s this:
for (int i=0; i < args.length; i++)
    process(args[i]);
However, both of these are very bad form, and lead to buggy code. All blocks in C-like languages should be explicitly delimited by braces across multiple lines in all cases. Here’s why:
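One hazard often cited in this debate, sketched here as an assumption and not as a summary of the full entry, is that a statement added later to a brace-less loop silently falls outside the loop body. The names `BraceDemo`, `countWithoutBraces`, and `countWithBraces` are hypothetical.

```java
class BraceDemo {

    // Without braces, only the first statement belongs to the loop,
    // whatever the indentation suggests.
    static int countWithoutBraces(String[] args) {
        int processed = 0;
        int logged = 0;
        for (int i = 0; i < args.length; i++)
            processed++;
            logged++;   // not part of the loop: runs once, after it finishes
        return logged;
    }

    // With braces, both statements run on every iteration.
    static int countWithBraces(String[] args) {
        int processed = 0;
        int logged = 0;
        for (int i = 0; i < args.length; i++) {
            processed++;
            logged++;
        }
        return logged;
    }

    public static void main(String[] args) {
        String[] sample = {"a", "b", "c"};
        System.out.println(countWithoutBraces(sample));
        System.out.println(countWithBraces(sample));
    }
}
```

Both methods compile without complaint, and the indentation of the first one makes the mistake easy to miss in review.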
Read the rest of this entry »
Posted in Programming | 23 Comments »