Plain Text Config Files are Confusing

There’s a large rebellion over XML config files from programmers who don’t like to type XML and don’t want to learn APIs for processing it. They’d rather limp along with the same scanf code they’ve been using for the last 20 years.

The problem is there really isn’t such a thing as a plain text config file. What there is are specially formatted text files that are easily as complex as the XML equivalent but inconsistent, poorly documented, and easily broken. For instance, consider this extract from LogValidator’s “plain text” config file:

##  MailFrom : From: address for e-mail output  ##
##
## Unless the relevant option is specified when running the LogValidator,
## the mail output will use ServerAdmin (see above) as From: and To:
## This option allows you to override the From: parameter
## DEFAULT  = ServerAdmin
# MailFrom: logvalidator@example.org

## Title : a more useful Subject: for the Mail output and <title> for HTML Output ##
##
## Tell the mail/HTML output what this config is all about
## and make them use a better subject than the vanilla "LogValidator results"
## DEFAULT = Logvalidator results
# Title = Logvalidator results

##  [apache] DocumentRoot : where the files are located  ## 
##
## For some log formats, it is necessary to know where the actual files
## reside on the server
DocumentRoot /var/www/

In particular, look at the three fields and their formats. In the space of three items, we have

Name: Value
Name = Value
Name Value

This is inconsistent and confusing and seems likely to lead to bugs. Perhaps LogValidator should pick one syntax and stick to it? Better yet, just make this all XML. This really is a classic example of why plain text is not simpler than XML.
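To make the inconsistency concrete, here is a minimal Python sketch of what anything reading these three lines has to cope with: one pattern that tolerates a colon, an equals sign, or bare whitespace as the separator. This is not LogValidator's actual parser (that code is not shown here); the name `parse_directive` and the regular expression are assumptions made purely for illustration.

```python
import re

# Hypothetical pattern: a name, then ":" or "=" or plain whitespace, then the value.
# It illustrates the three syntaxes above; it is not LogValidator's real grammar.
DIRECTIVE = re.compile(r"^\s*([A-Za-z]\w*)\s*(?::|=|\s)\s*(.*?)\s*$")


def parse_directive(line):
    """Return (name, value) for a directive line, or None for comments and blanks."""
    if not line.strip() or line.lstrip().startswith("#"):
        return None
    match = DIRECTIVE.match(line)
    if match is None:
        raise ValueError(f"unrecognized directive: {line!r}")
    return match.group(1), match.group(2)


if __name__ == "__main__":
    for line in (
        "MailFrom: logvalidator@example.org",
        "Title = Logvalidator results",
        "DocumentRoot /var/www/",
    ):
        print(parse_directive(line))
```

Even this small sketch has to make choices the file never states, such as whether a value may itself contain a colon or an equals sign.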

This is of course just one format for one Perl program. The next program will be a little different still. You’ll have to write a custom parser to handle it, learn a new syntax to write it, and then remember which one you’re using when. With XML all your fields are clearly delimited by tags, so the boundaries are obvious. With XML, you get to use the same parser every time.
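For comparison, here is a rough sketch of the same three settings as XML, read with the parser that ships in Python's standard library. The element names (`config`, `MailFrom`, `Title`, `DocumentRoot`) and the helper `load_settings` are assumptions for illustration; LogValidator does not define such a format.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of the three settings; the element names are
# assumptions for illustration, not a format LogValidator actually reads.
CONFIG_XML = """\
<config>
  <MailFrom>logvalidator@example.org</MailFrom>
  <Title>Logvalidator results</Title>
  <DocumentRoot>/var/www/</DocumentRoot>
</config>
"""


def load_settings(xml_text):
    """Parse the document with the standard library parser and return a dict."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


if __name__ == "__main__":
    print(load_settings(CONFIG_XML))
```

Nothing in `load_settings` is specific to this program; the same few lines would read any other flat XML config, and a malformed file is rejected by the parser rather than silently misread.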

XML may not be simpler for any one config format. XML is, however, much simpler for all config formats.


P.S. In the Ruby community, there’s a movement afoot to use Ruby code as the config format. The JavaScript folks are likewise advocating JavaScript config files. That’s a different issue. In both cases, they’re still advocating single, well-defined syntaxes with standard parsers. They are not advocating plain text config files. To the extent that their data formats are only for Ruby or JavaScript programs respectively, this makes some sense. However, for config formats that need to cross language boundaries, XML is still the best choice.

26 Responses to “Plain Text Config Files are Confusing”

  1. leeg Says:

    P.S. In the Ruby community, there’s a movement afoot to use Ruby code as the config format. The JavaScript folks are likewise advocating JavaScript config files. That’s a different issue. In both cases, they’re still advocating single, well-defined syntaxes with standard parsers. They are not advocating plain text config files. To the extent that their data formats are only for Ruby or JavaScript programs respectively, this makes some sense.

    But to the extent that in both cases, the config files are (perhaps) for end-users or systems managers, it would make more sense to use XML for all. That way you don’t have to learn a new programming language just to set up your MTA.

  2. craven Says:

    If I’m maintaining a config file, I can visually scan, edit, and update a non-xml config file faster than a property file. XML adds a lot of ‘visual garbage’ for a person trying to maintain the file.

  3. dbt Says:

    How does this make any sense? I can show you some really bad examples of hard to read and understand XML configuration files… ALL of them. Configuration data should be opaque and maintained by a program (in which case xml, or registry or whatever who cares), or it should be human editable in which case it’s not XML.

  4. Darren Chamberlain Says:

    And then there’s YAML, which many projects (including, e.g., Ruby on Rails) use for config files, and which has a simple, unambiguous, non-verbose format, that is both human-readable and well-defined enough that there should be very little ambiguity about what the components are (unlike the example you’ve given above). It is well-supported and has complete specs.

  5. John Cowan Says:

    I don’t get it. You are complaining about inconsistent formatting in comments?

    LogValidator config files use the same style as Apache: directives are of the form “command args”, or in the case of containers, “…”. No equals signs or colons. I agree that it would be easy to replace this with XML, but the complaint seems ill-motivated.

  6. Tony Cowderoy Says:

    What’s wrong with YAML for config files? I find it more readable than XML, with a lot less syntactical noise, and it is a well defined format that can easily be transformed into XML if you want to. There are also existing YAML parsers for a lot of languages. That’s not to say that XML is a bad choice, but I often find XML config files hard to read.

    Also, in many cases the bigger problem is not so much that the files are in XML but that there is a vast number of configuration options with a poorly chosen set of defaults and poor documentation. Are we confusing a problem of syntax with more fundamental design problems?

    Ultimately, with plug-in parsers, it shouldn’t be very hard to offer a choice of syntaxes – how about that for a way to please most of the people most of the time?

  7. Elliotte Rusty Harold Says:

    John,

    My bad. An unescaped less than sign in the comment hid part of the article so you missed the meat. Setting up LogValidator requires these three options (among others)

    MailFrom: logvalidator@example.org
    Title = Logvalidator results
    DocumentRoot /var/www/

    Notice that each option has a different syntax. In at least these two cases LogValidator config directives do use equals signs and colons, and in other cases they don’t.

  8. BuggyFunBunny Says:

    BUT, that’s the fault of inconsistent specification of configuration, not the file format.

  9. Brad Huffman Says:

    Along with a consistent format, XML comes with the ability to provide stylesheets to transform the configuration files into easy-to-read and useful documentation. All without anyone having to modify the original program or write any additional tools. XSLT processors exist today for almost every platform! In addition, there’s the ability to intermix your own tags along with those already in the configuration. You can specify not only how things are set, but document why you set them that way in the first place. This might not seem important when your user base consists of yourself, but as your user base grows, it becomes imperative that you try and ensure a smooth transition to the next person. It’s just the professional thing to do. Whether XML is the best format for the task isn’t that relevant. It’s available today, across multiple platforms, and requires a minimum set of tools. That makes it a good choice.

  10. Hammer&Nail Says:

    My bad. An unescaped less than sign in the comment hid part of the article so you missed the meat.

    I have seen the above happening on several production sites where the configuration file is convoluted, overblown XML.

    Also, in your argument against simple NVP-based configuration files, you cite the example of LogValidator. It is possible that the LogValidator team did not think through their configuration design and ended up with a spaghetti configuration, which you now use to buttress your arguments. It is possible that their configuration file needs to be XML based, but that argument cannot be carried over to *all* applications out there. Some applications are better off with a simple name-value pair based configuration!

    In this case, your argument reminds me of the Hammer and the Nail cliché.

  11. <xml><config-files><considered>abusive</considered></config-files></xml> Says:

    “What there is are specially formatted text files that are easily as complex as the XML equivalent but inconsistent, poorly documented, and easily broken”

    Yes and the idiot that decided to subject his users to this will achieve the same results with XML.

    If your chosen language doesn’t make it easy to robustly parse non-xmlian config files then maybe you chose the wrong language for the job.

  12. Peter Says:

    At my last job all our server stuff was configured using tcl. I think the idea was to have a simple consistent configuration file format across the organisation but to give it enough power to easily support automation. Seemed to work well enough for our hundreds of server types and 10s of thousands of machines.

  13. Javaner Says:

    You are right: “plain text” is not a usable config file format, just because there are no rules. But XML, on the other hand, is far from being easily human-readable, and who needs hierarchical tags when all you want to store are key-value pairs? In my opinion, simple property files like those used by Java or those Windows ini files are the most usable. You get the desired “just one format” for assignments, without inserting unnecessary and distracting tags.

  14. Elliotte Rusty Harold Says:

    Are simple property files like those used by Unix and Windows really so simple? Here are a few questions to consider:

    1. Do they work if there’s no line break at the end of the document? Or if there is?
    2. Can the system handle non-ASCII values or names? If so, what encoding does it use?
    3. How do you provide a multiline value?
    4. Is there a maximum line length? What is it?
    5. Can you replace a space with a tab? or vice versa?
    6. Must there be a space after the colon that separates the name from the value?
    7. Are uncommented blank lines allowed?
    8. How are comments indicated?

    I have seen “plain text” config formats that answer every one of these questions in every possible way. When presented with any new format, you have to figure out all these things, usually by experiment because:

    1. The details are undocumented.
    2. The developers didn’t actually consider any of these questions, or use a real parser, so the answers are just whatever their regexp code accidentally does.

    When presented with an XML format, you have to figure out none of these questions. The answers are already known.

    Plain text config files are simpler only in isolation. While any individual plain text config format may be simpler than the equivalent XML format, the sum-total of complexity of all plain text config formats is much greater than the sum-total of complexity of the equivalent XML config formats.
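As one illustration of answers that are "just whatever the code accidentally does", here are two hand-rolled parsers of the kind often written for "Name: Value" files. Both are hypothetical (the names `parse_with_split` and `parse_with_regex` are made up for this sketch), and each looks reasonable on its own, yet they answer several of the questions above differently without either author having decided anything.

```python
import re


def parse_with_split(text):
    """Hypothetical parser A: split each line on the first colon."""
    settings = {}
    for line in text.split("\n"):
        if ":" in line:
            name, value = line.split(":", 1)
            settings[name] = value
    return settings


def parse_with_regex(text):
    """Hypothetical parser B: require 'name: value' followed by a newline."""
    return dict(re.findall(r"^(\w+): (.*)\n", text, flags=re.MULTILINE))


if __name__ == "__main__":
    # Note: no space after the second colon, and no line break at the end.
    sample = "Title: Logvalidator results\nDocumentRoot:/var/www/\nMailFrom: a@example.org"
    print(parse_with_split(sample))
    print(parse_with_regex(sample))
```

Parser A accepts all three lines but keeps the space after the colon as part of the value. Parser B silently drops the line with no space after the colon, and also the last line, because it is not followed by a line break.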

  15. Ed Davies Says:

    Javaner says “…and who needs hierarchical tags when all you want to store are key-value pairs?”

    It often happens that configuration files grow to contain more structured data even if they only contain key-value pairs in the initial implementation. The values become lists of values or references to other keys or keys become grouped into sections and sub-sections and so on. At this point use of a more structured syntax, whether it be JSON, YAML, XML, N3 or whatever, becomes worthwhile at the overall logical level as well as at the nitty-gritty representation level. Perhaps it’s better to start with the structured syntax and avoid the later temptation to push the flat text form too far in order to keep backward compatibility. Been there, done that – with CSV files which really should have been XML or something.

  16. John Cowan Says:

    Anyone who wants to think about this issue seriously should begin by reading the section “Data File Metaformats” in The Art of Unix Programming. You may not agree with the condensed lore represented here, but it’s foundational. And yes, CSV sucks.

    One of my back-burnered projects is AltoSAX, a highly configurable parser for plain-text config files that makes them deliver SAX events and so appear as simple XML to applications.
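The general idea of making a plain-text config file "appear as simple XML" can be sketched with Python's standard SAX classes. This is not AltoSAX, whose design is not described here; it is a minimal illustration that assumes Apache-style "Name Value" lines, and the function name `emit_sax_events` and the root element name `config` are invented for the example.

```python
import io
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def emit_sax_events(text, handler, root="config"):
    """Feed 'Name Value' directive lines to a SAX ContentHandler as elements."""
    no_attrs = AttributesImpl({})
    handler.startDocument()
    handler.startElement(root, no_attrs)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # name, then the rest of the line
        name = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        handler.startElement(name, no_attrs)
        handler.characters(value)
        handler.endElement(name)
    handler.endElement(root)
    handler.endDocument()


if __name__ == "__main__":
    out = io.StringIO()
    # XMLGenerator is a ContentHandler that writes the events back out as XML.
    emit_sax_events("DocumentRoot /var/www/\n", XMLGenerator(out, encoding="utf-8"))
    print(out.getvalue())
```

Any application written against a SAX `ContentHandler` could consume these events without knowing the source was not XML, though a real tool would also need to validate that directive names are legal XML element names.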

  17. Hammer&Nail Says:

    A simple configuration file moving to a not so simple configuration file over an application’s life cycle is more likely a symptom of a design problem in the application. It is better to revisit the application design at that stage. Or, put another way, a configuration file is a resource and not a kitchen sink junkyard for things that the programmer has not figured out where to fit. Treating it as a resource is the solution.

    Also reading from the above responses, the argument for an XML based config file is going the way of “4 spaces vs. 2 spaces vs. tabs” argument!

    Is Elliotte trying to pick our brains to see what an ideal configuration looks like? Here are some of my points:

    1. Easily readable, understandable and “operatable”. Think of the overworked IT operator or an overwhelmed user trying to modify the config file.
    2. Understand your target audience!
    3. Minimalistic. Do not make it a kitchen sink!
    4. Think building blocks.

  18. Brad Huffman Says:

    0. Documentation!

    Why, O why is this one always left out?

  19. Adrian Says:

    >>“What there is are specially formatted text files that are easily as complex as the XML equivalent but inconsistent, poorly documented, and easily broken”

    > Yes and the idiot that decided to subject his users to this will achieve the same results with XML.

    No, they can’t unless they write their own XML parser: XML has a specification, so it’s always going to be consistent.

    >If your chosen language doesn’t make it easy to robustly parse non-xmlian config files then maybe you chose the wrong language for the job.

    I think the point here is that plain text config files *are* the wrong language for the job.

  20. Adrian Says:

    >A simple configuration file moving to a not so simple configuration file over an application’s life cycle is more likely a symptom of a design problem in the application. It is better to revisit the application design at that stage.

    What? Either you want/need to expose a lot of configuration or you don’t. Revisiting your design isn’t going to change that.

  21. Labnotes » XML configs files are your competitive disadvantage Says:

    […] February 27th, 2007 Elliotte Rusty Harold: “There’s a large rebellion over XML config files from programmers who don’t like to type XML and don’t want to learn APIs for processing it. They’d rather limp along with the same scanf code they’ve been using for the last 20 years.” […]

  22. David Smith Says:

    I’m with Eric Raymond on this one – if humans maintain the file, it should be human-readable. Saying that there are ways to mangle XML so that humans can read it is very different from saying that they are “human readable”.

    Suggesting that every implementation of this mangling does and always will produce the same result is touchingly naive. I can pop XEmacs 21 today and read archived 25-year-old dotfiles from one of my System V user accounts and know that I’m seeing exactly the same text that I created in vi in 1982. Predicting that the XML parser of the moment in 2007 will produce exactly the same results as the XML parser of the moment in 2029 is a leap of faith. Call me old fashioned, but it’s worked like a hot damn for a long time.

    I’d also suggest that if you’re going to make a point about ascii text config files, you use one from a mature, well-written platform like almost anything in a BSD /etc directory, sendmail or Apache. None of them are “any-human-off-the street-readable” but they work wonderfully for a qualified admin, are readily searched and manipulated by a raft of different tools, and can easily be created and/or maintained programmatically.

    I’m a huge fan of XML as a way to transfer data between disparate systems – I’m just not willing to make it the only tool on my belt.

  23. Philippe Lhoste Says:

    > In both cases, they’re still advocating single, well-defined syntaxes with standard parsers.
    > They are not advocating plain text config files.
    > To the extent that their data formats are only for Ruby or JavaScript programs respectively, this makes some sense.

    Some mentioned YAML.
    Well, there is also Lua, the programming language, initially designed as a configuration language, and evolved to be a powerful multipurpose language, but it is still well suited to data description.
    The nice part is that it isn’t much bigger than an average XML parser, and can be easily integrated in another (compiled) language (C, Pascal, even Java) which can then access the result of parsing the configuration file by Lua.

  24. Jonathan Feinberg Says:

    The idea that using XML somehow gives you some canonical way of expressing settings is a kind of a strawman. Consider:

    or

    or

    or

    or

    gnorf

  25. David Myers Says:

    I’ve been server support for large projects that indulged in overly complex undocumented XML file configs and what that tends to do is give the project developers another role: second tier production support.

  26. she Says:

    Personally I have switched to YAML for 90% of my projects.

    It works nicely. YAML is no panacea, but it is a LOT more readable than XML and does the job fairly well. I wish all scripting languages would support (and use) YAML like Ruby does; that way I would know that all my YAML files will work in other languages too.

    Instead of embedding info in a .rb file, I am embedding the info in a YAML file, which is a lot better because it does not tie you to only one language. Think if, instead of C .h files, they used YAML and provided nice APIs to use from Ruby easily… I am dreaming 🙂