Love for XML

Sun, 20 July, 2008

XML-bashing is en vogue, mostly over its bloat and wasteful features. The debate has flared up again since Google disclosed its homespun data interchange format, "Protocol Buffers". The criticisms are understandable, and the bloat is real, but I suggest that it's just the cost of a data format that tries to be so many things.

Bloat is the price of being human-readable

Not much to explain here. XML can be read fairly easily by machines or humans. Namespaces reduce readability, but solve another important problem (name collisions) that, say, JSON doesn't address.

Root elements and "redundant" end tags are the price of data consistency

It's been said that root elements are unnecessary, and also that spelled-out end tags could be replaced with a more succinct </>. Please note that those features are there for data consistency.

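Imagine if XML didn't require the root element. You're sending a document over a network, but the connection is interrupted exactly at an element boundary. How can you know you received the whole document? Every element you did receive is properly closed, so the truncated stream looks complete. With a required root element, the missing closing tag gives the truncation away.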
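Here is a minimal sketch of that scenario in Python, using the standard library's ElementTree parser. The <log> and <entry> element names and the is_complete helper are made up for illustration; the "rootless" case just simulates what a receiver would see if a root element were not required.

```python
import xml.etree.ElementTree as ET

def is_complete(text: str) -> bool:
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# A document with a root element, and the same document cut off
# exactly at an element boundary (just before the second <entry>).
full = "<log><entry>a</entry><entry>b</entry></log>"
cut = full[:full.index("<entry>b")]

# What would arrive if the same data were sent without a root element
# and the connection dropped at the same boundary.
rootless_cut = "<entry>a</entry>"

print(is_complete(full))          # the intact document parses
print(is_complete(cut))           # the unclosed <log> exposes the truncation
print(is_complete(rootless_cut))  # parses cleanly, so the loss goes unnoticed
```

The truncated rootless data is indistinguishable from a complete one-entry document, which is exactly the ambiguity the root element removes.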

Likewise, named end tags can give you some hope of recovering a corrupted document. If the last several bytes of your document are </></></></></>, then how could you know exactly where the corruption took place? This is a data-integrity feature.
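To make that concrete, here is a toy nesting checker in Python. It is only a sketch: the TAG pattern and first_error function are hypothetical, they ignore attributes, comments, and everything else a real parser handles, and the anonymous </> syntax is the proposed shorthand, not valid XML. In both samples the opening <c> tag has been lost.

```python
import re

# Matches <name>, </name>, and the hypothetical anonymous close tag </>.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)?>")

def first_error(doc: str):
    """Return the offset of the first tag-nesting error, or None if consistent."""
    stack = []
    for m in TAG.finditer(doc):
        closing = m.group(1) == "/"
        name = m.group(2)
        if not closing:
            stack.append(name)
        elif not stack:
            return m.start()          # close tag with nothing left open
        else:
            opened = stack.pop()
            # A named close tag must match the most recent open tag;
            # an anonymous one (name is None) matches anything.
            if name is not None and name != opened:
                return m.start()
    return len(doc) if stack else None

named = "<a><b>x</c></b></a>"   # the <c> open tag was lost
anon = "<a><b>x</></></>"       # same damage, anonymous close tags

print(first_error(named))  # offset of </c>, right next to the damage
print(first_error(anon))   # offset of the final </>, at the very end
```

With named end tags the checker stops at the tag adjacent to the damage. With anonymous ones it only notices a surplus close tag at the end of the document, and the deeper the nesting, the further that is from where things actually went wrong.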

I thought we were doing this already?

On a different note, some crappy XML tooling has added to the problem. I recently learned about a couple of XML parsers with outstanding performance: the D/Tango parser and the assembler parser AsmXml.

The real story here isn't that they're fast, it's how they did it. From the dot not article referenced above:

Using D's array slicing feature, you don't need to create a copy of a string in parsing.
Avoid heap allocation as much as possible
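The slicing point can be illustrated outside of D. The sketch below is Python, not the Tango parser's code, and it swaps in memoryview as the nearest analogue to a D array slice: a view onto the original buffer instead of a copied substring. The tag_names function is hypothetical, and it ignores attributes, comments, and CDATA.

```python
def tag_names(buf: bytes):
    """Yield each start-tag name as a memoryview slice of buf (no bytes copied)."""
    view = memoryview(buf)
    i = 0
    while True:
        lt = buf.find(b"<", i)
        if lt == -1:
            return
        gt = buf.find(b">", lt)
        if gt == -1:
            return
        if buf[lt + 1] != ord("/"):   # skip end tags
            yield view[lt + 1:gt]     # a window onto buf, not a new string
        i = gt + 1

doc = b"<feed><item>hi</item></feed>"
names = list(tag_names(doc))

# Each name still points into the original buffer.
print(all(n.obj is doc for n in names))

# Copy only when a standalone value is actually needed.
print([bytes(n) for n in names])
```

A parser built this way hands back offsets into the input and leaves it to the caller to decide whether a copy is ever needed.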

What? All the other parsers in the world were copying strings? All the other parsers in the world were allocating on the heap? I took for granted that the best parsers out there were coded by folks who knew better. How was this so wrong for so long?

And finally, something positive

I enjoy Eclipse's built-in XML editor.

Besides being concise and intuitive, it reminds me that an XML document is a document, not just a text file with tags, as developers are prone to believe.

Summary

In summary, XML pretty much does what it's supposed to in a reasonable way. Please focus your time on improving shoddy tools, and the costs of XML can be greatly reduced.

About Me

Erik Mackdanz is a software developer in Austin, Texas, along with everybody else.
