Love for XML
XML-bashing is en vogue, mostly over its bloat and supposedly wasteful features. The
debate has flared up again since Google's disclosure of its
homespun data interchange format, "Protocol Buffers". The
criticisms are understandable: the bloat is real. But I suggest that it's
just the cost of a data format that tries to be so many things.
Not much to explain here. XML can be read fairly easily by machines or
humans. Namespaces reduce readability, but they solve another important problem
(name collisions) that, say, JSON doesn't address.
It's been said that root elements are unnecessary, and also that spelled-out
end tags could be replaced with a more succinct </>. Please note
that those features are there for data consistency.
Imagine if XML didn't require the root element. You're sending a document
over a network, but the connection is interrupted exactly at an element
boundary. How can you know you received the whole document?
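
Here's a rough sketch of that scenario in Python, using the standard library's xml.etree.ElementTree. The element names and the is_complete helper are made up for illustration, and the "rootless" stream is a hypothetical format, not real XML.

```python
import xml.etree.ElementTree as ET


def is_complete(document: str) -> bool:
    """Return True if the text parses as a complete, well-formed XML document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# A document with a root element, received in full.
full = "<orders><order id='1'/><order id='2'/></orders>"

# The same document, cut off exactly at an element boundary.
truncated = "<orders><order id='1'/>"

# A hypothetical rootless stream that was meant to carry two orders,
# cut off at the same boundary. What arrived looks perfectly valid.
rootless_truncated = "<order id='1'/>"

full_ok = is_complete(full)
truncated_ok = is_complete(truncated)
rootless_ok = is_complete(rootless_truncated)
```

With the root element, the missing </orders> makes the truncation detectable. Without it, the receiver has no way to tell that a second order was ever supposed to arrive.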
Likewise, named end tags give you some hope of recovering a
corrupted document. If the last several bytes of your document are
</></></></></>, then how could you know exactly
where the corruption took place? This is a data-integrity feature.
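
As a small illustration (again a sketch with Python's standard xml.etree.ElementTree; the log/entry/msg names and the locate_damage helper are invented for the example), named end tags let the parser point at roughly where a document went wrong:

```python
import xml.etree.ElementTree as ET


def locate_damage(document: str):
    """Return (line, column) where the parser gave up, or None if well-formed."""
    try:
        ET.fromstring(document)
        return None
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        return err.position


intact = "<log><entry><msg>one</msg></entry></log>"

damaged = "\n".join([
    "<log>",
    "  <entry>",
    "    <msg>one</msg>",
    "  </entry>",
    "  <entry>",
    "    <msg>two",  # the closing </msg> was lost here
    "  </entry>",
    "</log>",
])

intact_position = locate_damage(intact)
damaged_position = locate_damage(damaged)
```

The parser stops at the first end tag that doesn't match the element still open, which is the line right after the damage. If every end tag were an anonymous </>, the same loss would just shift all the closers by one level, and the error would only surface at the very end of the document, if at all.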
On a different note, some crappy XML tooling has added to the problem.
I've learned recently about a couple of XML parsers with outstanding
performance: the D/Tango parser and AsmXml, a parser written in assembly language.
The real story here isn't that they're fast, it's how they did it. From
the dot not article referenced above:
What? All the other parsers in the world were copying strings? All the
other parsers in the world were allocating on the heap? I took for granted
that the best parsers out there were coded by folks who knew better. How
was this so wrong for so long?
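
To make "slice, don't copy" concrete, here is a toy sketch in Python rather than D. It is not the Tango parser's code, just an analogue under my own assumptions: a memoryview over the input buffer plays the role of a D array slice, and tag_name_slices is a made-up helper that only understands simple start tags.

```python
LT, SLASH = ord("<"), ord("/")
SKIP = {SLASH, ord("?"), ord("!")}   # end tags, processing instructions, comments
STOP = {ord(" "), ord("\t"), ord("\n"), ord(">"), SLASH}


def tag_name_slices(buf: memoryview):
    """Yield a zero-copy slice of buf for each start-tag name."""
    i, n = 0, len(buf)
    while i < n:
        if buf[i] == LT and i + 1 < n and buf[i + 1] not in SKIP:
            j = i + 1
            while j < n and buf[j] not in STOP:
                j += 1
            # Slicing a memoryview creates a new view, not a new byte string.
            yield buf[i + 1:j]
            i = j
        else:
            i += 1


data = b"<root><item id='1'>x</item><item/></root>"
names = list(tag_name_slices(memoryview(data)))

# Each slice still refers to the original buffer; bytes() copies only on demand.
shares_buffer = all(name.obj is data for name in names)
decoded = [bytes(name) for name in names]
```

Each yielded name is a window onto the original buffer, so the tokenizer never duplicates tag text while scanning. A copy only happens if the caller asks for one, as the last line does.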
I enjoy Eclipse's built-in XML editor.
Besides being concise and intuitive, it reminds me that an XML document is
a document, not just a text file with tags, as developers are prone
to believe.
In summary, XML pretty much does what it's supposed to in a reasonable way.
Please spend your time improving the shoddy tools instead, and the costs can be greatly
reduced.
Bloat is the price of being human-readable
Root elements and "redundant" end tags are the price of data
consistency
I thought we were doing this already?
Using D's array slicing feature, you don't need to create a copy
of a string in parsing.
Avoid heap allocation as much as possible
And finally, something positive
Summary