Basic XML syntax

So what's the big deal?
I once worked with a developer who considered XML redundant. He asked me, "We've already got HTML, which works fine. Why do we need another markup language?" He was, unfortunately, missing the point. HTML is exclusively a presentation language, making it possible, browser incompatibilities and proprietary extensions aside, to view the same data in the same way on multiple platforms.
Although most Web browsers are inherently capable of displaying XML (the fact that most confused my developer friend), the language itself has nothing to do with displaying data. Instead, imagine a way to store data and describe the data's context at the same time, and you've got XML.
It's this ability to combine data with information describing its structure that makes XML so incredibly useful as a data exchange technology. For example, take two applications that store data in their own proprietary formats and try getting them to play nice and talk to each other. Most of your time on such a project would be spent designing and coding the mechanism used to transform data from application A's format to that used by application B. XML and its attendant technologies are ideally suited to solve such a problem, with minimal effort on your part.
Your basic XML document
Listing A shows a canonical example of an XML document describing a list of books. Incidentally, information in XML format is typically referred to as a document regardless of whether it's actually housed in a file on disk.
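As a rough illustration of that last point, here is a minimal Python sketch that parses the same book list once from a string in memory and once through a file-like object. The XML in it is only a hypothetical stand-in for Listing A: the catalog root, the author and title elements, and all the values are assumptions made for illustration, and only book, id, and publish_date come from this article.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for Listing A. Only book, id, and publish_date
# are named in the article; every other name and value is assumed.
BOOKS_XML = """<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <author>Example Author</author>
    <title>Example Title</title>
    <publish_date>2000-10-01</publish_date>
  </book>
  <book id="bk102">
    <author>Another Author</author>
    <title>Another Title</title>
    <publish_date>2000-12-16</publish_date>
  </book>
</catalog>
"""

# Parse the document straight from a string in memory.
root_from_string = ET.fromstring(BOOKS_XML)

# Parse the same content through a file-like object, as if read from disk.
root_from_file = ET.parse(io.BytesIO(BOOKS_XML.encode("utf-8"))).getroot()

titles_from_string = [b.findtext("title") for b in root_from_string.iter("book")]
titles_from_file = [b.findtext("title") for b in root_from_file.iter("book")]

# Both routes yield the same document content.
print(titles_from_string == titles_from_file)
```

Either way, the parser sees the same document; where the text happens to live is incidental.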
The first thing you'll notice is that XML is tag-based. If you've ever looked at HTML, that shouldn't be too disturbing. Unlike HTML tags, however, XML tags don't necessarily have a predefined meaning. Instead, they are simply markers for data. Here are a few things that might not be evident just from inspecting a document:
Think of an XML document as a tree, with a single root that contains all the other elements. Internet Explorer displays XML documents in this format automatically, as you can see in Figure A:

Figure A: Opening an XML file in IE can help you visualize the document structure.
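If you don't have a browser handy, the same tree view can be approximated in a few lines of Python. This is only a sketch: the sample document and the outline helper are hypothetical, made up here to show a single root containing every other element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only book, id, and publish_date appear in the article.
SAMPLE = """<catalog>
  <book id="bk101">
    <author>Example Author</author>
    <publish_date>2000-10-01</publish_date>
  </book>
</catalog>"""


def outline(element, depth=0):
    """Return one indented line per element, parents before children."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


# The root (catalog) comes first; everything else hangs beneath it.
tree_lines = outline(ET.fromstring(SAMPLE))
print("\n".join(tree_lines))
```

Each level of indentation in the output corresponds to one level of nesting in the document.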
Every XML document should begin with a header, or prolog, supplying any additional information needed to make sense of the data it describes. A long list of optional items may appear here, and any that are used must appear in a particular order. You'll usually see at least an XML declaration giving the version, which must come first:
<?xml version="1.0"?>
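To see these rules enforced, here is a small Python sketch that feeds three tiny documents to the standard-library parser. The is_well_formed helper and the sample strings are hypothetical; the point is only that a declaration placed anywhere but first, or a second root element, makes the parser give up.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Declaration first, then a single root element: accepted.
with_declaration = '<?xml version="1.0"?><catalog><book id="bk101"/></catalog>'

# Something (here, a comment) precedes the declaration: rejected.
declaration_not_first = '<!-- note --><?xml version="1.0"?><catalog/>'

# Two top-level elements instead of one root: rejected.
two_roots = '<?xml version="1.0"?><catalog/><catalog/>'

for name, text in [
    ("with_declaration", with_declaration),
    ("declaration_not_first", declaration_not_first),
    ("two_roots", two_roots),
]:
    print(name, is_well_formed(text))
```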
The wonderful world of attributes
Still with me? Now I'm really going to bend your mind and talk briefly about attributes. An attribute is a name-value pair attached to an element's opening tag to provide additional information about that element. Here's an example of an attributed tag taken from Listing A:
<book id="bk101">
This snippet defines a unique identifier for the book described by the current book element. That's what attributes are meant to do: provide additional information about an element without requiring a separate child element to store that information.
You may be asking, "So why couldn't you include that identifier in the book element itself, as its own id element?" And a lot of people would agree with you. Generally, attribute use is encouraged only when the information modifies the element but isn't specifically part of the element's content. In this case, the id attribute probably corresponds to a key in the database table that houses the book information. If so, it's not likely to be modified and would probably be needed only when updating the underlying table, making it a prime candidate for inclusion as an attribute of the book element. Other uses for attributes come into play when you get into data validation and transformations.
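The difference between the two approaches is easy to see from the consuming side. The Python sketch below parses the book both ways: once with id as an attribute, as in the snippet above, and once with a hypothetical id child element. The title element and its value are assumptions added only to give the book some content.

```python
import xml.etree.ElementTree as ET

# The form used in the article: id travels as an attribute of book.
as_attribute = ET.fromstring(
    '<book id="bk101"><title>Example Title</title></book>'
)

# Hypothetical alternative: id stored as a child element instead.
as_element = ET.fromstring(
    "<book><id>bk101</id><title>Example Title</title></book>"
)

# An attribute is read from the element itself...
id_from_attribute = as_attribute.get("id")
# ...while a child element is looked up among the element's content.
id_from_element = as_element.findtext("id")

# With the attribute form, the identifier stays out of the book's content.
children_with_attribute = [child.tag for child in as_attribute]
children_with_element = [child.tag for child in as_element]

print(id_from_attribute, children_with_attribute)
print(id_from_element, children_with_element)
```

Both forms carry the same identifier; the attribute version simply keeps it out of the list of things the book contains.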
Data is as data does
I should point out here that XML makes no assumptions about the data you store in an element, or about the number or order of elements in a document. For instance, referring back to Listing A, there's really nothing to prevent me, troublemaker that I am, from sticking the author's name in the publish_date element. That's because in its most basic form, XML describes only the structure of the data it contains, not the format that data should take.
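Here's a quick Python sketch of that bit of troublemaking. The parser happily accepts an author's name inside publish_date, and the problem only surfaces when the application checks the value itself. The sample values and the looks_like_iso_date helper are hypothetical, and the assumption that dates are written as YYYY-MM-DD is mine, not something the article specifies.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Well-formed XML with the wrong kind of data in publish_date.
MISCHIEF = """<book id="bk101">
  <author>Example Author</author>
  <publish_date>Example Author</publish_date>
</book>"""

# The parser checks structure only, so this succeeds without complaint.
book = ET.fromstring(MISCHIEF)
raw_date = book.findtext("publish_date")


def looks_like_iso_date(text):
    """Hand-rolled check, assuming dates are written as YYYY-MM-DD."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


print(raw_date, looks_like_iso_date(raw_date))
```

A check like this has to be written by hand for every element that matters, which is tedious and easy to get wrong.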
If you want to enforce some kind of order in an XML document, which is generally a good idea (especially if I'm around), you can provide either a Document Type Definition (DTD) or an XML Schema for your document. Both of these techniques will be the subject of the next article in my remedial XML series.