XML 2: The XML 1.0 specification is here. Does it cut the mustard?

After the hype that followed the first rumours of an XML standard, the W3C had to build a spec that satisfied a hungry audience of would-be developers. How did it fare?

The story of XML as a standard goes back to early 1996. Although HTML had been the darling of a new breed of Internet programmers for a number of years, it was becoming increasingly obvious that it was reaching the limits of its power, and was basically getting on a bit. The Internet is a ruthless arena for even the most popular standards.

So the online community set itself different goals for the next-generation electronic document format. The new standard should lose none of the simplicity, GUI appearance and hypertext talents of HTML, while having the added benefit of enabling automation of multi-vendor applications.

The World Wide Web Consortium, or W3C, launched the W3C XML Activity project in May 1996. The working group that designed and tweaked XML comprised an interesting mixture of publishing-industry veterans and Web pioneers, working from the position of strength that a vendor-independent body enjoys. This small working group (the XML Working Group) was also helped out with technical input from a larger Special Interest Group, or SIG.

Following ten design goals and passing through a succession of interim drafts, XML reached the 1.0 Recommendation. The W3C was particularly proud of the fact that all technical discussions were held, and all decisions taken, via teleconference, email and web postings, with very little face-to-face interaction. The group believes that not only did this allow worldwide members to play an important role in development, it also sped up the normally lengthy specification process as a whole. Since version 1.0, there have been a number of requests for enhancements to the spec, but the working group is reluctant to make any radical changes before XML is deployed more widely and greater hands-on experience has been gained.

XML.com wisely suggests that if you want to understand XML, you really have to read the specification for yourself. But as Tim Bray, co-editor of the XML 1.0 specification, once said, most people never get round to reading the basics of how to operate a toaster safely, so ploughing through a technical spec is probably not for everyone. Still, this specification is only 40 pages long, and because it was itself authored in XML it is available all over the web in a number of different formats, so it's definitely worth a look.

Luckily, Bray has written a number of idiot-friendly papers with potted explanations for various aspects of XML 1.0, something that I for one was very pleased about.

I don't think that it's unfair to say XML 1.0 was something of a hit, but to be honest the demand for an HTML successor was such that it could be nothing else. XML 1.0 was designed from the ground up to do things that HTML never could. HTML is great for displaying text, but for automating Web processes you need something like XML. It gives you the ability to make rich documents that are open to manipulation by computer programs. For example, a web robot could be employed to index items, or a Java applet used to push content into graphs or tables.
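To give a flavour of what "open to manipulation" means in practice, here is a minimal sketch of the indexing robot idea in Python. The document, its tag names and the index_by_artist function are all invented for illustration; the only real thing here is the xml.etree.ElementTree module from Python's standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: every tag name here is made up for the example.
DOCUMENT = """
<chart>
  <single>
    <chart-position>1</chart-position>
    <artist>Artist A</artist>
    <title>First Song</title>
  </single>
  <single>
    <chart-position>2</chart-position>
    <artist>Artist B</artist>
    <title>Second Song</title>
  </single>
  <single>
    <chart-position>3</chart-position>
    <artist>Artist A</artist>
    <title>Third Song</title>
  </single>
</chart>
"""


def index_by_artist(xml_text):
    """Build an index of artist -> [(position, title), ...] in document order."""
    root = ET.fromstring(xml_text)
    index = {}
    for single in root.iter("single"):
        # Collect the child elements of this <single> as a tag -> text mapping.
        fields = {child.tag: child.text for child in single}
        entry = (int(fields["chart-position"]), fields["title"])
        index.setdefault(fields["artist"], []).append(entry)
    return index


if __name__ == "__main__":
    for artist, entries in index_by_artist(DOCUMENT).items():
        print(artist, entries)
```

The program never has to guess which piece of text is the artist and which is the title, because the tags say so. That is the part HTML cannot offer.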

It also managed to take away the rigidity of HTML tagging. With XML 1.0 it's up to the user how elements are specified. Tags like "chart-position" or "goals-scored" are commonplace in XML documents but will never feature in HTML. You may be concerned that you'll have to knock up a new range of tags each time you write or share a document, but the specification includes something called a Document Type Definition, or DTD, that lets you declare the tags you've invented so that you, or anyone you share it with, can reuse them. A document that has a DTD and conforms to it is called "valid".
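Here is a rough sketch of what a DTD looks like when it travels inside the document it describes. The match, team and goals-scored names are invented for the example. One loud caveat: the parser in Python's standard library reads the DTD but does not validate against it, so the hypothetical undeclared_elements helper below only checks that every tag used has been declared. Real validity also covers the order and content of elements, and for that you need a validating parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical document carrying its own DTD in the DOCTYPE declaration.
DOCUMENT = """<!DOCTYPE match [
  <!ELEMENT match (team, goals-scored)>
  <!ELEMENT team (#PCDATA)>
  <!ELEMENT goals-scored (#PCDATA)>
]>
<match>
  <team>Home Side</team>
  <goals-scored>3</goals-scored>
</match>
"""


def undeclared_elements(xml_text):
    """Return the set of tag names used in the document but not declared in its DTD.

    This is a partial check only: it ignores content models and attributes.
    """
    # Names that appear in <!ELEMENT name ...> declarations.
    declared = set(re.findall(r"<!ELEMENT\s+([^\s>]+)", xml_text))
    # Names of every element actually present in the document.
    used = {element.tag for element in ET.fromstring(xml_text).iter()}
    return used - declared


if __name__ == "__main__":
    print(undeclared_elements(DOCUMENT))
    broken = DOCUMENT.replace("<team>Home Side</team>", "<club>Home Side</club>")
    print(undeclared_elements(broken))
```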

Valid or not, every XML document has to be "well-formed". This means all the tags begin and end correctly (an empty element can open and close in a single tag), all attribute values are correctly quoted, and all entities are declared. This is a fab idea. Surf the web and you will find a lot of crap HTML out there, with unclosed tags, broken links and so on. This makes automated processing extremely unreliable, as you can't be sure that all documents will comply with the same rule sets. Well-formed documents are easy to parse, that is, to break down into a structure that a program can manipulate.
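The well-formedness rules are easy to try out for yourself. The sketch below leans on the non-validating parser in Python's standard library; is_well_formed is a name made up for this example, and the sample snippets are invented too.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Invented samples, one per rule mentioned in the text.
SAMPLES = {
    "properly nested, with an empty element": '<item id="1"><name>Toaster</name><br/></item>',
    "tag never closed": "<item><name>Toaster</item>",
    "attribute value not quoted": "<item id=1/>",
    "entity never declared": "<item>&nbsp;</item>",
}

if __name__ == "__main__":
    for description, text in SAMPLES.items():
        print(description, "->", is_well_formed(text))
```

Only the first sample passes. The last one tends to catch out HTML veterans: &nbsp; is declared in HTML's DTD, but XML itself predefines just five entities (&lt;, &gt;, &amp;, &apos; and &quot;), so anything else has to be declared before use.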

Browsers are very forgiving of bad HTML, but the XML 1.0 spec clearly states that if a document is not well-formed, it is not an XML document at all, and a parser must stop normal processing rather than guess at what the author meant. The committee decided that it was easy to use good XML practice, so if writers couldn't be bothered, tough. The controversial decision was pushed through primarily by Netscape and Microsoft, and these rules mean that once Navigator or Explorer displays an XML page, you know it's well-formed.
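The difference in attitude shows up nicely if you feed the same sloppy markup to a forgiving HTML parser and a strict XML one. This is only a sketch using two parsers from Python's standard library as stand-ins for a browser and an XML processor; the TagCollector class and the lenient_tags and strict_error helpers are invented for the example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Invented sample: neither the <b> nor either <p> is ever closed.
SLOPPY = "<p>Some <b>bold text<p>and a new paragraph"


class TagCollector(HTMLParser):
    """Record every start tag the HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def lenient_tags(markup):
    """Parse the HTML way: take whatever can be made sense of, no complaints."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def strict_error(markup):
    """Parse the XML way: return the error message, or None if well-formed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print("HTML parser saw:", lenient_tags(SLOPPY))
    print("XML parser said:", strict_error(SLOPPY))
```

The HTML parser shrugs and hands back all three start tags; the XML parser refuses to build anything and reports an error instead.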

The most complimentary thing that can be said about the XML 1.0 specification is that it works and does everything the W3C set out for it to achieve. But then it was knocked up by a load of old hacks who loved SGML and had a vested interest in making the web a more dynamic, friendly place to code.
