XML (eXtensible Markup Language) is a markup language for structuring data so that it is far easier to manipulate and exchange between applications.
To fully understand XML we'll need to look at a little history. In the mid-1980s a powerful and complex language called SGML (Standard Generalised Markup Language) was developed. All we need to know about SGML is that it is primarily used to define the structure of data in documents, for the wide range of applications which require large amounts of data to be published in properly formatted documents (maintenance manuals for aircraft being a perfect example).
SGML is what is known as a "meta-language" -- a language that can be used to define other languages, such as HTML (HyperText Markup Language). HTML is a very simple language that is used to define documents which will be published on the web. It consists of a limited number of tags that describe the way text and images should look in a web page. (For the purposes of this discussion, all angle brackets that would normally be used to make up both HTML and XML tags have been replaced with everyday curved brackets.)
- Example HTML tags are:
(B) (/B) These tags denote that any text in between them should be displayed as bold.
(CENTER) (/CENTER) These tags are used to display text or images in the centre of the screen.
(IMG SRC="URL") This tag tells the browser to display an image file, the location of which is given as a URL inside the quotation marks.
Now, HTML is fine if you simply want to display text and images on web pages, but it's dumb -- it just sees text and images but doesn't know what that data means or how to manipulate it usefully. The current trend towards applications such as e-commerce means that we need something a little more flexible to help us manage the vast amount of data on the web. SGML might be useful, but it's a very large and unwieldy language and would be too complex for most applications. So the World Wide Web Consortium (W3C), the body which oversees web standards, last year came up with XML, which in essence is a subset of SGML.
A system capable of reading SGML documents will be able to read XML documents, but not necessarily vice versa. XML is better suited to use on the web than SGML and it's far more powerful than HTML. Rather than simply being a set of pre-defined tags that are processed by a web browser, XML allows people to develop their own sets of tags which can be used to describe data.
- For example, an HTML page on a motor dealer's web site might look like this:
(HTML) (BODY) (P)Ford Mondeo, blue, 3000 miles only, £7500(/P)(/BODY) (/HTML)
- A similar page written in XML might look something like this:
(TEL)0171 123 456(/TEL)
As you can see, the XML page uses a set of tags that are specific to the car trade to carefully define each piece of information. The consequences of this for the end user are easy to imagine -- think how much easier it would be to locate a blue Mondeo for sale in the London area if the data is stored in XML rather than HTML.
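To make that concrete, here is a minimal sketch in Python of how a program might search listings of this kind. Apart from TEL, the tag names (LISTINGS, CAR, MAKE, MODEL, COLOUR, MILEAGE, PRICE, LOCATION), the second car and the find_cars function are all invented for illustration and are not an agreed car-trade format. The code uses real angle brackets, because a parser needs them.

```python
import xml.etree.ElementTree as ET

# Hypothetical listings. The tag names are assumptions made for this sketch,
# and the second car is made-up sample data.
LISTINGS_XML = """
<LISTINGS>
  <CAR>
    <MAKE>Ford</MAKE>
    <MODEL>Mondeo</MODEL>
    <COLOUR>blue</COLOUR>
    <MILEAGE>3000</MILEAGE>
    <PRICE>7500</PRICE>
    <LOCATION>London</LOCATION>
    <TEL>0171 123 456</TEL>
  </CAR>
  <CAR>
    <MAKE>Vauxhall</MAKE>
    <MODEL>Astra</MODEL>
    <COLOUR>red</COLOUR>
    <MILEAGE>12000</MILEAGE>
    <PRICE>4200</PRICE>
    <LOCATION>Leeds</LOCATION>
    <TEL>0113 000 000</TEL>
  </CAR>
</LISTINGS>
"""


def find_cars(xml_text, model, colour, location):
    """Return the CAR elements whose MODEL, COLOUR and LOCATION all match."""
    root = ET.fromstring(xml_text.strip())
    matches = []
    for car in root.iter("CAR"):
        if (car.findtext("MODEL") == model
                and car.findtext("COLOUR") == colour
                and car.findtext("LOCATION") == location):
            matches.append(car)
    return matches


# Locate a blue Mondeo for sale in London and show who to call about it.
for car in find_cars(LISTINGS_XML, "Mondeo", "blue", "London"):
    print(car.findtext("MAKE"), car.findtext("MODEL"), car.findtext("TEL"))
```

Because every piece of information sits inside its own tag, the search compares exact fields. With the HTML version, a program would have to guess which part of the sentence is the colour and which is the model.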
But this does raise a problem: for the given example to work effectively, the car industry would first have to agree to use the same set of tags and agree on a format for documents using those tags. To do this we have to create a schema, which is basically a set of rules that define how documents will be structured for a specific application, be it car sales or patient information files for medical purposes. Because XML allows us to create rigidly formatted data, it's perfect for use in applications where we want to process data automatically, such as online commerce.
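The idea of a schema can be sketched in a few lines of Python. This is only a stand-in for the real thing: an actual schema would be written in a dedicated schema language rather than in program code, and the rule used here (every CAR must contain a non-empty MAKE, MODEL, COLOUR, PRICE and TEL) is an assumption made up for the example, as are the names REQUIRED_TAGS and check_listings.

```python
import xml.etree.ElementTree as ET

# Assumed rule for this sketch: each CAR must carry all of these tags.
REQUIRED_TAGS = ["MAKE", "MODEL", "COLOUR", "PRICE", "TEL"]

# Hypothetical sample data. The second car has an empty PRICE and no TEL.
SAMPLE_XML = """
<LISTINGS>
  <CAR>
    <MAKE>Ford</MAKE><MODEL>Mondeo</MODEL><COLOUR>blue</COLOUR>
    <PRICE>7500</PRICE><TEL>0171 123 456</TEL>
  </CAR>
  <CAR>
    <MAKE>Vauxhall</MAKE><MODEL>Astra</MODEL><COLOUR>red</COLOUR>
    <PRICE></PRICE>
  </CAR>
</LISTINGS>
"""


def missing_tags(car):
    """List the required tags that are absent or empty in one CAR element."""
    return [tag for tag in REQUIRED_TAGS
            if not (car.findtext(tag) or "").strip()]


def check_listings(xml_text):
    """Map the position of each faulty CAR to the tags it is missing."""
    root = ET.fromstring(xml_text.strip())
    problems = {}
    for index, car in enumerate(root.iter("CAR")):
        missing = missing_tags(car)
        if missing:
            problems[index] = missing
    return problems


print(check_listings(SAMPLE_XML))  # {1: ['PRICE', 'TEL']}
```

A dealer's software could run a check like this before publishing, so that every document reaching the web follows the agreed format and can be processed automatically.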
This, however, doesn't begin to scratch the surface of the possibilities raised by XML. It greatly improves the functionality of web-based applications and the ease with which they can be implemented. Combining content from heterogeneous data sources onto a single web page is much easier with XML than with HTML, as is pumping out data to a wide range of devices such as web browsers, mobile phones, set-top boxes and so on.
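As a rough illustration of both points, the Python sketch below merges listings from two sources and renders each car twice: once as HTML for a browser and once as a short line of text of the kind a mobile phone might show. The two dealer feeds, their tag names and the functions all_cars, to_html and to_short_text are assumptions invented for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Two hypothetical feeds from different dealers, using the same assumed tags.
DEALER_A = ("<LISTINGS><CAR><MAKE>Ford</MAKE><MODEL>Mondeo</MODEL>"
            "<COLOUR>blue</COLOUR><PRICE>7500</PRICE></CAR></LISTINGS>")
DEALER_B = ("<LISTINGS><CAR><MAKE>Vauxhall</MAKE><MODEL>Astra</MODEL>"
            "<COLOUR>red</COLOUR><PRICE>4200</PRICE></CAR></LISTINGS>")


def all_cars(sources):
    """Collect the CAR elements from every XML source into one list."""
    cars = []
    for xml_text in sources:
        cars.extend(ET.fromstring(xml_text).iter("CAR"))
    return cars


def to_html(car):
    """Render one car as an HTML paragraph for a web browser."""
    return "<P>{} {}, {}, &pound;{}</P>".format(
        escape(car.findtext("MAKE", "")),
        escape(car.findtext("MODEL", "")),
        escape(car.findtext("COLOUR", "")),
        escape(car.findtext("PRICE", "")),
    )


def to_short_text(car):
    """Render one car as a brief plain-text line for a small screen."""
    return "{} {} GBP {}".format(
        car.findtext("MAKE", ""),
        car.findtext("MODEL", ""),
        car.findtext("PRICE", ""),
    )


for car in all_cars([DEALER_A, DEALER_B]):
    print(to_html(car))
    print(to_short_text(car))
```

The data is written once and described by its tags, so each device gets its own presentation without anyone having to re-enter or re-interpret the content.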
The key idea behind XML is that when you put data onto the web, that data is formatted and described in such a way that it becomes easier to manage, both for those who create it and for those who want to access it. It's an exciting development for all web users because it will greatly improve the way the web works by structuring the vast amount of content in a much more useful manner.