XML 1: What is XML by the way? A look at the Web "Lingua Franca"

We've all heard that XML is the next big thing for the web, but the many white papers out there are a little intense. Here's a straightforward explanation of the web language of love...

XML stands for eXtensible Markup Language. It is a subset of SGML, the Standard Generalized Markup Language, and was designed by the W3C to make it easier for Internet users to interchange structured documents.

Structured documents are documents that contain content (words, pictures and so on) together with an indication of what role each piece of content plays. XML files clearly mark where each logical part of a document starts and ends. This means, for example, that content in a header, introduction, photo caption, conclusion or table can be given its own meaning, and can be processed and displayed differently from every other element.
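To make that concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The document and its element names (article, header, introduction, caption, conclusion) are invented for illustration; XML itself does not define them.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: every element name here is made up for this example.
DOCUMENT = """\
<article>
  <header>What is XML?</header>
  <introduction>XML describes the parts of a document.</introduction>
  <caption>A chainsaw next to an envelope.</caption>
  <conclusion>XML is a restricted form of SGML.</conclusion>
</article>
"""


def roles(xml_text):
    """Return (role, content) pairs for each top-level part of the document."""
    root = ET.fromstring(xml_text)
    return [(child.tag, child.text) for child in root]


if __name__ == "__main__":
    for role, content in roles(DOCUMENT):
        print(f"{role}: {content}")
```

Because every part is wrapped in a start tag and an end tag, the program never has to guess where the caption stops and the conclusion begins.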

This is one major reason why XML is considered to be more flexible and more intelligent than HTML, which comes with a fixed set of tags. XML was designed from the ground up NOT to code text in one standard way. Indeed it is fair to say that no single tag set can truly suit all applications. So XML simply describes the component parts of the document and passes that information on to other computer systems. This makes XML inherently flexible, and ideal for describing a block of content at pretty much any scale, from an e-mail or news story to an encyclopaedia or an entire database.

As mentioned, XML is defined as an application profile of SGML (ISO 8879). SGML has been the standard, vendor-independent way of defining structured documents since the eighties, but it is an unwieldy tool for serving documents over the web. A great tool, and very powerful, but you wouldn't use a chainsaw to open an envelope, and this is where XML comes into play. Any SGML-conformant system can read XML documents, but software that handles XML doesn't need a full understanding of the inner mechanisms of SGML. In essence then, XML is a restricted version of SGML.

One of the coolest things about XML is its liberality and lack of built-in semantics. If you want to embolden a sentence you could call the tag "nice-and-bold" if you so wish (tag names can't contain spaces, but hyphens are fine). This is extremely useful when developing style sheets and so forth, and allows feature-rich documents with complicated structures to be created reasonably simply. HTML is easier at a basic level, but you have to learn a fixed tag set, how those tags are delimited from normal text, and in which order they may be used. XML systems can be set up to show the user which tags are valid at any given point in a document, and documents can be validated either manually or automatically. Users familiar with HTML will be pleased to see that tags are still written in angle brackets.
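Here is a rough sketch of what "valid tag choices" can look like in practice, again in Python. Element names cannot contain spaces, so the example uses nice-and-bold. The ALLOWED_CHILDREN table and the check function are hypothetical stand-ins for a real DTD or schema: Python's standard library checks that a document is well-formed but does not validate it against a DTD, so the second half of the check is done by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag set: which child elements each element may contain.
ALLOWED_CHILDREN = {
    "note": {"para"},
    "para": {"nice-and-bold"},
    "nice-and-bold": set(),
}


def check(xml_text):
    """Return a list of problems; an empty list means the document passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Mismatched or malformed tags are caught by the parser itself.
        return [f"not well-formed: {err}"]

    problems = []
    if root.tag not in ALLOWED_CHILDREN:
        problems.append(f"unknown root element <{root.tag}>")
    # Visit every element and compare its children against the table.
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag, set())
        for child in parent:
            if child.tag not in allowed:
                problems.append(
                    f"<{child.tag}> is not allowed inside <{parent.tag}>"
                )
    return problems


if __name__ == "__main__":
    good = "<note><para>This is <nice-and-bold>important</nice-and-bold>.</para></note>"
    bad = "<note><nice-and-bold>Shouting</nice-and-bold></note>"
    print(check(good))
    print(check(bad))
```

An editing tool could use the same kind of table to offer only the tags that are legal at the cursor position, which is the sort of guidance the paragraph above describes.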

XML documents are made up of a series of elements, things, or objects. The language works as a formal syntax for describing the relationship between these objects that make up the document, and can be used to tell the computer how to deal with each. Because XML tag sets are more logical in structure they are perhaps easier to understand than basic mark-up schemes.
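The relationships between elements are simply a matter of which element sits inside which. As an illustrative sketch (the story, headline, body and para names are made up for this example), the following Python walks a small document and lists each element indented under its parent.

```python
import xml.etree.ElementTree as ET

# Hypothetical news story; the element names are invented for illustration.
DOCUMENT = (
    "<story>"
    "<headline>XML explained</headline>"
    "<body><para>First.</para><para>Second.</para></body>"
    "</story>"
)


def outline(element, depth=0):
    """Return one line per element, indented two spaces per nesting level."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(ET.fromstring(DOCUMENT))))
```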

When the W3C decided to knock up XML it had Ten Commandments, or more correctly "development goals" that it wanted to achieve. XML is a little easier to grasp when you know what it was intended to help you do.

1) Firstly, it had to be easy to use over the Internet, with documents that could be viewed as easily as existing HTML efforts.

2) It had to be able to support a wide variety of applications for authoring, browsing and so on.

3) It had to be compatible with SGML, so as not to put out the many development houses with huge resource investments in SGML.

4) It should be easy to write programs that process XML documents, with an average programmer being capable of bashing out such a program in around a fortnight.

5) Options were to be kept to a minimum, and hopefully zero, so as not to create unnecessary compatibility errors between disparate users and systems.

6) XML documents should be "human" and clear, so that you could view XML source code in an everyday text-editor and still figure out its basic purpose.

7) The whole XML design process had to be completed quickly, before the problems it was intended to solve built up and became more complicated problems.

8) The design had to be formal and to the point, unlike SGML, which is overly complicated to the untrained eye.

9) Documents must be easy to create without sophisticated editors.

10) And lastly, it was decided that terseness in XML markup was unimportant, in the belief that clarity is preferable to fewer characters, especially when shortcuts can easily be implemented.

Sounds like a cool language, doesn't it? Well, in many ways it is.

XML is easy to maintain, as there is no sprawling mass of incomprehensible markup to wade through when troubleshooting documents. Unnecessary complication has been torn out of XML, leaving an elegant, human-readable format that is open and works across different systems. Above all XML is friendly, a rare quality in the world of programming.
