XML 4: Of all the mark-ups in all the WWW, why walk into XML?

HTML has been the darling of the web-authoring world for so long, it'll take something special to knock it off its pedestal.

The phenomenon that is the World Wide Web is fuelled by the ability it gives geographically remote web-authors to easily and cheaply distribute documents around the globe. XML aims to provide a basic syntax that allows such information to be shared between different systems running different apps without any need for layers and layers of conversion.

Most documents on the Web, even now, are transmitted in HTML (HyperText Markup Language), a simple language based on SGML (Standard Generalised Markup Language) and suited to simple documentation and hypertext.

HTML applications are limited to a small, fixed set of tags conforming to a single SGML specification. This allows users to leave the language specification out of the document and makes it easier to build applications, but it also limits HTML in terms of extensibility, structure and validation. HTML users can't specify their own tag sets. They can't represent the deeper structures needed for database sites or object hierarchies. And without a language specification, HTML users can't check that documents are valid when importing or exporting them.

SGML can do all these things and more, but as a back-end technology it unfortunately carries many optional features that web users don't really need, and so it has proven cumbersome and expensive for browser companies and end-users alike.

XML is expressly targeted at a web-focused audience, although it does have applications beyond the web. XML was designed to be easy and informative to use, but it is not backwards-compatible with existing HTML documents. Users who are used to working with HTML, however, should be able to pick up the basics of XML pretty quickly, and as documents conforming to the W3C (World Wide Web Consortium) HTML 3.2 specification can easily be converted to XML, this isn't really a barrier to XML's acceptance.

Although many arguments for XML are made at the expense of HTML, no-one really believes that the huge volumes of useful HTML pages out there are about to become obsolete in the short term. The W3C has an abiding interest in HTML, as have many of the W3C's members. The ISO (International Organization for Standardization) has also standardised HTML in the conviction that HTML will persist for at least 25 years more, which is quite telling. For all its growing pains, then, HTML remains a very successful common denominator for building web content.

The apps that will promote XML as the markup language of choice will be those that cannot be undertaken within HTML. Specifically, these will be:

  • applications that need to mediate between two or more heterogeneous databases;

  • applications that require web agents to customise information delivery depending on the needs of individual users;

  • applications that need to present different views of the same information to different users (e.g. desktop users, handheld users, kiosks);

  • and applications that need to distribute a high processing load from web server to web client.

HTML can handle some of these tasks to an extent, using proprietary code embedded as "script elements" and delivered with the help of proprietary plug-ins or Java applets in Navigator or Explorer, but it's far from ideal for the job.
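To make the "different views" case concrete, here is a minimal sketch in Python using the standard library's ElementTree parser. The product record, its element names and the two view functions are all invented for illustration; the point is only that one marked-up source can feed several presentations without being rewritten.

```python
import xml.etree.ElementTree as ET

# Hypothetical product record: the element names are invented for this sketch.
RECORD = """<product>
  <name>Pocket Modem</name>
  <price currency="GBP">49.99</price>
  <description>A 56k modem small enough to fit in a shirt pocket.</description>
</product>"""


def desktop_view(product):
    """Full listing for a user with plenty of screen space."""
    name = product.findtext("name")
    price = product.find("price")
    description = product.findtext("description")
    return f"{name} ({price.get('currency')} {price.text}): {description}"


def handheld_view(product):
    """Terse listing for a small display: name and price only."""
    return f"{product.findtext('name')}: {product.findtext('price')}"


product = ET.fromstring(RECORD)
desktop = desktop_view(product)
handheld = handheld_view(product)
print(desktop)
print(handheld)
```

Because the data is labelled by meaning rather than by appearance, adding a third view (a kiosk, say) would mean adding another function, not another copy of the document.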

One of XML's key selling points is its simplicity. XML gives programmers and site authors a friendly environment in which to work. Well, friendly in computing terms... XML documents are built upon a core set of basic nested structures. Although you can layer these elements to create very complicated structures, the underlying objects themselves remain simple and understandable, even to the less than brilliant.
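The nesting is easy to see in practice. The sketch below, again in Python with the standard ElementTree module, parses a small made-up catalogue (the tag names are assumptions, not part of any standard) and prints its element tree as an indented outline.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the vocabulary is invented for illustration.
DOC = """<catalogue>
  <book id="b1">
    <title>Learning Markup</title>
    <author>A. Writer</author>
  </book>
  <book id="b2">
    <title>Structured Documents</title>
    <author>B. Editor</author>
  </book>
</catalogue>"""


def outline(element, depth=0):
    """Return one line per element, indented two spaces per nesting level."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(DOC)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

However deep the layering goes, the same short recursive walk handles it, which is a fair measure of how simple the underlying structure is.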

The obvious aspect of XML to look at is the "X". The language is eXtensible, meaning it can grow and develop as demands require. The initial developers' contribution to extensibility was the provision of author-definable tag sets. DTDs (Document Type Definitions) are the most obvious sign of extensibility within XML. XML is, in the end, a meta-language: it outlines a set of rules that can be employed to create a further set of rules for a particular type of document. DTDs give builders a set of tools with which they can define structure. Flexible yet standard: a compelling combination.
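As a rough illustration, the document below carries its own DTD for an invented "memo" vocabulary. One caveat, stated loudly: Python's standard ElementTree parser reads the DTD but does not validate against it, so the `CONTENT_MODEL` table and `follows_model` function are hand-written stand-ins for what a validating parser would enforce, not a real DTD validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical memo vocabulary. The DTD in the internal subset declares
# which tags exist and in what order they may appear.
MEMO = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (to, from, body)>
  <!ELEMENT to   (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<memo>
  <to>Editor</to>
  <from>Reader</from>
  <body>XML lets us name our own tags.</body>
</memo>"""

# Assumption: copied by hand from the DTD above, because ElementTree
# does not validate. A validating parser would derive this itself.
CONTENT_MODEL = {"memo": ["to", "from", "body"]}


def follows_model(element):
    """True if the element's children match the expected sequence exactly."""
    expected = CONTENT_MODEL.get(element.tag)
    return expected is None or [child.tag for child in element] == expected


memo = ET.fromstring(MEMO)
memo_ok = follows_model(memo)

# A memo missing its addressee and sender breaks the declared structure.
bad = ET.fromstring("<memo><body>No addressee</body></memo>")
bad_ok = follows_model(bad)
print(memo_ok, bad_ok)
```

The useful part is the division of labour: the author declares the rules once, in the DTD, and any conforming tool can then check documents against them.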

XML itself is also still being extended, with bolt-ons providing authors with additional styling, linking and referencing. XML can already work with many existing web standards such as Cascading Style Sheets (CSS) and the Hypertext Transfer Protocol (HTTP). XML Linking (XLink) offers linking facilities that HTML developers can only admire from afar. XPointers provide a consistent way of referencing portions of documents. The Extensible Stylesheet Language (XSL) provides a more complex tool-set than that provided by CSS, and uses XML syntax to define style sheets. XML is well supported and is growing at a comforting rate, with more standards on the way.

But there are still further arguments for adopting XML at this stage, over HTML or full SGML. Because its documents behave consistently, and because parsers and third-party APIs are available for platform-independent languages such as Java as well as for C++, C, JavaScript, Tcl and Python, XML is extremely interoperable. The standard itself is also open, and thus freely available on the Web. Developers could create obscure DTDs or encrypt data, but why bother, and lose one of the main benefits of XML? In addition, skilled XML developers are already out there in the shape of the sizeable SGML community, which should help adoption.
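The interoperability claim can be sketched on a small scale. Below, the same made-up order document is read by two unrelated parsers that ship with Python, ElementTree and the DOM-style minidom, and both recover the same data. The document and the helper function names are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical order document, invented for this sketch.
DOC = "<order><item qty='2'>widget</item><item qty='5'>sprocket</item></order>"


def items_via_elementtree(text):
    """Read (name, quantity) pairs using the ElementTree API."""
    root = ET.fromstring(text)
    return [(item.text, int(item.get("qty"))) for item in root.findall("item")]


def items_via_dom(text):
    """Read the same pairs using the DOM API."""
    document = minidom.parseString(text)
    return [
        (node.firstChild.data, int(node.getAttribute("qty")))
        for node in document.getElementsByTagName("item")
    ]


from_etree = items_via_elementtree(DOC)
from_dom = items_via_dom(DOC)
print(from_etree == from_dom)
```

Two parsers in one language is a modest test, but the same reasoning applies across languages and platforms: the syntax is fixed by the standard, so any conforming parser should see the same structure.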

As well as all the advantages for the Web already discussed, XML has potential as a universal file transfer format. The adoption of XML in Microsoft Internet Explorer 5 and, presumably, the long-awaited Navigator/Communicator revision 5 from Netscape will open a lot of these doors. XML can act as a gateway for communications between disparate systems, platforms and applications. Unless you are a web house with demands that can only be satisfied by a full SGML implementation, it's difficult at this stage to envision a scenario in which XML fails to become your de facto markup of choice and you persevere with, or turn back to, pure HTML.
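For a flavour of XML as a transfer format, here is a minimal round-trip sketch in Python. The `record` and `field` element names, and the two helper functions, are invented for illustration; a real exchange would use whatever vocabulary the two systems had agreed on.

```python
import xml.etree.ElementTree as ET


def to_xml(record):
    """Serialise a flat dictionary as a hypothetical <record> document."""
    root = ET.Element("record")
    for key, value in record.items():
        field = ET.SubElement(root, "field", name=key)
        field.text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Rebuild the dictionary on the receiving side."""
    root = ET.fromstring(text)
    return {field.get("name"): field.text for field in root.findall("field")}


original = {"customer": "Acme Ltd", "city": "Dublin", "balance": "120.50"}
wire = to_xml(original)      # what would travel between the two systems
restored = from_xml(wire)
print(wire)
print(restored == original)
```

Everything travels as text here, so the sketch keeps its values as strings; deciding how to type the data is left to the sender and receiver.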
