20 Questions on XML
Nearly everyone is talking about XML (Extensible Markup Language) these days. But few understand what it's really all about.
XML's proponents claim it will cure everything that's wrong with HTML and enable the seamless exchange of data between different applications and operating systems. Indeed, most observers agree that XML is poised to cause a revolution in content delivery and exchange. And those of you who get it first will probably benefit the most.
If you've been dying to learn more about XML but have been unable to find the time to dig through all the technical specs and marketing hyperbole, we have the perfect answer: let Builder.com do the digging for you. We've asked and answered the top 20 questions about XML, and while they may not make you an XML expert, we guarantee they'll give you a better appreciation of just what this technology has in store for Web builders.
What is XML?
XML stands for Extensible Markup Language. Spearheaded by the World Wide Web Consortium (W3C), XML became a formal specification in mid-
XML developers will tell you that XML isn't a language but rather a system for defining other languages. You may have already heard of, or even used, one of these other languages--
The W3C, which is working on a slew of XML-
By separating structure and content from presentation, the same XML source document can be written once, then displayed in a variety of ways: on a computer monitor, within a cellular-
So XML will have a life outside of the Internet, serving the publishing industry at large, for example, and especially people who produce documents intended to appear across multiple media. Some large-
The DOM
XML's real strength for the Web is how it interacts with the Document Object Model (DOM), an interface that defines the mechanisms for accessing data in a document.
Using the DOM, programmers can script dynamic content in a standardized way. In other words, they can use it to cause a specific piece of content in a browser's document tree to behave in a certain way, creating a small effect--
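To make the DOM idea concrete, here is a minimal sketch using Python's standard-library `xml.dom.minidom`. This is a modern implementation of the W3C DOM interfaces, not the object model that shipped in 1998 browsers, and the `<article>`, `<headline>`, and `<byline>` elements are made up for the example.

```python
from xml.dom.minidom import parseString

# A made-up document; the element names are hypothetical.
SOURCE = """<article>
  <headline>XML arrives</headline>
  <byline>Staff writer</byline>
</article>"""


def retitle(xml_text: str, new_title: str) -> str:
    """Parse XML into a DOM tree, change the headline text, and serialize it back."""
    doc = parseString(xml_text)
    # Locate the node by element name, not by where it happens to sit on a page.
    headline = doc.getElementsByTagName("headline")[0]
    # The headline's first child is the text node holding its content.
    headline.firstChild.data = new_title
    return doc.documentElement.toxml()


if __name__ == "__main__":
    print(retitle(SOURCE, "XML is here"))
```

Because the script finds `headline` by name, it keeps working even if the document's layout changes, which is the point of a standard object model.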
The saying among Web heads is that content is king. Unfortunately, too often that content is intimately tied to how it's displayed. How many times have you come across a Web site with a little disclaimer saying "Best viewed at 800-
XML will help solve that problem because, rather than specifying where to display something, Web builders will be able to specify the structure of the document. For example, you can specify the document's title, its author, a list of related links, and so on. Then any device with an XML browser--
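The idea of marking up structure once and choosing the presentation later can be sketched in a few lines of Python with the standard `xml.etree.ElementTree` module. The `<page>`, `<title>`, `<author>`, and `<link>` tags and the example.com URLs are hypothetical; the two render functions stand in for two different display devices.

```python
import xml.etree.ElementTree as ET
from html import escape

# One structured source document (tag names are illustrative only).
PAGE = """<page>
  <title>Related reading</title>
  <author>A. Writer</author>
  <links>
    <link href="http://www.example.com/a">First article</link>
    <link href="http://www.example.com/b">Second article</link>
  </links>
</page>"""


def parse_page(xml_text):
    """Extract the structural parts of the page; nothing here says how to display them."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("title"),
        "author": root.findtext("author"),
        "links": [(link.get("href"), link.text) for link in root.iter("link")],
    }


def render_html(page):
    """Presentation 1: HTML for a desktop browser."""
    items = "".join(
        f'<li><a href="{escape(href)}">{escape(text)}</a></li>' for href, text in page["links"]
    )
    return f"<h1>{escape(page['title'])}</h1><p>by {escape(page['author'])}</p><ul>{items}</ul>"


def render_text(page):
    """Presentation 2: plain text for a small screen."""
    lines = [page["title"].upper(), "by " + page["author"]]
    lines += [f"* {text} <{href}>" for href, text in page["links"]]
    return "\n".join(lines)


if __name__ == "__main__":
    page = parse_page(PAGE)
    print(render_html(page))
    print(render_text(page))
```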
Perhaps XML's best feature, though, is its inherent extensibility. Companies and organizations will be able to extend XML to meet new challenges and applications. One XML-based language is already in use--
XML also holds the promise of becoming a standardized mechanism for the exchange of data as well as documents. For example, XML may become a way for databases from different vendors to exchange information across the Internet.
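As a rough illustration of XML as a neutral exchange format between databases, the sketch below writes rows out as XML and reads them back using only Python's standard library. The `<row>` layout is an assumption made for the example, not a standard. Notice that every value comes back as a string: plain XML carries no data types, which is the gap the schema discussion below addresses.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(table, rows):
    """Serialize dict rows as <table><row><column>value</column>...</row></table>.

    Column names must be valid XML element names for this to work.
    """
    root = ET.Element(table)
    for row in rows:
        record = ET.SubElement(root, "row")
        for column, value in row.items():
            ET.SubElement(record, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_rows(xml_text):
    """Rebuild dict rows; the receiver needs no knowledge of the sender's database."""
    root = ET.fromstring(xml_text)
    return [{field.tag: field.text for field in record} for record in root.findall("row")]


if __name__ == "__main__":
    exported = rows_to_xml("inventory", [{"sku": "A-100", "qty": 12}])
    print(exported)
    print(xml_to_rows(exported))
```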
It's still too early to determine precisely where XML is heading. But the possibilities are awesome, which is a big reason there's so much excitement surrounding XML.
How are SGML, HTML, and XML related?
Standard Generalized Markup Language (SGML) is a way of expressing data in text-processing applications. It's been around for more than a decade; both XML and HTML are document formats derived from SGML. Thus they all share certain characteristics, such as a similar syntax and the use of bracketed tags. But HTML is an application of SGML, whereas XML is a subset of SGML.
The distinction is important. Basically, HTML can't be used to define new applications, but XML can. For example, both the Resource Description Framework (RDF) and the Channel Definition Format (CDF) are applications that were defined using XML. XML and HTML are really more like cousins than siblings. (The W3C has developed a great diagram to help clarify this relationship.)
See the diagram
XML is actually compatible with SGML:
HTML, SGML, and XML will continue to be used where appropriate; none of them will render the others obsolete. HTML will remain the simplest way to publish data quickly on the Web, mostly short-
How will XML be implemented?
XML will be used in a couple of different ways. One is for data interchange between humans and machines, such as from a Web server to a user's browser. The other is for data exchange between applications, or from machine to machine.
In either case, you'll likely require a three-
Today, Web pages are sometimes delivered this way;
A Document Type Definition (DTD) is a set of syntax rules for tags. It tells you what tags you can use in a document, what order they should appear in, which tags can appear inside other ones, which tags have attributes, and so on. Originally developed for use with SGML, a DTD can be part of an XML document, but it's usually a separate document or series of documents.
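A DTD's content models are essentially regular expressions over an element's children. Python's standard library does not validate against DTDs (third-party libraries such as lxml can), so this minimal sketch hand-translates one hypothetical declaration into a Python regular expression and checks documents against it. The `article`, `title`, `byline`, and `summary` elements are invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical DTD, shown for reference only; this code does not parse it.
DTD = """<!ELEMENT article (title, byline, summary?)>
<!ELEMENT title   (#PCDATA)>
<!ELEMENT byline  (#PCDATA)>
<!ELEMENT summary (#PCDATA)>"""

# The content model for <article>, translated by hand into a regular
# expression over the space-separated names of its child elements.
ARTICLE_MODEL = re.compile(r"title byline( summary)?")


def follows_model(xml_text):
    """True if the root is <article> and its children appear in the declared order."""
    root = ET.fromstring(xml_text)
    if root.tag != "article":
        return False
    child_names = " ".join(child.tag for child in root)
    return ARTICLE_MODEL.fullmatch(child_names) is not None
```

A real DTD also constrains attributes, entities, and every nested element, so treat this as an illustration of what the rules mean rather than a validator.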
Because XML is not a language itself, but rather a system for defining languages, it doesn't have a universal DTD the way HTML does. Instead, each industry or organization that wants to use XML for data exchange can define its own DTDs.
If an organization uses XML to tag documents for internal use only, it can create its own private DTD. The Wall Street Journal Interactive Edition, for example, has a DTD specifying each edition, with information about pages, articles, summaries, bylines, and so forth. The Journal currently uses an SGML DTD (called the Dow Jones Markup Language), but it is developing an XML version as well.
DTDs are not free from controversy. While some people feel they add substantial value in business, others feel they constrain creativity. Still others think they're useful but don't go far enough. Microsoft is attempting to address this last complaint with its XML-Data proposal, but critics say these improvements should be made within the DTD specification itself.
Microsoft's schema
A group of vendors including Microsoft has proposed an alternative approach to the DTD called a schema, which they have submitted to the W3C as XML-Data. Like a DTD, a schema provides the rules of a document and indicates what tags are used, what their attributes are, the relationships between the tags, and so on.
Unlike DTDs, however, a schema can define data types. For example, a DTD might have a tag designated as <PRICE>, but the content contained within that tag could be a number or a character string. A schema could force you to enter a number.
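The difference between a DTD and a data-typed schema can be sketched as follows. A DTD can only declare that `<PRICE>` holds character data (`#PCDATA`); the hypothetical `SCHEMA_TYPES` table below adds the missing rule that its content must be a number. This is a Python stand-in for the idea, not XML-Data syntax.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

# A hypothetical "schema": element name -> required data type.
SCHEMA_TYPES = {"PRICE": Decimal, "QUANTITY": int}


def type_errors(xml_text):
    """Return a message for every typed element whose content doesn't convert."""
    errors = []
    for elem in ET.fromstring(xml_text).iter():
        expected = SCHEMA_TYPES.get(elem.tag)
        if expected is None:
            continue  # untyped element: any character data is accepted, as with a DTD
        try:
            expected((elem.text or "").strip())
        except (InvalidOperation, ValueError):
            errors.append(f"<{elem.tag}> should be {expected.__name__}, got {elem.text!r}")
    return errors
```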
This approach clearly has benefits, especially for data exchange among applications, objects, or databases. The only question is whether this approach will somehow be rolled into the DTD specification or end up as a separate extension to XML.
What are well-formed and valid documents?
There are essentially two related types of XML documents: well-
See how to create well-formed XML
Valid XML documents are documents that conform to a specific Document Type Definition (DTD). Confirming the validity of XML documents is largely the work of authoring and publishing tools, whereas XML-
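Checking well-formedness needs no DTD at all: a conforming XML processor must refuse a document whose tags are mis-nested, left unclosed, or not wrapped in a single root element. Here is a minimal sketch with Python's bundled, nonvalidating parser:

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if the text is well-formed XML, otherwise the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)  # the message includes the line and column of the problem
    return None


if __name__ == "__main__":
    for sample in ["<p>fine</p>", "<p><b>bold</p></b>", "<p>unclosed", "<a/><b/>"]:
        print(repr(sample), "->", well_formedness_error(sample) or "well-formed")
```

Unlike many HTML browsers, the parser does not guess at what the author meant; it simply stops and reports the error.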
A tool for reading XML documents is popularly called an XML parser, though the more formal name is an XML processor. XML processors pass data to an application for authoring, publishing, searching, or displaying. XML doesn't provide an application programming interface (API) to an application; it just passes data to it. No XML processor will parse data that isn't well-
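Event-based interfaces such as SAX formalize the handoff from processor to application: the parser reports start tags, text, and end tags, and the application decides what to keep. Here is a minimal sketch with Python's standard `xml.sax` module. Collecting `<title>` text is an arbitrary example task.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Receives parse events from the processor and keeps the text of <title> elements."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so buffer it until the end tag.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def collect_titles(xml_text):
    handler = TitleCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles
```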
The XML developer community makes free XML readers and parsers available for use in applications or XML authoring software:
- Textuality's Lark, from one of the co-editors of the XML specification.
- Microstar Software's Ælfred, a Java-based parser.
- DataChannel's DXP, formerly the well-known NXP (Norbert's XML Parser, written by Norbert Mikula), to which APIs have been added.
If XML is the ability to speak a language, XML applications are specific languages. Resource Description Framework (RDF) is one such XML application: a data-
RDF is a way of describing and accessing data. That means RDF is data about data, or metadata. In the case of the Web, this metadata will be applied to creating standardized site maps, more precise search results, and hierarchical topic indices.
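In RDF, "data about data" takes the form of statements: a subject, a property, and a value. The sketch below pulls those statements out of the simplest flat form of RDF's XML syntax using only Python's standard library, and it is nowhere near a full RDF parser. It assumes the RDF namespace URI the W3C settled on in its 1999 Recommendation and uses Dublin Core element names for the properties. The site URL is an example.com placeholder.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

SITE_METADATA = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.example.com/news/">
    <dc:title>Example News</dc:title>
    <dc:subject>Technology</dc:subject>
  </rdf:Description>
</rdf:RDF>"""


def triples(rdf_xml):
    """Return (subject, property, value) tuples from flat rdf:Description elements."""
    root = ET.fromstring(rdf_xml)
    result = []
    for desc in root.findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            # ElementTree spells qualified names as "{namespace-uri}localname".
            result.append((subject, prop.tag, (prop.text or "").strip()))
    return result
```

A search engine or site-map tool could gather statements like these from many sites without caring how each page is laid out.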
RDF also allows for intelligent bookmarks that change as the Web pages being referenced change. This is very useful if you're tracking a site whose content is regularly updated, such as CNET's News.com.
It won't be difficult for Web builders to create metadata regarding their Web site content that can be referenced by search engines. We'll soon have access to commercially available software that automatically produces an RDF file of a given site.
XML metadata will also energize the market for companies whose business is to describe and rate information. There are many ratings bureaus springing up on the Web, bureaus that rate everything from kid-safe sites to the best movie or wine sites. RDF describes the syntax the ratings bureaus can use. People will choose the ratings bureau whose vocabulary they're most comfortable with--where vocabulary refers to the particular set of terms the bureau uses to rate different types of content--from sex and violence to wine acidity.
How will Netscape implement XML in its browser?
Netscape will support XML metadata in Communicator/Navigator 5.0 as a delivery component code-
Aurora finds and manages information across networks, desktops, and databases. It will appear on the desktop as a "windowpane" menu interface that pulls together pointers to resources relating to current projects, research topics, or regular activities. RDF lets the Aurora navigation bar point to local files of varying data types (word processing documents, spreadsheet data, email messages, database content), as well as to resources on Internet or intranet servers (search and query results, bookmark links, and so on).
An XML parser that reads RDF will be part of Netscape's 5.0 browser and is expected to be available in one of the developer beta releases before the final product ships. Beyond this initial RDF implementation, Netscape is planning to include a generalized XML parser in its browser that would work with other XML applications such as the Shakespeare markup (an early XML application), Chemical Markup Language (CML), and MathML, a mathematical markup language that is in the process of becoming a W3C Recommendation.
"We want to turn Navigator into an XML platform," says R.V. Guha, Netscape principal engineer. Guha originally developed MCF (MetaContent Format), which has since been folded into the RDF specification.
How does Microsoft implement XML in its browser?
Microsoft's Internet Explorer 4.0 was the first Web browser to implement XML. Microsoft offers a pair of XML processors: a parser written in C++ that comes with the browser, and source code to a Java parser that Web builders can download and incorporate into their applications.
The Java parser is a validating parser, meaning it checks against a Document Type Definition (DTD) or schema. To improve performance, the C++ version that comes with the browser is a nonvalidating parser.
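Python's bundled expat-based parser behaves like the nonvalidating parser described here. It reads the DTD in the document's internal subset (for example, to expand entities) but does not enforce its element declarations. In the sketch below, the document breaks its own DTD and still parses. The element and entity names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# The DTD requires <order> to contain one <item>, but the body holds a <note>:
# the document is well-formed but not valid.
DOC = """<!DOCTYPE order [
  <!ELEMENT order (item)>
  <!ELEMENT item (#PCDATA)>
  <!ENTITY vendor "Example Corp">
]>
<order><note>from &vendor;</note></order>"""


def parse_without_validation(xml_text):
    """Parse with a nonvalidating processor; only well-formedness errors would raise."""
    return ET.fromstring(xml_text)


if __name__ == "__main__":
    root = parse_without_validation(DOC)
    print(root[0].tag, "|", root[0].text)
```

A validating parser would reject this document because `<note>` is not allowed inside `<order>`. Skipping that check is the performance trade-off described above.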
According to Steve Sklepowich, Microsoft's XML product manager, both parsers are "generalized" in the sense that they aren't dependent on specific XML applications such as the Channel Definition Format. Since XML data is separate from its presentation, the ability to actually display XML natively in a Web browser requires a style sheet, such as XSL (Extensible Style Language).
In the meantime, Microsoft uses what it calls the XML Data Source Object, or XML DSO. This model uses the data-
Microsoft also uses the XML Object Model (XML OM) to let developers interact with XML data in the browser. It does this through a method of exposing HTML as objects based on the Document Object Model (DOM), though HTML and the DOM aren't directly compatible. The DOM lets scripts and programs access structured XML data.
While the current focus of XML at Microsoft is on the browser, XML will eventually show up "anywhere that HTML has shown up," says Sklepowich. CEO Bill Gates has stated publicly that future versions of Microsoft Office will support XML, and the company also plans to support the standard in email packages and XML-
Channel Definition Format (CDF) and Open Software Description (OSD) are XML applications championed by Microsoft. With its XML parser, Microsoft's Internet Explorer 4.0 reads CDF files to drive and control collections of pages that come together in push channels. In light of work done with the Resource Description Framework (RDF), the CDF proposal was recently resubmitted to the W3C to take advantage of RDF's ability to show relationships between various data elements.
See CDF code sample
Open Software Description is the vocabulary used to describe software components, with tags for properties such as dependencies, version, and platform. OSD describes how to advertise a component's properties and how to install that component onto a computer. OSD could be used to download a complete software package, but it's primarily designed for incremental updating. OSD works alone or with CDF to define application channels.
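To illustrate the kind of decision an OSD description supports, the sketch below reads a package description and picks a download for the local platform only if the advertised version is newer than the installed one. The markup is loosely modeled on OSD's vocabulary. The element and attribute names, the comma-separated version format, and the example.com URLs are assumptions that have not been checked against the actual submission.

```python
import xml.etree.ElementTree as ET

# Illustrative package description, loosely modeled on OSD (names are assumptions).
PACKAGE = """<SOFTPKG NAME="viewer" VERSION="2,1,0,0">
  <IMPLEMENTATION>
    <OS VALUE="WinNT"/>
    <CODEBASE HREF="http://www.example.com/viewer-nt.cab"/>
  </IMPLEMENTATION>
  <IMPLEMENTATION>
    <OS VALUE="Solaris"/>
    <CODEBASE HREF="http://www.example.com/viewer-sol.tar"/>
  </IMPLEMENTATION>
</SOFTPKG>"""


def parse_version(text):
    """Turn "2,1,0,0" into (2, 1, 0, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split(","))


def update_url(osd_xml, platform, installed_version):
    """Return the download URL for an incremental update, or None if none applies."""
    pkg = ET.fromstring(osd_xml)
    if parse_version(pkg.get("VERSION")) <= parse_version(installed_version):
        return None  # already up to date: nothing to download
    for impl in pkg.findall("IMPLEMENTATION"):
        os_elem = impl.find("OS")
        if os_elem is not None and os_elem.get("VALUE") == platform:
            return impl.find("CODEBASE").get("HREF")
    return None  # no build advertised for this platform
```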
OSD was submitted to the W3C in August 1997 by a group of vendors led by Microsoft and Marimba.
What about e-commerce and XML?
For four years, CommerceNet, the 500-
Content definition: CommerceNet is working to define data elements common to a variety of commerce transactions. This so-called Commerce Core would define how to tag things like company name and address, price, item, and quantity.
Information exchange: Open, text-based XML is ideal for exchanging transaction information from one server to another. CommerceNet proposes using the XML-
One such CBL application is the Product Information Exchange (PIX) specification for catalog interoperability. CommerceNet designed PIX to help manufacturers and their distributors exchange product data more easily.
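Agreeing on how to tag company name and address, item, quantity, and price, as the Commerce Core effort proposes, is what lets a receiving server compute on a document it didn't create. Here is a minimal sketch. The tag names below are hypothetical placeholders, not CommerceNet's or CBL's actual vocabulary.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal


def build_order(company, address, lines):
    """Sender side: tag each business fact explicitly (hypothetical tag names)."""
    order = ET.Element("order")
    buyer = ET.SubElement(order, "company")
    ET.SubElement(buyer, "name").text = company
    ET.SubElement(buyer, "address").text = address
    for item, quantity, price in lines:
        line = ET.SubElement(order, "line")
        ET.SubElement(line, "item").text = item
        ET.SubElement(line, "quantity").text = str(quantity)
        ET.SubElement(line, "price").text = str(price)
    return ET.tostring(order, encoding="unicode")


def order_total(order_xml):
    """Receiver side: works from the agreed tags, not from any page layout."""
    order = ET.fromstring(order_xml)
    return sum(
        Decimal(line.findtext("price")) * int(line.findtext("quantity"))
        for line in order.iter("line")
    )
```

Decimal is used for prices so that totals don't pick up binary floating-point rounding errors.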
The long-term goal is for industry groups--not CommerceNet--to use CBL as a common basis for specific DTDs. Several industry-
Open Buying on the Internet (OBI): A standard for international business-
Open Trading Protocol (OTP): A consistent, interoperable environment for selling to consumers on the Web. Rules will range from how to offer items for sale to payment choices to product delivery, receipts, and problem resolution. OTP is backed by MasterCard International, DigiCash, CyberCash, Hewlett-
Internet Content and Exchange (ICE): Vignette and a number of other companies--including Microsoft--are developing a specification called ICE to enable the site-
Because XML separates content from presentation, Web builders need a new way to control design, display, and output issues. Style sheets are the answer. Currently, there are three types of style sheets that are candidates for use with XML:
- Cascading Style Sheets (CSS)
- Extensible Style Language (XSL)
- Document Style Semantics and Specification Language (DSSSL)
That leaves Extensible Stylesheet Language (XSL), the style-
XSL is more powerful than CSS because XSL lets Web builders create documents that can alter their own appearance dynamically. You could, for example, include a programming-
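The central idea in XSL is a set of rules, each matching an element type and saying what to produce for it. Python's standard library has no XSL processor, so this minimal sketch imitates rule-based transformation with a dictionary of Python functions. It is not XSL syntax, and the `recipe` vocabulary is invented.

```python
import xml.etree.ElementTree as ET
from html import escape


def apply_rules(elem, rules):
    """Find the rule for this element's tag (or the default rule) and apply it."""
    rule = rules.get(elem.tag, process_children)
    return rule(elem, rules)


def process_children(elem, rules):
    """Default rule: keep the element's own text and recurse into its children."""
    parts = [escape(elem.text or "")]
    for child in elem:
        parts.append(apply_rules(child, rules))
        parts.append(escape(child.tail or ""))
    return "".join(parts)


# Hypothetical rule table producing HTML for a made-up <recipe> vocabulary.
HTML_RULES = {
    "recipe": lambda e, r: f"<div class='recipe'>{process_children(e, r)}</div>",
    "name": lambda e, r: f"<h2>{process_children(e, r)}</h2>",
    "steps": lambda e, r: f"<ol>{process_children(e, r)}</ol>",
    "step": lambda e, r: f"<li>{process_children(e, r)}</li>",
}
```

Passing a different rule table to `apply_rules` gives the same XML source a different presentation without touching the document itself.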
XML hyperlinking goes beyond basic HTML-style hyperlinking with a number of new features, including the ability to create "smart" links without a lot of hand-coded JavaScript. And in XML, links become objects in their own right and can thus be managed like any other objects.
The original linking specification--XLL, or XML Linking Language--is being split into two separate specs: XPointer and XLink.
XPointer: In HTML it's possible to link to the middle of a page only if the author of that page put an anchor tag there. With XPointer you'll be able to "address to" (not "link to") any part of someone else's text. It's easy to see how this ability would be useful in working with legal documents, scientific and academic papers, even W3C specifications!
XLink: When a user clicks an HTML hyperlink, the current Web page is replaced by the file being linked to. XLink lets Web builders add behaviors to links. Today, for example, you have to use a bit of JavaScript to make a link pop up a separate window, but XLink lets Web builders code links to perform a variety of actions, including popping up a menu of linking choices.
Another application of this technique might be to pop up a dialog box, perhaps an alert reminding users that they're about to update a database record. The link pop-up might require users to click a box to signify that they accept liability before proceeding. Today, this feature would take a boatload of scripting.
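Both linking ideas can be sketched with Python's standard library, using two stand-ins. XPointer's syntax was still a draft, so `address()` uses the small XPath subset that ElementTree's `find()` accepts to reach an element by position, with no anchor needed in the target. For link behavior, `link_actions()` reads the `xlink:show` attribute with the values `new`, `replace`, and `embed` from the XLink specification as eventually published in 2001. The 1998 drafts spelled this differently. The documents and file names are invented.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# Someone else's document: its author placed no anchors in it.
STATUTE = """<law>
  <section><para>Definitions.</para><para>Scope.</para></section>
  <section><para>General rule.</para><para>Exceptions.</para></section>
</law>"""

# A document whose links carry their own behavior.
LINKS = """<doc xmlns:xlink="http://www.w3.org/1999/xlink">
  <ref xlink:type="simple" xlink:href="glossary.xml" xlink:show="new">glossary</ref>
  <ref xlink:type="simple" xlink:href="next.xml">next page</ref>
</doc>"""

BEHAVIORS = {
    "new": "open in a new window",
    "replace": "replace the current page",
    "embed": "embed in place",
}


def address(xml_text, path):
    """Return the text at `path` (e.g. "section[2]/para[1]"), or None if nothing is there."""
    node = ET.fromstring(xml_text).find(path)
    return None if node is None else node.text


def link_actions(xml_text):
    """List (href, behavior) pairs for every element carrying an xlink:href."""
    actions = []
    for elem in ET.fromstring(xml_text).iter():
        href = elem.get(f"{{{XLINK}}}href")
        if href is None:
            continue
        # Assumption: treat a missing xlink:show like an HTML link (replace the page).
        show = elem.get(f"{{{XLINK}}}show", "replace")
        actions.append((href, BEHAVIORS.get(show, BEHAVIORS["replace"])))
    return actions
```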
XML also lets Web builders create Extended Links that work sort of like a Web ring, which is a self-
There are additional issues still left to work out, especially in the area of behavior policies. There has to be a way to negotiate between the behavior a document's author recommends for a link, a user's preferences in regard to displaying link information, and policies as to if and when the user's desires should be overridden.
Does XML have a place on servers?
XML is designed to be a repository format for long-
Server-software vendors are already supporting XML:
Enigma, Insight 4.0
This is a professional electronic-publishing software solution for publishers of large documents. Enigma's SGML/XML Style Sheet Editor, currently bundled with Insight, is also available as a standalone product.
Hynet Technologies, Digital Library System
The Digital Library System (DLS) manages documents and document components as standard software objects, allowing import of documents created in Adobe FrameMaker and Microsoft Word, or SGML/XML files.
Inso, DynaText Professional Publishing System
This indexing, searching, and scripting software is available for Microsoft Internet Information Server and Netscape Enterprise and FastTrack servers running on Windows NT 3.51 or 4.0 or on Sun Solaris 2.5.
Open Market, Folio
Open Market's Folio 4 information management and distribution products already import XML documents into an indexed database for content delivery over IP networks or to CD-ROMs. In January Open Market announced increased XML support to allow documents to be indexed and made secure in their native formats. Also, Folio products will interoperate with other standards-
WebMethods, Web Automation Server
Web Automation Server helps companies integrate browser-based applications and data with other applications. This XML-
All Web builders need to know enough about XML to decide whether or not they should use it. E-commerce sites and sites that manage large numbers of documents stored in databases are obvious initial candidates. Managers who might not need to learn XML syntax or how to create a DTD will still want to understand XML's potential in order to make use of it.
HTML is still more than adequate for marking up information if the ultimate goal is simply for it to be read by a human being. But if you want to prepare for automatic processing of data, you should think about incorporating XML into your publishing systems.
Not every HTML producer working on every Web site has to become an XML producer, but someone on the staff of every company should become proficient--especially if the site works with data and documents worth managing for future use.
Of course, XML's power also means complexity: some Web builders have found that while they can grasp the basics of HTML in a few days, they may have to spend a few weeks becoming comfortable with XML. Only you can decide if it's worth the effort.
What XML authoring tools can I use?
Fortunately, Web builders won't be left on their own to create XML from scratch. Tools for creating, managing, and delivering XML are already on the market or in development by a number of companies.
Adobe: In mid-1998, Adobe will introduce interim versions of FrameMaker and FrameMaker+SGML that can export to XML. The next full release of these products will be able to import XML. Adobe has a representative on the W3C's XML working group and is also involved with XLink, Cascading Style Sheets, and RDF, so it makes sense to expect these technologies to appear in future Adobe products.
Allaire: HomeSite 4.0 and Cold Fusion 4.0, both expected this summer, will support XML, including style sheets. A CDF add-on is already available for HomeSite 3.0.
DataChannel: A free, Java-based validating parser called DXP (DataChannel XML Parser; based on Norbert Mikula's well-
Inso: This company offers what it calls "the first integrated, end-
IntraNet Solutions: The next version of Intra.doc Management System, IntraNet Solution's Web-
Microsoft: Office 9.0, which Microsoft hopes to ship by the end of the year, will reportedly have XML support.
Microstar: ActiveSG/XML is a set of tools and techniques for design and deployment of XML/SGML transaction-based systems on the Internet. Microstar also offers the free Ælfred XML parser.
SoftQuad: HTML editor HotMetal Pro will soon offer Live Data Base Pages, an add-on that lets developers drag and drop HTML data into a database and have it returned as XML.
Vignette: StoryServer 3.2 delivers XML-
XPublish: XPublish is an XML publishing system for Web site development and management that permits a developer to author in XML or extend current HTML documents with XML constructs, then publish the site as HTML for access by any standard Web browser. A Cascading Style Sheets editor is included.
WebMethods: The company makes XML-
Of course, if XML becomes ubiquitous on the Web, you can expect nearly every type of Web-
XML will make it easier than ever before for Web builders to create truly international sites because, like Java, it's defined in Unicode (ISO 10646), an internationally accepted standard for depicting virtually all of the world's letters, glyphs, characters, and ideograms. Unicode includes the ASCII and ISO Latin characters, as well as Japanese, Korean, Chinese, Hindi, Greek, and Arabic, among others. It even permits the mixing of character sets--for example, an XML document displayed in Japanese kanji could reference a German word with an umlaut.
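In practice, an XML parser hands the application the same Unicode characters however the file was encoded. Here is a minimal sketch with Python's standard library, using arbitrary sample strings. UTF-8 is XML's default encoding, another encoding can be named in the XML declaration, and any character can also be written as a numeric character reference.

```python
import xml.etree.ElementTree as ET


def texts(xml_bytes):
    """Parse raw bytes and return all text content as one Unicode string."""
    return "".join(ET.fromstring(xml_bytes).itertext())


# Japanese text with an embedded German word, stored as UTF-8 (XML's default).
mixed = "<note>東京で<term>Grüße</term>を書く</note>".encode("utf-8")

# The same German word in a Latin-1 file, declared in the XML declaration.
latin1 = '<?xml version="1.0" encoding="ISO-8859-1"?><term>Grüße</term>'.encode("latin-1")

# And once more using numeric character references in plain ASCII.
refs = b"<term>Gr&#xFC;&#xDF;e</term>"

if __name__ == "__main__":
    for sample in (mixed, latin1, refs):
        print(texts(sample))
```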
Developers don't need to do anything special to use Unicode in XML documents, which will be displayed in users' browsers using the appropriate character set.
What's the future of XML?
With all the activity surrounding XML, it's difficult to predict where it will be in six months. Tim Bray, coauthor of the XML and XLL specifications, says, "We have produced a tool that's designed to be general purpose, and the broad range of people leaping on board is evidence that we've succeeded."
In the short term, XML will probably surface first in metadata applications such as RDF. The next big impact will come with the approval of the Document Object Model specification. Bray claims that "the combination of XML and the DOM is really the magic bullet that will bring the Web alive."
XML should also help jump-
Meanwhile, Netscape and Microsoft can be counted on to continue expanding XML browser support to include both valid and well-
XML is a complex subject with deep implications for all Web builders. If you want to learn more, here are some good places to look:
World Wide Web Consortium (W3C):
- The XML specification.
- A discussion related to the XML spec.
- The Extensible Style Language (XSL) W3C note.
- The Extensible Linking Language (XLL) specification.
- The Document Object Model specification.
- A mailing list for XML developers involved in W3C specification development.
- ArborText's XML links and resources.
- Microsoft's XML site offers demos of how XML would work in the context of a weather report and an auction.
- Textuality houses XML-spec coauthor Tim Bray's FAQ on XML and links to other resources.
- An XML FAQ maintained on behalf of the W3C's XML Special Interest Group by Peter Flynn, University College, Cork, Ireland.
- XML information page, part of the SGML/XML Web Page created by Robin Cover of the Summer Institute of Linguistics, Dallas.
- The Graphic Communications Association's XML Files.
- SGML University gives one- to two-day courses in cities across the country and at major conferences.