When you need to process XML documents, you must first decide whether to use DOM
(Document Object Model) or SAX (Simple API for XML), the two main XML APIs in
use today. You can use either (or both at the same time) to process XML
documents, but DOM loads the entire document into memory before processing it.
SAX, on the other hand, reads an incoming XML stream piece by piece, so that not
all of the XML need reside in memory simultaneously.
You choose between DOM and SAX in much the same way that you might choose
between tables or views in a database: Select the approach that suits the
situation. If you want to simply explore an XML document and not manipulate it,
then choose SAX.
The differences between SAX and DOM
There are a number of key distinctions between SAX and DOM, including:
- DOM is preferred for complicated jobs, such as when the XML schema is
inherently intricate or when you need random access to the data in the document.
SAX moves in a linear fashion from the start of the document down through each
node to locate a particular node or otherwise provide information about the
document.
- DOM builds a type description for every node in the XML document it loads
into memory. Collectively, these descriptions result in an easily traversable,
though potentially huge, tree structure. If the XML is verbose, the DOM tree
inflates dramatically. For example, a 300-KB XML document can result in a DOM
tree structure of roughly 3,000,000 bytes (about 3 MB) in RAM or virtual memory.
By contrast, a document read through SAX is not deconstructed at all, nor is it
cached in memory (though, of course, parts of it reside briefly in memory
buffers as the XML stream is read through). SAX is a “lighter” technology that
imposes little burden on your system.
SAX is the equivalent of watching a marathon go by; DOM is like inviting all the
racers home for dinner.
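To make the random-access point concrete, here is a minimal sketch that uses XmlDocument, the DOM implementation in the .NET Framework's System.Xml namespace. The sample catalog, the module name, and the FindTitle helper are invented for illustration and are not part of any listing in this article.

```vbnet
Imports System
Imports System.Xml

Module DomRandomAccess
    ' Hypothetical sample catalog, invented for this illustration.
    Public Const Catalog As String = _
        "<books>" & _
        "<book id=""b1""><title>First</title></book>" & _
        "<book id=""b2""><title>Second</title></book>" & _
        "</books>"

    ' Loads the whole document into a DOM tree, then jumps straight
    ' to the requested book with an XPath query.
    Function FindTitle(xmlText As String, bookId As String) As String
        Dim doc As New XmlDocument()
        doc.LoadXml(xmlText)
        Dim node As XmlNode = _
            doc.SelectSingleNode("/books/book[@id='" & bookId & "']/title")
        If node Is Nothing Then Return Nothing
        Return node.InnerText
    End Function

    Sub Main()
        ' The books can be queried in any order because the tree stays in memory.
        Console.WriteLine(FindTitle(Catalog, "b2"))
        Console.WriteLine(FindTitle(Catalog, "b1"))
    End Sub
End Module
```

The price of that convenience is the one described above: every node of the document is held in memory for as long as the XmlDocument object lives.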
So which do you choose? If you're doing something complicated such as
advanced XSLT transformations or XPath filtering, choose DOM. You'd also pick
DOM if you're actually creating or modifying the XML documents.
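As a rough illustration of creating and modifying XML through the DOM, the sketch below again uses XmlDocument. The AddBook helper and the element names are hypothetical and chosen only for the example.

```vbnet
Imports System
Imports System.Xml

Module DomModify
    ' Adds a <book> element with the given id and title to a <books>
    ' document and returns the modified markup.
    Function AddBook(xmlText As String, id As String, title As String) As String
        Dim doc As New XmlDocument()
        doc.LoadXml(xmlText)

        Dim book As XmlElement = doc.CreateElement("book")
        book.SetAttribute("id", id)

        Dim titleNode As XmlElement = doc.CreateElement("title")
        titleNode.InnerText = title
        book.AppendChild(titleNode)

        ' Attach the new subtree under the root element.
        doc.DocumentElement.AppendChild(book)
        Return doc.OuterXml
    End Function

    Sub Main()
        Console.WriteLine(AddBook("<books></books>", "b1", "First"))
    End Sub
End Module
```

Because the whole tree is in memory, the new node can be attached anywhere in the document, which a forward-only reader cannot do.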
On the other hand, choose SAX for searching or reading XML documents. SAX can
quickly scan a large XML document, then stop when it finds a match to your
search criterion and hand you the appropriate fragment from the document.
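The scan-and-stop pattern might look something like the following. Because the .NET base class library has no SAX parser of its own, this sketch substitutes XmlTextReader as a forward-only stand-in, so it illustrates the idea rather than the SAX API itself. The FindBook helper and the sample data are hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module ForwardSearch
    ' Scans forward until it meets a <book> whose id attribute matches,
    ' returns that element's markup, and stops reading at that point.
    Function FindBook(xmlText As String, bookId As String) As String
        Dim reader As New XmlTextReader(New StringReader(xmlText))
        Try
            While reader.Read()
                If reader.NodeType = XmlNodeType.Element AndAlso _
                   reader.Name = "book" AndAlso _
                   reader.GetAttribute("id") = bookId Then
                    ' The rest of the document is never parsed.
                    Return reader.ReadOuterXml()
                End If
            End While
            Return Nothing
        Finally
            reader.Close()
        End Try
    End Function

    Sub Main()
        Dim sample As String = _
            "<books>" & _
            "<book id=""b1""><title>First</title></book>" & _
            "<book id=""b2""><title>Second</title></book>" & _
            "<book id=""b3""><title>Third</title></book>" & _
            "</books>"
        Console.WriteLine(FindBook(sample, "b2"))
    End Sub
End Module
```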
In some situations, the best choice is to employ both DOM and SAX for
different aspects of a single solution. For example, you might want to load XML
into memory and modify it with DOM, but then transmit the final result by
emitting a SAX stream from the DOM tree.
Using the
If you’re interested in employing SAX, it’s free and you can find considerable
help at the SAX Project page. You can also use SAX within Microsoft’s Visual
Studio. Visual Studio also
offers a more flexible alternative to the traditional SAX API. The XmlReader
class provides all the efficiencies and advantages of SAX, but it adds the
ability to easily customize the behaviors available in the class. Though both
SAX and XmlReader are forward-only, read-only systems, with XmlReader you can
skip forward if you want to. For example, you can employ the reader’s
MoveToContent and Skip methods to avoid having to slog serially through every
node in the document, with your code notified of each node along the way.
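Here is one way MoveToContent and Skip might be combined. The sketch lists only the direct children of the root element and jumps over everything nested inside them. The TopLevelNames helper and the sample document are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml

Module SkipDemo
    ' Returns the names of the root element's direct children, separated
    ' by commas, without visiting the nodes nested inside those children.
    Function TopLevelNames(xmlText As String) As String
        Dim reader As New XmlTextReader(New StringReader(xmlText))
        Dim names As New List(Of String)()
        Try
            reader.MoveToContent()   ' jump past the XML declaration to the root element
            reader.Read()            ' step inside the root
            While Not reader.EOF
                If reader.NodeType = XmlNodeType.Element Then
                    names.Add(reader.Name)
                    ' Skip leaves the reader on the node that follows this
                    ' element's end tag, so no extra Read call is needed here.
                    reader.Skip()
                Else
                    reader.Read()
                End If
            End While
        Finally
            reader.Close()
        End Try
        Return String.Join(",", names.ToArray())
    End Function

    Sub Main()
        Dim sample As String = _
            "<?xml version=""1.0""?>" & _
            "<library>" & _
            "<fiction><book/><book/></fiction>" & _
            "<reference><book/></reference>" & _
            "</library>"
        Console.WriteLine(TopLevelNames(sample))   ' prints fiction,reference
    End Sub
End Module
```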
Another primary advantage of the XmlReader class is that your code pulls each
XML node from the reader when it is ready for it (rather than having the parser
push nodes at it, as SAX does). This allows you to more effectively manage some
kinds of data. For instance, with XmlReader, it’s relatively straightforward to
examine multiple input streams.
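The multiple-stream case can be sketched as follows. Because your code decides when each reader advances, two readers can be stepped in lockstep. This example only checks whether two documents contain the same sequence of element names (it ignores nesting depth, text, and attributes), and both helper names are hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module CompareStreams
    ' Advances the reader to its next element and returns the element's
    ' name, or Nothing once the input is exhausted.
    Function NextElementName(reader As XmlReader) As String
        While reader.Read()
            If reader.NodeType = XmlNodeType.Element Then Return reader.Name
        End While
        Return Nothing
    End Function

    ' True when both documents list the same element names in the same order.
    Function SameElementSequence(xmlA As String, xmlB As String) As Boolean
        Dim a As New XmlTextReader(New StringReader(xmlA))
        Dim b As New XmlTextReader(New StringReader(xmlB))
        Try
            Dim nameA As String = NextElementName(a)
            Dim nameB As String = NextElementName(b)
            While nameA IsNot Nothing AndAlso nameB IsNot Nothing
                If nameA <> nameB Then Return False
                ' Pull one more element from each stream.
                nameA = NextElementName(a)
                nameB = NextElementName(b)
            End While
            ' Equal only if both streams ran out at the same time.
            Return nameA Is Nothing AndAlso nameB Is Nothing
        Finally
            a.Close()
            b.Close()
        End Try
    End Function

    Sub Main()
        Console.WriteLine(SameElementSequence("<a><b/><c/></a>", "<a><b>text</b><c/></a>"))
        Console.WriteLine(SameElementSequence("<a><b/></a>", "<a><c/></a>"))
    End Sub
End Module
```

With a push parser, each document would drive its own callbacks, and interleaving two of them this way would take considerably more bookkeeping.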
To get an idea of how to use XmlReader, start a new Windows-style Visual
Basic .NET project in Visual Studio and add the following namespace reference
at the top of the code window:
Imports System.Xml
Now copy and paste the code from Listing A into the Form_Load event:
Note that the code actually instantiates an XmlTextReader object, which is
derived from the XmlReader abstract class.
Before trying to execute the code, you must substitute the path of an actual
.xml file on your hard drive (any .xml file will do) for the "c:\books.xml"
string in this line of the code:
Dim Xr = New
Once you've done that, you can press [F5] to execute your program, and you’ll
see that the XmlReader in this code has parsed the document and can report its
number of elements and attributes.
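For readers without the listing to hand, the kind of counting described here can be sketched as below. This is not the article's Listing A: it is a console module rather than a Form_Load handler, it reads from an in-memory string so that it runs without a file on disk, and the helper name and sample data are my own assumptions.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module CountNodes
    ' Walks the reader to the end and returns a two-item array:
    ' index 0 holds the element count, index 1 the attribute count.
    Function CountElementsAndAttributes(reader As XmlReader) As Integer()
        Dim elements As Integer = 0
        Dim attributes As Integer = 0
        While reader.Read()
            If reader.NodeType = XmlNodeType.Element Then
                elements += 1
                attributes += reader.AttributeCount
            End If
        End While
        Return New Integer() {elements, attributes}
    End Function

    Sub Main()
        ' Hypothetical sample data; pass a file path to the XmlTextReader
        ' constructor instead to read a document from disk.
        Dim sample As String = _
            "<books><book id=""b1"" lang=""en""><title>First</title></book></books>"
        Dim xr As New XmlTextReader(New StringReader(sample))
        Dim counts As Integer() = CountElementsAndAttributes(xr)
        xr.Close()
        ' Prints: Elements: 3, attributes: 2
        Console.WriteLine("Elements: " & counts(0).ToString() & _
                          ", attributes: " & counts(1).ToString())
    End Sub
End Module
```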
When you instantiate an XmlTextReader, you can simultaneously provide its
constructor with the target XML file path in a string, as I did in this code.
However, this is a heavily overloaded constructor, so you can provide a variety
of arguments when instantiating an XmlTextReader: path, Stream, TextReader,
XmlNameTable, XmlNodeType, XmlParserContext, and various combinations of these
objects.
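A few of those overloads can be exercised side by side, as in this sketch. It feeds the same markup to XmlTextReader through a TextReader, a Stream, and a file path. The RootName helper is hypothetical, and the file is a temporary one created only for the demonstration.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module ReaderSources
    ' Returns the name of the document's root element, then closes the reader.
    Function RootName(reader As XmlTextReader) As String
        Try
            reader.MoveToContent()
            Return reader.Name
        Finally
            reader.Close()
        End Try
    End Function

    Sub Main()
        Dim sample As String = "<books><book/></books>"

        ' 1. From a TextReader (here a StringReader over an in-memory string).
        Console.WriteLine(RootName(New XmlTextReader(New StringReader(sample))))

        ' 2. From a Stream (here a MemoryStream over the UTF-8 bytes).
        Dim bytes As Byte() = Encoding.UTF8.GetBytes(sample)
        Console.WriteLine(RootName(New XmlTextReader(New MemoryStream(bytes))))

        ' 3. From a path string, using a temporary file for the demonstration.
        Dim tempFile As String = Path.GetTempFileName()
        Try
            File.WriteAllText(tempFile, sample)
            Console.WriteLine(RootName(New XmlTextReader(tempFile)))
        Finally
            File.Delete(tempFile)
        End Try
    End Sub
End Module
```

All three calls should report the same root element, since only the source of the characters differs.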
In Listing B, both the schema and data are extracted from a string.
Assume that the caller in this situation has read a node from a stream and
presents it to this XmlTextReader for analysis.
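Parsing a node handed over as a string generally goes through the constructor overload that takes a fragment, an XmlNodeType, and an XmlParserContext. The sketch below is not the article's Listing B. It only shows that overload at work, with the parser context supplying a namespace prefix that the fragment itself never declares. The fragment, the "bk" prefix, the urn:sample URI, and the ReadGenre helper are all assumptions made for the example.

```vbnet
Imports System
Imports System.Xml

Module FragmentDemo
    ' Parses an XML fragment held in a string and returns the namespace
    ' URI and text of its <bk:genre> element, joined by a vertical bar.
    Function ReadGenre(fragment As String) As String
        ' The context supplies what the fragment lacks: a namespace
        ' declaration for the "bk" prefix (hypothetical URI).
        Dim table As New NameTable()
        Dim namespaces As New XmlNamespaceManager(table)
        namespaces.AddNamespace("bk", "urn:sample")
        Dim context As New XmlParserContext(Nothing, namespaces, Nothing, XmlSpace.None)

        Dim reader As New XmlTextReader(fragment, XmlNodeType.Element, context)
        Try
            While reader.Read()
                If reader.NodeType = XmlNodeType.Element AndAlso _
                   reader.LocalName = "genre" Then
                    Dim uri As String = reader.NamespaceURI
                    Dim text As String = reader.ReadString()
                    Return uri & "|" & text
                End If
            End While
            Return Nothing
        Finally
            reader.Close()
        End Try
    End Function

    Sub Main()
        Dim fragment As String = _
            "<book><title>Pride And Prejudice</title><bk:genre>novel</bk:genre></book>"
        Console.WriteLine(ReadGenre(fragment))   ' prints urn:sample|novel
    End Sub
End Module
```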