christopher.harris at reuters.com
Wed Jul 21 13:09:30 BST 1999
I came up against this issue in some work I've been doing recently on Java
XML parser performance, and would be interested in any comments.
There are two main APIs for applications that process XML documents: DOM
(Document Object Model) and SAX (Simple API for XML). DOM involves
constructing a tree of Nodes from the XML document, and then using a simple
API to walk the tree and extract information from it. SAX is an event-based
API designed to fire user-written code as elements, character data and so on
are encountered while parsing the document. SAX is very useful when the
document may be
too large to fit into memory.
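
As a rough illustration of the difference, here is a minimal sketch that counts elements both ways. It assumes the standard JAXP classes (javax.xml.parsers) rather than any particular parser's own API, and the class name, method names and sample document are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class DomVsSax {
    // DOM: the whole tree is built in memory first, then queried.
    static int countWithDom(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName(tag).getLength();
    }

    // SAX: no tree is kept; the handler is called back as each element starts.
    static int countWithSax(String xml, String tag) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (qName.equals(tag)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<quotes><quote>a</quote><quote>b</quote></quotes>";
        System.out.println(countWithDom(xml, "quote"));
        System.out.println(countWithSax(xml, "quote"));
    }
}
```

The SAX version never holds more than the current event, which is why it suits documents that may not fit into memory.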
Validation of a document means ensuring that the document conforms entirely
to the DTD for that document. In practice, this tends to mean that the
entire document (or at least everything derived from it) has to be held in
memory, since validation cannot be completed until the whole document has
been parsed. This fits well with the DOM model, but
is somewhat in opposition to the SAX philosophy where most of the action is
done during the document parse.
This is not to say that you cannot use a validating parser with SAX; in
fact, all the major parsers provide such a capability. It's quite possible to
execute all your business logic during the SAX parse, and then throw the
results away if the parse fails at the end of the document.
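
One way this "do the work, then discard on failure" approach might look is sketched below. It again assumes JAXP with DTD validation switched on; the class name, the "item" element and its "id" attribute are invented for the example, and the "business logic" is just collecting attribute values.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, for illustration only.
class ValidatingSaxSketch {
    // Returns the ids of all <item> elements, or an empty list if the
    // document is not valid against its DTD.
    static List<String> collectIfValid(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // request DTD validation
        SAXParser parser = factory.newSAXParser();

        final List<String> pending = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Work happens during the parse, before validity is known.
                if (qName.equals("item")) {
                    pending.add(atts.getValue("id"));
                }
            }

            @Override
            public void error(SAXParseException e) throws SAXException {
                // Validity errors are recoverable by default; rethrow so the
                // parse is abandoned instead of silently continuing.
                throw e;
            }
        };

        try {
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return pending;                 // valid: keep the results
        } catch (SAXException e) {
            return Collections.emptyList(); // invalid: throw the results away
        }
    }
}
```

Note that without the overridden error() method, DefaultHandler ignores validity errors, so a validating parse would appear to succeed on an invalid document.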
My question is really, for those of you who are writing XML processors
(i.e. applications that use XML), what mode(s) do you use, and do you find
the need for a validating SAX parser?
It's interesting to note that, in IBM's xml4j parser, the SAX parser is by
default non-validating, and the DOM parser is by default validating.
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1