Streaming XML and SAX

Tom Harding tomh at thinlink.com
Mon Mar 1 05:41:36 GMT 1999


Marcelo Cantos wrote:

> It has already been pointed out in this discussion that some
> environments try to increase the throughput by dispatching documents
> off to different threads.  A system with 50 CPU's is going to be
> operating as low as 2% capacity if it is forced to pipe the entire
> parsing load through a single thread.  I don't see how you can argue
> that this is efficient.

Even if you believe that parsing (converting markup into memory structures) is slower than
back-end processing, the two approaches perform the same whenever the parser can keep up with
the stream itself: the stream is then the bottleneck, and extra threads have nothing to do.
Anyway, in the general case the question is moot because there may be inter-document
dependencies, so you have to look inside each document before trying to parallelize.
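
As a rough illustration of that argument, here is a minimal sketch in Python. The model
(throughput is capped by the slower of the stream and the combined parsers), the function name
effective_throughput, and all the figures are illustrative assumptions, not measurements, and
the sketch ignores inter-document dependencies entirely.

```python
def effective_throughput(stream_rate, parse_rate_per_thread, threads):
    """Rate (e.g. MB/s) at which data actually gets through the system.

    Assumes perfect scaling across threads, which is optimistic.
    """
    return min(stream_rate, parse_rate_per_thread * threads)


# Hypothetical figures: the stream delivers 10 MB/s.
fast_single = effective_throughput(10, 40, 1)   # parser outruns the stream
fast_pool = effective_throughput(10, 40, 50)
slow_single = effective_throughput(10, 2, 1)    # parser is the bottleneck
slow_pool = effective_throughput(10, 2, 50)
```

Under these assumptions, a parser that outruns the stream gives the same result with one thread
or fifty. The pool helps only when the parser is slower than the stream, and then only up to
the stream rate.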

The whole point of this discussion was whether the document terminator ought to be XML or
non-XML.  Aside from the fact that I haven't yet seen a workable suggestion for a non-XML
terminator, it isn't necessary to examine a document completely or convert it to a tree just
to find an XML terminator.  As Nathan pointed out, you could write a semi-parser to find
terminators and then actually parse the documents in parallel, but you'd still need to suggest
a way of dealing with inter-document dependencies.
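
The semi-parser idea could look something like the sketch below. It is only an illustration
under several assumptions that are not part of the discussion: the terminator is taken to be
the end tag of each document's root element, the whole stream is held in one string for
brevity, documents carry no DOCTYPE with an internal subset, and the names split_documents and
parse_all are hypothetical. The scanner only tracks element depth; it builds no tree. It does
nothing about inter-document dependencies.

```python
import re
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET

# Tokens the scanner must recognise so that markup characters inside comments,
# CDATA sections, PIs, and quoted attribute values are not mistaken for tags.
TOKEN = re.compile(
    r"<!--.*?-->"                           # comment
    r"|<!\[CDATA\[.*?\]\]>"                 # CDATA section
    r"|<\?.*?\?>"                           # PI or XML declaration
    r"|<(?:[^<>\"']|\"[^\"]*\"|'[^']*')*>",  # any other tag
    re.DOTALL,
)


def split_documents(stream_text):
    """Cut stream_text after each root end tag, without building a tree."""
    docs, depth, start = [], 0, 0
    for m in TOKEN.finditer(stream_text):
        tok = m.group()
        if tok.startswith(("<!", "<?")):
            continue                        # comment, CDATA, PI, or DOCTYPE
        if tok.startswith("</"):
            depth -= 1                      # end tag
        elif not tok.endswith("/>"):
            depth += 1                      # start tag
            continue
        if depth == 0:                      # root closed (or empty root)
            docs.append(stream_text[start:m.end()].strip())
            start = m.end()
    return docs


def parse_all(stream_text, workers=4):
    """Fully parse each document on a pool of worker threads."""
    docs = split_documents(stream_text)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(ET.fromstring, docs))
```

A thread pool is used here only to show the shape of the approach. In CPython it may not give
real CPU parallelism, and a process pool could be substituted. Note also that comments or PIs
trailing a root end tag end up attached to the following document, which is one reason the
choice of terminator matters.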






