Thread-safe SAX parsing (was Re: Java based XML parsers for server use.)

roddey at us.ibm.com roddey at us.ibm.com
Mon May 10 18:44:59 BST 1999




> > * mads at pacbell.net
> > |
> > | Are there any Java based XML parsers that are suitable for server
> > | side use? One of the primary requirements for such a parser is that
> > | it should be thread safe, i.e., more than one thread should be able to
> > | use the parser simultaneously.
> >
> > I have to confess that I haven't looked at this aspect of parsers
>
> [snip]
>
>You could start with this:
>
  [snip]


>Once the parse begins, the SAX parser itself controls flow of
>processing, so I cannot imagine how another thread could mess up the
>actual parsing (there's no way to change the parser's state externally
>once it begins parsing).  To be perfectly safe, you could always
>synchronize on the InputSource and/or the InputStream/Reader that you
>are using.
>
>Perhaps people with more experience in Java concurrent programming can
>take (friendly) shots at this suggestion.
>

We take the approach that thread safety, at least from the perspective of
parsing, should be at the actual parser/scanner instance level. So, you cannot
have two threads using the same parser, but you can have as many threads as you
want, each with its own parser. This makes the system almost totally
synchronization free, so it's fast and cheap. There are a few static data bits,
but they are generally read-only once initialized, so there is really little to
no synchronization required.
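A minimal sketch of the "one parser per thread" approach, assuming the standard JAXP `javax.xml.parsers` API (the message does not name a particular API). The class `PerThreadParsing` and the method `countElements` are hypothetical names for illustration. Each thread lazily creates its own `SAXParser` through a `ThreadLocal`, so no parser instance is ever touched by two threads and no locking is needed around `parse()`:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PerThreadParsing {
    // Hypothetical holder: each thread gets its own SAXParser on first use.
    // Parser instances are never shared between threads.
    private static final ThreadLocal<SAXParser> PARSER = ThreadLocal.withInitial(() -> {
        try {
            return SAXParserFactory.newInstance().newSAXParser();
        } catch (Exception e) {
            throw new IllegalStateException("could not create SAX parser", e);
        }
    });

    // Counts start-element events in the document using the calling thread's parser.
    public static int countElements(String xml) throws Exception {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        };
        SAXParser parser = PARSER.get();
        parser.reset(); // restore the original configuration before reusing this thread's parser
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    public static void main(String[] args) throws InterruptedException {
        String doc = "<order><item/><item/><item/></order>";
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                try {
                    // Each worker parses with its own parser, so no synchronization is needed.
                    System.out.println(Thread.currentThread().getName() + ": " + countElements(doc));
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
    }
}
```

Reusing a parser within one thread avoids rebuilding it for every document. Creating a fresh parser per parse would be equally safe, just somewhat more expensive.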

But that's just for parsing. The events that come out of the scanner/parser are
delivered in the context of the one thread running that parse, so that is no
problem either, as long as the recipient handles any synchronization on the
target data structures. DOM, of course, is another specification of worms and
has its own issues. It would often be the case that multiple threads would be in
a DOM structure at the same time, so it generally has to be overly
synchronized, and at a more granular level (i.e. it costs whether you are using
it or not).
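A minimal sketch of where the synchronization burden lands, again assuming the JAXP API. The names `SharedTargetHandler`, `CountingHandler`, and `ELEMENT_COUNTS` are hypothetical. Each parse has its own handler, and its callbacks run on the thread that called `parse()`, so the handler itself needs no locking. Only the shared target (here a `ConcurrentHashMap` tallying element names across all documents) has to be thread-safe:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SharedTargetHandler {
    // Hypothetical shared target: written by every parsing thread, so it must be thread-safe.
    static final Map<String, Integer> ELEMENT_COUNTS = new ConcurrentHashMap<>();

    // One handler per parse; its callbacks run only on the thread running that parse.
    static class CountingHandler extends DefaultHandler {
        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            ELEMENT_COUNTS.merge(qName, 1, Integer::sum); // atomic update per key
        }
    }

    // Each call builds its own parser, so parsers are never shared either.
    static void parse(String xml) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new CountingHandler());
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> docs = List.of("<a><b/></a>", "<a><b/><b/></a>", "<c/>");
        List<Thread> threads = new ArrayList<>();
        for (String doc : docs) {
            // One thread and one parser per document.
            Thread t = new Thread(() -> {
                try {
                    parse(doc);
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(ELEMENT_COUNTS);
    }
}
```

If `ELEMENT_COUNTS` were a plain `HashMap`, the parsing itself would still be safe, but concurrent updates from the handlers could corrupt the map.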

From a server's perspective, though, the 'thread per parser' scenario seems
optimal. It would basically equate to a 'thread per served client', which also
is very common and useful. In that kind of situation, there would be very little
synchronization required as relates to the XML parser. We don't foresee any
useful reason to have multiple threads in a single parser at a time, at least
none that is worth the gotchas it would raise for everyone else.
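A minimal sketch of the 'thread per served client' pattern, assuming JAXP and a fixed thread pool standing in for the server's client threads. `ParserPerRequest`, `handleRequest`, and `serve` are hypothetical names, and "serving" a request is reduced to reporting the root element name. Each request builds its own parser inside its own task, so nothing parser-related is shared between requests:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ParserPerRequest {
    // Hypothetical request handler: parses the body and returns the root element name.
    static String handleRequest(String requestBody) throws Exception {
        // A fresh parser per request; it lives and dies on this worker thread.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        StringBuilder root = new StringBuilder();
        parser.parse(new InputSource(new StringReader(requestBody)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (root.length() == 0) {
                    root.append(qName); // first start-element event is the root
                }
            }
        });
        return root.toString();
    }

    // Dispatches each request to a pool thread and collects results in submission order.
    static List<String> serve(List<String> requests, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String body : requests) {
                pending.add(pool.submit((Callable<String>) () -> handleRequest(body)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serve(List.of("<order/>", "<invoice><line/></invoice>", "<ping/>"), 2));
    }
}
```

Building a parser per request keeps the sketch obviously safe. A busier server might instead keep one parser per pool thread to amortize construction cost, which preserves the same rule that a parser is used by only one thread at a time.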



xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



