Thread-safe SAX parsing (was Re: Java based XML parsers for server use.)

David Brownell david-b at pacbell.net
Mon May 10 02:44:40 BST 1999


David Megginson wrote:
> 
>  > * mads at pacbell.net
>  > |
>  > | Are there any Java based XML parsers that are suitable for server
>  > | side use? One of the primary requirements for such a parser is that
>  > | it should be thread safe, i.e., more than one thread should be able
>  > | to use the parser simultaneously.

By the way, the "conventional wisdom" for multithreaded programming is
that sharing data structures should be avoided by default.
Synchronizing kills performance, and confuses most programmers; so
there's little win in sharing data when it's not absolutely required.

That is, the conventional way to be "Thread Safe" is _not_ to have
threads sharing a parser.  Instead, each thread would have its own
parser object -- which makes sense, since parsers are quite cheap and
you want to be able to have multiple threads _parsing concurrently_
(vs. just parsing sequentially).

Another way to put this:  the parser "class" is reentrant, but any
given parser instance would typically be used by only one thread at
a time.
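
To make that concrete, here is a minimal sketch of the "one parser per
thread" approach.  Everything named in it is an assumption made for
illustration: it obtains a SAX 1 Parser through the JAXP
SAXParserFactory bundled with current JDKs (not something discussed
above), and ParserPerThread, ElementCounter, countElements and countAll
are made-up names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;

// Hypothetical example class; the SAX 1 types are deprecated in modern JDKs.
@SuppressWarnings("deprecation")
class ParserPerThread {

    // Handler that just counts start tags; one instance per parse.
    static class ElementCounter extends HandlerBase {
        int count;

        @Override
        public void startElement(String name, AttributeList atts) {
            count++;
        }
    }

    // Creates a fresh parser and handler for this call, so nothing is shared
    // between callers and no synchronization is needed.
    static int countElements(String xml) throws Exception {
        // Assumption: the JAXP factory is used only as a convenient way to
        // get an org.xml.sax.Parser instance.
        Parser parser = SAXParserFactory.newInstance().newSAXParser().getParser();
        ElementCounter handler = new ElementCounter();
        parser.setDocumentHandler(handler);
        parser.parse(new InputSource(new StringReader(xml)));
        return handler.count;
    }

    // Parses all documents concurrently; each task uses its own parser.
    static List<Integer> countAll(List<String> docs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (String doc : docs) {
                tasks.add(() -> countElements(doc));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countAll(List.of("<a><b/><c/></a>", "<x/>")));
    }
}
```

The parses run concurrently and never touch each other's state,
because the only things the tasks share are immutable strings.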


>
>	[ example deleted ]
>
> ====================8<====================8<====================
> 
> Although I am far from an expert in concurrent programming, my feeble
> brain thinks that this should work provided that the following
> conditions hold true:
> 
> 1. the implementor of the original parser doesn't use static variables
>    to store parse information -- in other words, the parser is
>    reentrant (they all should be: anyone competent enough to write an
>    XML parser in the first place would probably write a reentrant
>    one); and

Right, essentially every class should be reentrant unless explicitly
documented otherwise.

However, I certainly wouldn't expect a given _parser object_ to be
reentrant from the application point of view.  If a callback decides
to parse a new document, it should do so using some other parser rather
than expect the parser to keep track of two concurrent parses from the
same thread!
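
A rough sketch of what I mean, with a callback that starts a second
parse on a parser of its own.  The "include" element name, the
NestedParse and IncludingHandler classes, and the use of the JAXP
SAXParserFactory to get a parser are all assumptions made up for this
illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.SAXException;

// Hypothetical example class; the SAX 1 types are deprecated in modern JDKs.
@SuppressWarnings("deprecation")
class NestedParse {

    // Assumption: JAXP is just a convenient source of SAX 1 parsers here.
    static Parser newParser() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getParser();
    }

    // Counts start tags of whatever document it is attached to.
    static class Counter extends HandlerBase {
        int count;

        @Override
        public void startElement(String name, AttributeList atts) {
            count++;
        }
    }

    // On each <include> element (a made-up convention), parses innerXml
    // with a brand-new parser instead of re-entering the outer one.
    static class IncludingHandler extends HandlerBase {
        final String innerXml;
        int outer;
        int inner;

        IncludingHandler(String innerXml) {
            this.innerXml = innerXml;
        }

        @Override
        public void startElement(String name, AttributeList atts) throws SAXException {
            outer++;
            if (name.equals("include")) {
                try {
                    Counter c = new Counter();
                    Parser nested = newParser();   // NOT the outer parser
                    nested.setDocumentHandler(c);
                    nested.parse(new InputSource(new StringReader(innerXml)));
                    inner += c.count;
                } catch (SAXException e) {
                    throw e;
                } catch (Exception e) {
                    throw new SAXException(e);
                }
            }
        }
    }

    // Returns { start tags in the outer document, start tags seen by nested parses }.
    static int[] count(String outerXml, String innerXml) throws Exception {
        IncludingHandler h = new IncludingHandler(innerXml);
        Parser parser = newParser();
        parser.setDocumentHandler(h);
        parser.parse(new InputSource(new StringReader(outerXml)));
        return new int[] { h.outer, h.inner };
    }

    public static void main(String[] args) throws Exception {
        int[] r = count("<doc><include/><p/></doc>", "<x><y/></x>");
        System.out.println(r[0] + " outer, " + r[1] + " nested");
    }
}
```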


> 2. no two instantiations of the parser use the same Reader or
>    InputStream instantiation (that would be dumb anyway).

That doesn't quite do it.  Consider what happens when two threads
try to use the same parser object concurrently (contrary to my
advice above, that they should have their own parser object):

       THREAD 1                               THREAD 2

   parser.setDocumentHandler (H1);
                                     parser.setDocumentHandler (H2);
   parser.parse (doc1);
                                     parser.parse (doc2);

What happens there is that handler H2 gets two streams of parse
events, and handler H1 gets none.  Depending on how things get
scheduled, either doc1 or doc2 could get parsed first.
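
The interleaving above can be replayed deterministically in a single
thread, which is handy for seeing the effect without relying on the
scheduler.  This is only a sketch: it serializes the two parse() calls
(real threads might also overlap them), it gets its parser from the
JAXP SAXParserFactory as an assumption, and the HandlerOverwrite,
ElementCounter and replay names are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;

// Hypothetical example class; the SAX 1 types are deprecated in modern JDKs.
@SuppressWarnings("deprecation")
class HandlerOverwrite {

    // Counts the start tags delivered to this handler.
    static class ElementCounter extends HandlerBase {
        int count;

        @Override
        public void startElement(String name, AttributeList atts) {
            count++;
        }
    }

    // Replays the four calls in the order shown in the diagram and
    // returns { events seen by H1, events seen by H2 }.
    static int[] replay(String doc1, String doc2) throws Exception {
        Parser parser = SAXParserFactory.newInstance().newSAXParser().getParser();
        ElementCounter h1 = new ElementCounter();
        ElementCounter h2 = new ElementCounter();

        parser.setDocumentHandler(h1);                            // "thread 1"
        parser.setDocumentHandler(h2);                            // "thread 2" replaces H1
        parser.parse(new InputSource(new StringReader(doc1)));    // "thread 1"
        parser.parse(new InputSource(new StringReader(doc2)));    // "thread 2"

        return new int[] { h1.count, h2.count };
    }

    public static void main(String[] args) throws Exception {
        int[] seen = replay("<a><b/></a>", "<x/>");
        System.out.println("H1 saw " + seen[0] + ", H2 saw " + seen[1]);
    }
}
```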



> Once the parse begins, the SAX parser itself controls flow of
> processing, so I cannot imagine how another thread could mess up the
> actual parsing (there's no way to change the parser's state externally
> once it begins parsing).  To be perfectly safe, you could always
> synchronize on the InputSource and/or the InputStream/Reader that you
> are using.
> 
> Perhaps people with more experience in Java concurrent programming can
> take (friendly) shots at this suggestion.

See above ... :-)

If you want two threads to share a parser, you'll have to agree on
rules for how they do so.  While that's something I'd discourage,
you could do it with any SAX parser (no "safe" wrapper necessary)
like this:

	// the ONLY use of the parser that each thread makes
	// would be inside this synchronized() block !!
	synchronized (parser) {
		parser.setLocale (locale);
		parser.setDocumentHandler (handler);
		...
		parser.parse (input);
	}

There are parsers out there which don't work correctly like that,
since they don't parse more than one document (bug!!).  But if that's
really the desired mode of operation, and you've got a parser which
truly conforms to the SAX specification, that's how to do it.
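
For anyone who wants to try the shared-parser rule end to end, here is
a small runnable sketch.  It is an illustration under assumptions, not
a recommendation: the parser comes from the JAXP SAXParserFactory, and
SharedParser, ElementCounter, countWith and runTwoThreads are made-up
names.  The only point is that each thread sets its handler and parses
inside one synchronized block on the parser.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;

// Hypothetical example class; the SAX 1 types are deprecated in modern JDKs.
@SuppressWarnings("deprecation")
class SharedParser {

    // Counts the start tags delivered to this handler.
    static class ElementCounter extends HandlerBase {
        int count;

        @Override
        public void startElement(String name, AttributeList atts) {
            count++;
        }
    }

    // The ONLY way threads touch the shared parser: configure it and
    // parse while holding its lock, so no other thread can swap the
    // handler in between.
    static int countWith(Parser shared, String xml) throws Exception {
        ElementCounter handler = new ElementCounter();
        synchronized (shared) {
            shared.setDocumentHandler(handler);
            shared.parse(new InputSource(new StringReader(xml)));
        }
        return handler.count;
    }

    // Two threads take turns on one parser; returns each thread's count,
    // or -1 in a slot if that thread's parse failed.
    static int[] runTwoThreads(String doc1, String doc2) throws Exception {
        Parser shared = SAXParserFactory.newInstance().newSAXParser().getParser();
        int[] counts = new int[2];
        Thread t1 = new Thread(() -> {
            try { counts[0] = countWith(shared, doc1); }
            catch (Exception e) { counts[0] = -1; }
        });
        Thread t2 = new Thread(() -> {
            try { counts[1] = countWith(shared, doc2); }
            catch (Exception e) { counts[1] = -1; }
        });
        t1.start();
        t2.start();
        t1.join();   // join() makes each thread's write to counts visible here
        t2.join();
        return counts;
    }

    public static void main(String[] args) throws Exception {
        int[] counts = runTwoThreads("<a><b/><c/></a>", "<x/>");
        System.out.println(counts[0] + " and " + counts[1]);
    }
}
```

Note that the two parses run one after the other, never concurrently,
which is exactly the cost of sharing the parser.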

- Dave
