SAX2: Namespace proposal

Miles Sabin msabin at cromwellmedia.co.uk
Mon Dec 20 14:46:28 GMT 1999


David Megginson wrote,
> Stefan Haustein wrote,
> > - building a new object seems some overhead at the first 
> > sight, but in JAVA also a new String is a new object...
>
> And that is why most parsers internalize strings rather than 
> creating new ones,

This isn't necessarily the best approach. Intern'ing a string
involves a lookup in a JVM-internal hash table. This table is
shared across all threads, and consequently has to be locked
against simultaneous reads and updates. That means we've got
two potential sources of overhead: the lookup itself; and lock
contention between multiple threads trying to access the
table. The former probably isn't a big deal, but the latter
can make for a serious performance hit in heavily threaded
systems, especially on SMP machines. Unless you know there's
not going to be contention (e.g., because you know you're
running single threaded) it's probably wisest *not* to intern.

It's also worth remembering that you've got to _already_ have
a String before you can intern it! If you've just created one
(e.g. from a portion of a char array) then you're only going to
add overhead by doing an intern in addition.
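
To make that concrete, here's a minimal sketch. The class and
method names are mine and purely illustrative; they're not taken
from any real parser. It builds a String from a slice of a char
buffer and then interns it: the allocation has already happened
by the time intern() gets called, so the table lookup is pure
extra work.

```java
class InternCost {
    // Hypothetical helper: what a parser might do to turn a slice of
    // its input buffer into an element name.
    static String nameFromBuffer(char[] buf, int start, int length) {
        return new String(buf, start, length); // always allocates a new String
    }

    public static void main(String[] args) {
        char[] buf = "<para>text</para>".toCharArray();
        String literal = "para";                  // string literals are interned

        String fresh = nameFromBuffer(buf, 1, 4); // the allocation happens here
        String canonical = fresh.intern();        // plus a lookup in the shared table

        System.out.println(fresh.equals(literal)); // true: same characters
        System.out.println(fresh == literal);      // false: a distinct object
        System.out.println(canonical == literal);  // true: the canonical instance
    }
}
```

The fresh String is typically garbage as soon as the canonical
one has been returned.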

The only possible benefits are,

1. If you've got a pair of Strings that are both *known* to be
   intern'ed you can use == for equality comparisons rather
   than equals. 'known' is the crucial qualifier here: in my
   experience it's most common that only one of a pair of
   Strings will be known to be interned, which means that
   before we can use == the other has to be intern'ed first ...
   which more than wipes out any speedup.

2. Intern'ed Strings share storage. I can imagine situations
   where this _might_ be significant, but they're likely to
   be edge cases. Unless you're actually hanging on to
   references to large numbers of equal Strings then garbage
   collection _should_ recycle the storage allocated to old
   ones. Some JVMs might have trouble doing this nicely, but
   then the best bet would be to get hold of a better JVM
   rather than trying to hack around the problem. Bear in mind
   that troublesome JVMs will also cause problems even with
   intern'ing ... because, as mentioned earlier, we'll have had
   to create a String before we can intern it, and typically
   the pre-intern String will be discarded: if gc is slack then
   these will pile up even tho' unreferenced.
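
A rough sketch of the situation in (1), again with made-up
names: the == shortcut is only safe when both Strings are known
to be interned. When only one is, the other has to go through
intern() first, and at that point a plain equals() is the
simpler option.

```java
class InternCompare {
    // Correct only if BOTH arguments are known to be interned.
    static boolean sameInterned(String a, String b) {
        return a == b;
    }

    // Only `known` is interned: `other` needs a table lookup before == is valid.
    static boolean sameAfterIntern(String known, String other) {
        return known == other.intern();
    }

    // No interning at all: an ordinary content comparison.
    static boolean samePlain(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String known = "element"; // a literal, hence interned
        // Built at run time, so a distinct object that is not interned.
        String other = new StringBuilder("ele").append("ment").toString();

        System.out.println(sameInterned(known, other));    // false: wrong answer
        System.out.println(sameAfterIntern(known, other)); // true, after a lookup
        System.out.println(samePlain(known, other));       // true, no lookup
    }
}
```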

> and that's why the SAX characters() and ignorableWhitespace() 
> methods use character arrays rather than strings.

This, on the other hand, can bring genuine gains, at the cost
of considerably uglifying the API.
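
For illustration, a small sketch of a handler that works directly
on the (array, start, length) triple without ever building a
String. It assumes the SAX2 DefaultHandler helper class and a
JAXP parser as shipped with current JDKs; the CharCounter class
itself is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class CharCounter extends DefaultHandler {
    int total;  // characters of text content seen so far
    int spaces; // how many of those were plain spaces

    @Override
    public void characters(char[] ch, int start, int length) {
        // Read the parser's buffer in place; no String is allocated.
        // The parser may deliver text in several chunks, so accumulate.
        total += length;
        for (int i = start; i < start + length; i++) {
            if (ch[i] == ' ') {
                spaces++;
            }
        }
    }

    // Convenience wrapper: parse an in-memory document with this handler.
    static CharCounter count(String xml) {
        CharCounter handler = new CharCounter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler;
    }

    public static void main(String[] args) {
        CharCounter c = count("<doc>a b c</doc>");
        System.out.println(c.total + " chars, " + c.spaces + " spaces");
    }
}
```

The ugliness shows up in the loop: every consumer has to carry
the start and length around and respect them.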

Cheers,


Miles

-- 
Miles Sabin                       Cromwell Media
Internet Systems Architect        5/6 Glenthorne Mews
+44 (0)20 8817 4030               London, W6 0LJ, England
msabin at cromwellmedia.com          http://www.cromwellmedia.com/

