String interning (WAS: SAX2/Java: Towards a final form)

David Megginson david at
Wed Jan 12 19:08:43 GMT 2000

Miles Sabin <msabin at> writes:

> David Megginson has mentioned a way of reducing the overhead of
> java-interning: here we have a parser-internal map from 
> character sequences onto java-interned Strings ... 

This isn't just a way of reducing the cost of Java interning. Every
Java-based parser with any real performance already does this;
otherwise, it could end up allocating tens of thousands of new
strings for even a medium-sized document.
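
For illustration only, here is a minimal sketch of what such a
parser-internal table might look like. The class name NameTable and its
methods are hypothetical and are not taken from any particular parser;
the fixed bucket count is likewise just an assumption to keep the
example short. The point is that a name already seen is matched against
the raw character buffer, so no new String is allocated and
String.intern is called only the first time each distinct name appears.

```java
/** Hypothetical parser-local name table; a sketch, not any real parser's code. */
final class NameTable {
    /** One chained hash-table entry holding an already-interned name. */
    private static final class Entry {
        final String name;
        final int hash;
        final Entry next;

        Entry(String name, int hash, Entry next) {
            this.name = name;
            this.hash = hash;
            this.next = next;
        }
    }

    // Fixed power-of-two size so the hash can be masked; assumed adequate
    // because documents tend to have few distinct names.
    private final Entry[] buckets = new Entry[256];
    private int internCalls = 0;

    /** Returns the interned String for the characters buf[offset .. offset+length). */
    String lookup(char[] buf, int offset, int length) {
        int hash = 0;
        for (int i = 0; i < length; i++) {
            hash = 31 * hash + buf[offset + i];
        }
        int index = hash & (buckets.length - 1);

        // Fast path: compare against the buffer directly, allocating nothing.
        for (Entry e = buckets[index]; e != null; e = e.next) {
            if (e.hash == hash && matches(e.name, buf, offset, length)) {
                return e.name;
            }
        }

        // Slow path: first sighting of this name, so allocate and intern once.
        String name = new String(buf, offset, length).intern();
        internCalls++;
        buckets[index] = new Entry(name, hash, buckets[index]);
        return name;
    }

    private static boolean matches(String s, char[] buf, int offset, int length) {
        if (s.length() != length) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (s.charAt(i) != buf[offset + i]) {
                return false;
            }
        }
        return true;
    }

    /** Number of times String.intern has been invoked through this table. */
    int internCalls() {
        return internCalls;
    }
}
```

Because each parser instance would own its own table, the lookup itself
needs no synchronization; only the rare slow path touches the JVM-wide
intern pool.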


> Whilst this might improve things a bit, it's still a 
> performance hit: if the parser internal map is shared between 
> parsers then we have the same contention problem back again 

I haven't seen a parser that shares its map between parser instances.

> (tho' this time in application code rather than the JVM); if it 
> isn't (and hence is parser-/thread-local), then it has to be 
> repopulated at least for each new parser instance, probably for 
> each new document. 

Repopulating it once for every parser instance is usually sufficient.

> Even tho' this only requires one java-intern 
> for each distinct name it still provides plenty of 
> opportunities for synchronization collisions.

When you consider that even a long-ish document instance (say, 20,000
elements with an average of 3 attributes each) will likely contain
fewer than 50 unique element and attribute names (often fewer than
25), there are going to be too few invocations of
java.lang.String.intern to cause any serious problems.
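
To put rough numbers on that, here is a small simulation under the
figures assumed above (20,000 elements, 3 attributes each, 50 distinct
names). The class InternCount and the synthetic names "n0", "n1", ... are
made up for the illustration, and unlike a real parser this sketch
allocates a temporary String per lookup purely to keep the counting
code short.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical counter comparing name occurrences with String.intern calls. */
final class InternCount {
    /**
     * Simulates a parser-local map over a synthetic document and returns
     * { total name occurrences, number of String.intern invocations }.
     */
    static int[] simulate(int elements, int attrsPerElement, int uniqueNames) {
        Map<String, String> local = new HashMap<>();
        int occurrences = 0;
        int internCalls = 0;

        // Each element contributes its own name plus one name per attribute.
        int namesPerElement = 1 + attrsPerElement;
        for (int i = 0; i < elements * namesPerElement; i++) {
            // Stand-in for a name read from the input; cycles through the
            // assumed set of distinct names.
            String raw = "n" + (i % uniqueNames);
            occurrences++;

            if (local.get(raw) == null) {
                // Only the first sighting of a name reaches the JVM intern pool.
                local.put(raw, raw.intern());
                internCalls++;
            }
        }
        return new int[] { occurrences, internCalls };
    }
}
```

With those inputs, InternCount.simulate(20000, 3, 50) reports 80,000
name occurrences but only 50 calls to String.intern for the parser
instance.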

All the best,


David Megginson                 david at
