String interning (WAS: SAX2/Java: Towards a final form)
Miles Sabin
msabin at cromwellmedia.co.uk
Wed Jan 12 18:04:37 GMT 2000
I think we need to clarify a couple of ambiguities here.
There are two sorts of interning being talked about wrt SAX2:
the standard java interning performed by String.intern(); and
other, parser specific, mechanisms for ensuring that Strings
which are String.equal() are also ==. I'll call the former
java-interning and the latter app-interning.
I have a strong objection to SAX requiring that Strings
returned from it's methods be java-interned. I'm not bothered
about requiring app-interning so long as the guarantees are
weakened a little.
First, the problem with java-interning. The way this is
implemented (in all the JVM's I've seen the sources of) is
via a hash-lookup of the pre-interned String in a JVM-internal
table. Because this table is shared by all threads in a JVM
this lookup has to be synchronized. The upshot is that there
is a huge potential for lock-contention where many threads
are interning simultaneously. This is bad enough on a single
processor machine, but could seriously clobber performance on
a multi-processor box. I, for one, want to use multiple SAX
parser instances driven from multiple threads on SMP machines,
and I'd be a tad distressed if java-interning were a SAX
requirement.
David Megginson has mentioned a way of reducing the overhead of
java-interning: here we have a parser-internal map from
character sequences onto java-interned Strings ... if when you
lookup on the char sequence you get non-null String back then
that's the java-interned result; otherwise you convert the char
sequence to a String, java-intern it and enter it in the table.
Whilst this might improve things a bit, it's still a
performance hit: if the parser internal map is shared between
parsers then we have the same contention problem back again
(tho' this time in application code rather than the JVM); if it
isn't (and hence is parser-/thread-local), then it has to be
repopulated at least for each new parser instance, probably for
each new document. Even tho' this only requires one java-intern
for each distinct name it still provides plenty of
opportunities for synchronization collisions.
App-interning could be fine tho' ... so long as it's defined in
such a way that it can be implemented in a completely thread-
local way. Doing that means we'd have to,
1. Weaken the guarantees on the equivalence of String.equals()
and ==.
To avoid synchronization issues we'd have to say that
app-interning is done relative to a given parser call,
ie. where foo and bar are both obtained via a callbacks from
the same call on XMLReader.parse()
foo.equals(bar) iff foo == bar
but if foo and bar are not both obtained via callbacks from
the same call on XMLReader.parse()
foo.equals(bar) does not imply foo == bar
2. Adopt something like Lars proposal of a StringInterner
interface.
We'd need this to allow a SAX client to app-intern any
literal Strings it wants to == test against in it's
handlers.
This should get us what most people want: fast equality
comparisons and shared representation within the implementation
of a ContentHandler, but without any need for synchronization.
One point to bear in mind: none of the foregoing would
_prevent_ a SAX implementor from using java-interning if they
wanted to.
Cheers,
Miles
--
Miles Sabin Cromwell Media
Internet Systems Architect 5/6 Glenthorne Mews
+44 (0)20 8817 4030 London, W6 0LJ, England
msabin at cromwellmedia.com http://www.cromwellmedia.com/
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1
Please note: New list subscriptions now closed in preparation for transfer to OASIS.
More information about the Xml-dev
mailing list