String interning (WAS: SAX2/Java: Towards a final form)

Tyler Baker tyler at infinet.com
Wed Jan 12 21:31:53 GMT 2000


Miles Sabin wrote:

> Tyler Baker wrote,
> > Miles Sabin wrote,
> > > [snip: table mapping to intern'd Strings]
> > > Even tho' this only requires one java-intern for each
> > > distinct name it still provides plenty of opportunities for
> > > synchronization collisions.
> >
> > Nope. Names in XML are highly redundant especially for
> > Namespace prefixes. Also, even if the number of calls to
> > String.intern() were significant (which they rarely if ever
> > are), modern Java runtimes have lowered synchronization
> > overhead to be small enough that you don't really have to
> > think about it much in terms of impacting performance
> > anymore.
>
> I think you're making two assumptions that don't always hold.
> Not all java xml applications are one shot, single doctype:
> some continuously parse multiple documents of a variety of
> doctypes in multiple threads. There's not necessarily _any_
> particular upper bound on the number of distinct element and
> attribute names that might be encountered. So there could be
> continual contention for the JVM's intern table.

Of course there is no upper bound on the number of distinct element or attribute names in a
document, but in practice a document contains far more element and attribute occurrences
than distinct element or attribute names. Trying to satisfy a condition that will never
arise in real-world use of XML is exactly the same mistaken thinking that I believe produced
"Namespaces in XML". The designers, I feel, tried to satisfy every hypothetical condition
without ever considering the real-world implications. That is what you are doing here,
which is laudable, but I don't think it has much to do with real-world use of XML.
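A minimal sketch of a name table like the one snipped from Miles's message above, assuming a hypothetical per-parser `NameTable` class: each distinct name goes through String.intern() once, and every later occurrence is served from a local HashMap. The shared JVM intern table is therefore touched once per distinct name, however many elements the document contains.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-parser name table (not a SAX or JDK API). It is not
// thread-safe by design: each parsing thread owns its own instance, so no
// locking happens here; only the first intern() of a name hits the JVM table.
public class NameTable {
    private final Map<String, String> names = new HashMap<>();
    private int internCalls = 0;

    // Returns the canonical (intern'd) String for the given name.
    public String canonical(String name) {
        String cached = names.get(name);
        if (cached != null) {
            return cached; // repeat occurrence: no call to String.intern()
        }
        String interned = name.intern(); // first occurrence of this name only
        internCalls++;
        names.put(interned, interned);
        return interned;
    }

    public int internCalls() {
        return internCalls;
    }

    public static void main(String[] args) {
        NameTable table = new NameTable();
        // Simulate a document with many elements but few distinct names.
        String[] distinct = {"html", "body", "p", "a", "xlink:href"};
        int occurrences = 0;
        for (int i = 0; i < 10_000; i++) {
            // new String(...) mimics a parser building a fresh String per token
            table.canonical(new String(distinct[i % distinct.length]));
            occurrences++;
        }
        // prints "10000 names, 5 intern calls"
        System.out.println(occurrences + " names, " + table.internCalls() + " intern calls");
    }
}
```

Keeping the table per parser (rather than shared) is one way to sidestep the contention concern entirely; a shared table would need its own synchronization.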

> And I think you're assuming a single processor JVM. The
> synchronization overhead picture is *very* different on multi-
> processors.

Synchronization is synchronization. For most documents, making 50-100 calls to
String.intern() while parsing a 100KB document is far cheaper than doing:

if (x.equals("foo")) {
    // handle "foo"
}
else if (x.equals("bar")) {
    // handle "bar"
}
// etc.

As opposed to:

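// Valid only if x has been intern()ed; string literals are interned automatically.
if (x == "foo") {
    // handle "foo"
}
else if (x == "bar") {
    // handle "bar"
}
// etc.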

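Calling equals() repeatedly can get expensive in long if/else chains, since each call that
is not an identity match may have to compare characters, whereas == is a single reference
comparison.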
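A small sketch of why the == version above depends on interning, assuming a hypothetical `dispatch` method: string literals are interned by the JVM, but a String built at run time (as a parser builds names) is a distinct object until it is passed through String.intern().

```java
// Illustrates why == dispatch requires intern'd input.
public class InternDispatch {

    // Hypothetical dispatcher: uses reference comparison, so it is only
    // correct when 'name' is an intern'd String.
    static String dispatch(String name) {
        if (name == "foo") {
            return "handled foo";
        } else if (name == "bar") {
            return "handled bar";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // Built at run time, so not the same object as the literal "foo".
        String parsed = new StringBuilder("fo").append('o').toString();

        System.out.println(parsed.equals("foo"));      // true: same characters
        System.out.println(parsed == "foo");           // false: different objects
        System.out.println(dispatch(parsed));          // unknown
        System.out.println(dispatch(parsed.intern())); // handled foo
    }
}
```

The cost of the single intern() call is paid once per name, after which every branch test is a reference comparison.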

Tyler


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1
Please note: New list subscriptions now closed in preparation for transfer to OASIS.