String interning (WAS: SAX2/Java: Towards a final form)

Tyler Baker tyler at
Mon Jan 17 23:02:27 GMT 2000

Assaf Arkin wrote:

> Tyler,
> I am aware of how to perform interning. I wrote OpenXML which performs
> interning for SAX and DOM, and I'm a contributing member of XML Apache,
> so I'm also familiar with their mechanism.
> Yet, aside from parsers and DOMs I use SAX in a variety of applications
> that do not perform String interning, nor is there any benefit for them
> to do so. I'm afraid that mandating interning will simply break these
> (and many other) applications.

It won't break SAX 1.0 applications, because interning is not a mandated feature there. Applications
that move to SAX 2.0 implementations will need to support the SAX 2.0 API anyway. Supporting interned
Strings is mostly trivial regardless of the application, but the benefits at the application level
can be immense if performance is at all a consideration. How much it matters really depends on the
size of your documents. For web browsers, interning or not interning is no big deal because the
documents are not that large. I/O is nearly always the bottleneck there and not the parser, even if
the parser is very inefficient.
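
To make the application-level benefit concrete, here is a minimal sketch of what an application can do
when the names it receives have been through String.intern(). The class and the two nameFrom...
methods are hypothetical stand-ins for a parser, not part of SAX or of any real implementation.

```java
public class InternedNameCheck {
    // String literals are interned by the JVM, so FOO refers to the pooled "foo".
    static final String FOO = "foo";

    // Hypothetical stand-in for a parser that interns every name it reports.
    static String nameFromInterningParser(char[] buf, int off, int len) {
        return new String(buf, off, len).intern();
    }

    // Hypothetical stand-in for a parser that reports a fresh String each time.
    static String nameFromPlainParser(char[] buf, int off, int len) {
        return new String(buf, off, len);
    }

    // Identity comparison: only correct when tagName is known to be interned.
    static boolean isFoo(String tagName) {
        return tagName == FOO;
    }

    public static void main(String[] args) {
        char[] doc = "<foo/>".toCharArray();
        System.out.println(isFoo(nameFromInterningParser(doc, 1, 3))); // true
        System.out.println(isFoo(nameFromPlainParser(doc, 1, 3)));     // false
        // Without interning the application must fall back to equals().
        System.out.println(FOO.equals(nameFromPlainParser(doc, 1, 3))); // true
    }
}
```

The reference comparison replaces a character-by-character equals() call on every element, which is
where the savings on large documents would come from.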

> Also, both OpenXML and Xerces use their internal interning mechanism
> which is substantially faster than String.intern, especially for dealing
> with DOM and parsing, however, the following will never work in either
> OpenXML or Xerces:
> if ( tagName == "foo" )
> for the simple reason that their interning mechanism and String.intern do
> not share the same table.
> arkin

The entire point of using String.intern() is to make the application that uses the parser
framework faster, without forcing you to write code like this:

public static final String CONSTANT = GlobalStringInternTable.intern("foo");

As a developer I prefer to use as few proprietary hooks as I possibly can. Using some
GlobalStringInternTable would, I think, only make sense for namespace support if you had a
parser framework that presented the application with a Name object instead of three strings
consisting of the prefix, namespace, and local part.

I guess it is just an argument mostly about what you want the application developer to deal
with. For me I prefer the way that gives me maximum performance without any obtuse coding to
some proprietary string table interface.
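
As a rough illustration of the trade-off, the sketch below shows why a private intern table forces
the application through the table's own interface. PrivateInternTable is a hypothetical, simplified
table written for this example; it is not the mechanism used by OpenXML or Xerces.

```java
import java.util.HashMap;
import java.util.Map;

public class PrivateInternTable {
    // Maps each distinct string value to the first instance seen with that value.
    private final Map<String, String> table = new HashMap<>();

    // Returns the canonical instance for s within THIS table only.
    public String intern(String s) {
        String canonical = table.get(s);
        if (canonical == null) {
            table.put(s, s);
            canonical = s;
        }
        return canonical;
    }

    public static void main(String[] args) {
        PrivateInternTable names = new PrivateInternTable();
        String a = names.intern(new String("foo"));
        String b = names.intern(new String("foo"));

        System.out.println(a == b);          // true: same private table
        System.out.println(a == "foo");      // false: the literal lives in the JVM pool
        System.out.println(a.intern() == "foo"); // true: String.intern() shares the literal's table
    }
}
```

With the private table, the application's constants have to be obtained from the same table before
== is safe. With String.intern(), an ordinary string literal already is the canonical instance.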



