String interning (WAS: SAX2/Java: Towards a final form)

Tyler Baker tyler at infinet.com
Tue Jan 18 00:19:20 GMT 2000


Assaf Arkin wrote:

> Let me repeat what I said.
>
> 1. SAX 2.0 adds interfaces, but most of the stuff will still happen
> through the SAX 1.0 interfaces. Expect applications to support SAX 2.0
> by simply supporting SAX 1.0.
>
> 2. Parsers use interning which is optimized but does not match
> String.intern. Forcing them to use String.intern will slow them down.

As David has pointed out, the change is one line of code from:

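public String getInternedString(char[] characters, int offset, int length) {
  // Look the name up in the parser's own table, straight from the char buffer.
  String foo = stringtable.lookup(characters, offset, length);
  if (foo == null) {
    // First occurrence: allocate the String once and cache it.
    foo = new String(characters, offset, length);
    stringtable.cache(foo);
  }
  return foo;
}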

to:

public String getInternedString(char[] characters, int offset, int length) {
  String foo = stringtable.lookup(characters, offset, length);
  if (foo == null) {
    // The only change: canonicalize through the JVM's intern table before caching.
    foo = new String(characters, offset, length).intern();
    stringtable.cache(foo);
  }
  return foo;
}

This call to intern() only happens on a cache miss, so it occurs once per distinct name per document. A small price to pay, I think.
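To make the two-level scheme concrete, here is a self-contained sketch of what such a parser-side name table might look like. The class NameTable, its fixed bucket count and its chained layout are assumptions for illustration, not how any particular parser does it; the point is that lookups run directly off the char buffer, so new String(...).intern() is reached only on a miss.

```java
// Hypothetical parser-side name table (illustrative sketch only).
// Names are looked up directly from the parser's char buffer, so a String is
// allocated, and String.intern() called, only the first time a name is seen.
final class NameTable {
    private static final int BUCKETS = 1024; // fixed size keeps the sketch short

    private static final class Entry {
        final String name;
        final int hash;
        final Entry next;
        Entry(String name, int hash, Entry next) {
            this.name = name;
            this.hash = hash;
            this.next = next;
        }
    }

    private final Entry[] table = new Entry[BUCKETS];

    String getInternedString(char[] chars, int offset, int length) {
        // Same polynomial hash as String.hashCode(), computed over the char range.
        int h = 0;
        for (int i = 0; i < length; i++) {
            h = 31 * h + chars[offset + i];
        }
        int index = (h & 0x7fffffff) % BUCKETS;
        for (Entry e = table[index]; e != null; e = e.next) {
            if (e.hash == h && matches(e.name, chars, offset, length)) {
                return e.name; // cache hit: no allocation, no intern() call
            }
        }
        // Cache miss: allocate once and canonicalize through the JVM's intern table.
        String name = new String(chars, offset, length).intern();
        table[index] = new Entry(name, h, table[index]);
        return name;
    }

    private static boolean matches(String s, char[] chars, int offset, int length) {
        if (s.length() != length) return false;
        for (int i = 0; i < length; i++) {
            if (s.charAt(i) != chars[offset + i]) return false;
        }
        return true;
    }
}
```

Both the start tag and end tag of <title>x</title> resolve to the same canonical String, and it is the same object as the literal "title", because string literals are interned too.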

> 3. String.intern is more efficient for some documents, it is less
> efficient for others, and it can kill your JVM once the interning table
> grows too large if your application is a server that is expected to work
> continuously.

I disagree. As of JDK 1.2, the Java String intern table holds weak references to the interned String
objects. And as long as the load factor is kept constant, a hash table does not lose lookup performance
as it grows.
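The load-factor argument can be illustrated with a toy hash set. GrowingStringSet and its 0.75 threshold are hypothetical choices, and this is not how the JVM's intern table is built; it only shows that doubling the bucket array whenever the entry count exceeds 0.75 times the bucket count keeps the average chain short, however large the set grows.

```java
import java.util.ArrayList;
import java.util.List;

// Toy chained hash set of Strings (illustrative only). It doubles its bucket
// array whenever size exceeds LOAD_FACTOR * capacity, so the ratio of entries
// to buckets, and therefore the expected chain length, stays bounded.
final class GrowingStringSet {
    private static final double LOAD_FACTOR = 0.75;
    private List<List<String>> buckets = newBuckets(16);
    private int size;

    private static List<List<String>> newBuckets(int n) {
        List<List<String>> b = new ArrayList<>(n);
        for (int i = 0; i < n; i++) b.add(new ArrayList<>());
        return b;
    }

    private static int indexFor(String s, int capacity) {
        return (s.hashCode() & 0x7fffffff) % capacity;
    }

    boolean add(String s) {
        List<String> chain = buckets.get(indexFor(s, buckets.size()));
        if (chain.contains(s)) return false; // already present
        chain.add(s);
        if (++size > LOAD_FACTOR * buckets.size()) resize();
        return true;
    }

    boolean contains(String s) {
        return buckets.get(indexFor(s, buckets.size())).contains(s);
    }

    double loadFactor() {
        return (double) size / buckets.size();
    }

    // Rehash every entry into a bucket array twice as large.
    private void resize() {
        List<List<String>> old = buckets;
        buckets = newBuckets(old.size() * 2);
        for (List<String> chain : old) {
            for (String s : chain) {
                buckets.get(indexFor(s, buckets.size())).add(s);
            }
        }
    }
}
```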

> To conclude, the performance gain is true for some applications, it is
> imaginative at best for others. Those applications that can benefit from
> interning should find a way to use it. Those applications that will be
> hurt by interning, should not start breaking tomorrow.

If interning is not done in the right place at the right time, then there is no point in using it: it
becomes an obvious performance problem that does exactly the opposite of what it was meant to do.
If you do it properly, you get good performance. I don't think any application could possibly be
hurt by interning, and I do see many applications benefiting from it.
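The payoff shows up on the application side: if every name a handler receives is guaranteed to be interned, it can compare names with == (one reference check) instead of equals(). A minimal sketch, assuming that guarantee; TitleCounter and its startElement(String) method are hypothetical and are not the SAX ContentHandler signature.

```java
// Illustrative handler that dispatches on element names with ==.
// This is correct ONLY if every name it receives has been interned;
// without that guarantee, equals() must be used instead.
final class TitleCounter {
    // String literals are always interned, so this constant is canonical.
    private static final String TITLE = "title";
    private int count;

    void startElement(String name) {
        // Reference comparison instead of a char-by-char equals().
        if (name == TITLE) {
            count++;
        }
    }

    int count() {
        return count;
    }
}
```

A name passed as new String("title") without intern() would not be counted, which is exactly why the guarantee has to come from the parser.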

> Performance is paramount to what I do, I develop specifically server-side
> software that performs the same repeated operations over and over on
> behalf of multiple clients. And yet I advise against it, and will
> certainly not use any code that poses the risk of overbloating with
> interning table.

I am in the same boat with regard to performance, but I think you might be a little confused about
how String.intern() is actually implemented and how it actually works. Also, remember that the String
intern table is just a table of 4-byte object references, not a table of actual String objects. If
weak references are used for the String intern table (which they are on most JVMs now), then you will
never have any memory problems.
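To show how weak references keep an intern table from pinning memory, here is a user-level analogue built on java.util.WeakHashMap. Each value is wrapped in a WeakReference, as the WeakHashMap documentation recommends, so the value does not keep its own key strongly reachable. WeakInterner is a hypothetical name and this is only a sketch; the JVM's own String intern table lives in native code and is not implemented this way.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Sketch of a canonicalizing table that does not keep its entries alive.
// WeakHashMap holds keys weakly, and the value is a WeakReference to the same
// String, so once no caller holds the canonical copy the GC may clear the entry.
final class WeakInterner {
    private final Map<String, WeakReference<String>> table = new WeakHashMap<>();

    synchronized String intern(String s) {
        WeakReference<String> ref = table.get(s);
        String canonical = (ref == null) ? null : ref.get();
        if (canonical == null) {
            // First sighting, or the previous canonical copy was collected.
            table.put(s, new WeakReference<>(s));
            canonical = s;
        }
        return canonical;
    }
}
```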

Tyler


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1
Please note: New list subscriptions now closed in preparation for transfer to OASIS.