Slowness of JDK 1.1.x String.intern() [was Re: SAX, Java, and Namespaces]

Tyler Baker tyler at infinet.com
Fri Feb 5 20:55:09 GMT 1999


Tim Bray wrote:

> At 10:12 AM 2/5/99 -0800, Jeff Greif wrote:
> >JDK 1.1.7 intern is native, but is slow because it first converts the
> >characters in the string
>
> Actually, the real reason that most XML parsers will *never* use
> built-in intern is because they probably have the name available in a
> character array, and can go look things up in the handcrafted
> table without String-i-fying it - thus skipping several steps
> of work that a built-in intern is going to have to do.  E.g. Lark's
> symbol table is a double array, storing both the character-array
> and String version of each name - you lookup based on the
> character array and return the string if it's already there.  The
> point is that you call new String() only once per unique name.

I do pretty much exactly the same thing, except that on each call to
new String() I do something of the form:

new String().intern().
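
A minimal sketch of that kind of table, assuming a simple chained hash keyed
on the raw characters. NameTable, lookup() and stringsCreated() are
hypothetical names made up for illustration, not code from Lark or from any
actual parser:

```java
// Hypothetical name table: looks names up by character array and only
// builds (and interns) a String the first time a name is seen.
final class NameTable {
    private static final class Entry {
        final char[] chars;   // private copy of the name's characters
        final String name;    // interned String for the same name
        final Entry next;     // next entry in the same bucket

        Entry(char[] chars, String name, Entry next) {
            this.chars = chars;
            this.name = name;
            this.next = next;
        }
    }

    private final Entry[] buckets = new Entry[256];
    private int created = 0;

    // Returns the canonical String for buf[off .. off+len-1].
    String lookup(char[] buf, int off, int len) {
        int h = 0;
        for (int i = 0; i < len; i++) {
            h = 31 * h + buf[off + i];
        }
        int idx = (h & 0x7fffffff) % buckets.length;

        // Compare characters directly; no String is created on a hit.
        for (Entry e = buckets[idx]; e != null; e = e.next) {
            if (matches(e.chars, buf, off, len)) {
                return e.name;
            }
        }

        // First occurrence: pay for new String() and intern() once.
        char[] copy = new char[len];
        System.arraycopy(buf, off, copy, 0, len);
        String name = new String(copy).intern();
        buckets[idx] = new Entry(copy, name, buckets[idx]);
        created++;
        return name;
    }

    private static boolean matches(char[] stored, char[] buf, int off, int len) {
        if (stored.length != len) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            if (stored[i] != buf[off + i]) {
                return false;
            }
        }
        return true;
    }

    // Number of Strings this table has had to create so far.
    int stringsCreated() {
        return created;
    }
}
```

Looking up the same name twice from the parse buffer should hand back the
same String object both times, with stringsCreated() still at 1.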

Interning every name this way means that at the application level you can
test element names and attribute names for identity instead of equality.
Since you can't do something like this in any programming language I know of:

String s = new String("foo");
switch (s) {
  case "foo":
  case "bar":
}

You need to write code like this:

if (s.equals("foo")) {

}
else if (s.equals("bar")) {

}
etc.

When the most likely scenario is comparing against a lot of strings and
then falling through to a default action in the final else, this can get
expensive.  Calling String.intern() has a one-time cost for the first
occurrence of an element or attribute name, but repeatedly calling
String.equals() can be quite expensive too.

Code of the form:

if (s == "foo") { /* ... */ }
else if (s == "bar") { /* ... */ }

is about as fast as an integer compare.  Even though you may take a small
performance hit at the parser level (or DOM level), in the general case
you will be improving things at the application level, even if you use
String.equals(), since the String.equals() method is of the form:

public boolean equals(Object o) {
  if (this == o) {
    return true;
  }

  // Do other string comparing code
}
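
To make the identity test concrete, here is a small sketch, assuming the
parser hands the application interned names. InternDemo and dispatch() are
hypothetical names made up for illustration:

```java
// Hypothetical application-level dispatch on element names.
final class InternDemo {
    // Assumes 'name' was interned by the parser. String literals such as
    // "foo" are interned by the JVM, so == compares against the same object.
    static String dispatch(String name) {
        if (name == "foo") {
            return "handle foo";
        }
        else if (name == "bar") {
            return "handle bar";
        }
        return "default";
    }
}
```

Passing new String("foo") falls through to the default branch, while
new String("foo").intern() reaches the foo branch, so the == chain is only
safe when every name coming out of the parser is interned.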

Nevertheless, the String.intern() method has a poor implementation under
the hood.  I don't know what kind of table the JDK uses in each JVM, but
whatever implementation Sun is using is pretty lame.  Even so, it is still
a win at the application level to be dealing with names that are
represented as interned strings.

Tyler


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



