String interning (WAS: SAX2/Java: Towards a final form)

Tyler Baker tyler at infinet.com
Tue Jan 18 06:32:34 GMT 2000


Assaf Arkin wrote:

> > This call to intern occurs once per name per document. A small price to pay I think.
>
> Try it in other environments than what you're using right now. Tell me
> if it works as advertised.

I really don't know what else to say, other than that you can look at the implementation of String.intern()
and at the implementations that others and I have offered, and see that the cost of using String.intern() is
a function of how it is used, not of the particular environment it runs in. I don't see the point in trying
to prove common engineering sense here.

> > I disagree. The Java String intern table as of JDK 1.2 uses weak references to the actual references
> > of the interned String objects. As long as the load factor remains constant, a hash table does not
> > lose performance as it grows.
>
> You'll be surprised how many people are using JDK 1.1 and non-Sun JVMs.
> You'll be surprised at how many simple Java features break under
> pressure.

Those people who use JDK 1.1 right now are generally not using it for the new software that SAX2 will be
supporting. Even so, JDK 1.1.7 in the Sun VM (and possibly other VMs) uses its own internal version of
weak references as well. Under JDK 1.1.7, the software I have written that uses String.intern() has never
had any memory problems.

> > If interning is not done at the right place and the right time, then there is no point in using it as
> > it would be an obvious performance problem that does exactly the opposite of its original intention.
> > If you do things properly, then you get good performance results. I don't think any application could
> > possibly be hurt by interning. I do see many applications being benefited from interning.
>
> The question is, do you do them (as a parser would) or do you force
> other people to do them? Does it work for them as well as it works for
> you?

When you support a spec or interface, you are "forced" to support that interface. The question is whether
"forcing" developers who provide SAX emitter support to use interned strings for element and attribute
names will cost considerably more effort than the benefits it brings to developers who use the SAX
interface to interpret XML documents. IMHO, the benefits of requiring XML names to be presented to the
application as interned strings are far greater than the cost of the trivial changes emitter applications
will need to make to support interned strings.
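To make the application-side benefit concrete, here is a minimal sketch (not part of any SAX draft) of a handler that assumes every element name it receives has already been interned, so it can dispatch with == instead of equals(). The class name, the startElement(String) signature and the element names are hypothetical; the only real API involved is String.intern(), and the emitter's share of the work is a single intern() call per name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical handler: assumes the emitter hands it interned names,
// so identity comparison (==) is enough to recognize an element.
public class InternedNameDispatch {
    // String literals are interned by the JVM, so these are the canonical instances.
    static final String TITLE = "title";
    static final String PARA = "para";

    private final List<String> events = new ArrayList<>();

    // Hypothetical callback; 'name' is expected to be interned already.
    void startElement(String name) {
        if (name == TITLE) {
            events.add("start-title");
        } else if (name == PARA) {
            events.add("start-para");
        } else {
            events.add("other:" + name);
        }
    }

    List<String> events() {
        return events;
    }

    public static void main(String[] args) {
        InternedNameDispatch handler = new InternedNameDispatch();
        // A name built at runtime from a char buffer is a distinct object...
        String fromBuffer = new String(new char[] {'t', 'i', 't', 'l', 'e'});
        System.out.println(fromBuffer == TITLE);          // false
        // ...until the emitter interns it once before reporting it.
        handler.startElement(fromBuffer.intern());
        handler.startElement(new String("para").intern());
        handler.startElement("note");
        System.out.println(handler.events());             // [start-title, start-para, other:note]
    }
}
```

If an emitter skipped the intern() call, the handler would silently fall through to the "other" branch, which is why the contract has to be stated in the interface rather than left to each emitter.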

> How often have you been upset by a Windows feature that works better for
> Microsoft products, or a Java feature that efficiently solves problems
> for Sun products, but not good enough for your code?

VM differences may be a major issue when it comes to graphics performance or floating-point calculations,
but in this particular case it is like arguing over whether hashtables work better on one platform or
another.

> > I am in the same boat with regards to performance, but I think you might be a little confused with
> > how things are actually implemented and how they actually work with regards to String.intern(). Also,
> > remember that the String intern table is just a table of 4 byte object references and not a table of
> > actual String objects. If weak references are used for the String intern table (which they are on
> > most JVM's now), then you will never have any memory problems.
>
> I'm afraid to tell you but weak references suck performance wise. I've
> been successful at avoiding them. I've reimplemented Hashtable and
> ThreadLocal to get around the Sun implementation issues. I'm very
> performance savvy, and still String.intern does not cut it.

It depends on how you use WeakReferences. In my experience they have been a major boon to my development. I
think you may be confusing the WeakHashMap implementation with weak references in general. I don't know of
many people who use java.util.Hashtable much in performance-sensitive code anyway, so again this argument is
irrelevant.
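As an illustration of weak references used for canonicalization, the following sketch builds a user-level interning table on java.lang.ref.WeakReference and java.util.WeakHashMap. It assumes single-threaded use, and the class name WeakInterner is hypothetical; it is not a description of how any particular JVM implements its own String intern table.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical weakly referenced canonicalizing table, a user-level
// analogue of String.intern(). Not thread-safe: callers must synchronize.
public class WeakInterner {
    // WeakHashMap holds its keys weakly. The value must be weak too: a strong
    // value pointing at its own key would keep every entry alive forever.
    private final Map<String, WeakReference<String>> table = new WeakHashMap<>();

    public String intern(String s) {
        WeakReference<String> ref = table.get(s);
        String canonical = (ref == null) ? null : ref.get();
        if (canonical == null) {
            // First time this value is seen (or the old one was collected):
            // the argument itself becomes the canonical instance.
            canonical = s;
            table.put(s, new WeakReference<>(s));
        }
        return canonical;
    }

    public static void main(String[] args) {
        WeakInterner interner = new WeakInterner();
        String a = interner.intern(new String("element"));
        String b = interner.intern(new String("element"));
        System.out.println(a == b);  // true: both calls return the same instance
    }
}
```

Once no strong reference to a canonical string remains, the garbage collector may clear it and WeakHashMap drops the entry, so the table does not pin names from documents that are no longer in use.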

> Once again, I will repeat what I said before. All parsers use their own,
> way faster, interning mechanism. Adding another call to String.intern
> will slow them down.

It is a one-time cost for each unique name in a document. It may slow parsing down by a fraction of a
millisecond, but it provides potentially great performance benefits at the application level. I am not sure
why this even needs to be argued anymore: I have used interned Strings in my XML applications for over two
years and have never once seen String.intern() show up in my profiling.
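The "one-time cost per unique name" point can be sketched with a parser-style symbol table. This is a hypothetical example, not code from any shipping parser: names are matched against the character buffer in place, so a repeated name costs neither an allocation nor an intern() call, and String.intern() runs only the first time a name is added. The bucket count is fixed for brevity; a real table would resize to keep its load factor constant.

```java
// Hypothetical parser symbol table: String.intern() is called once per
// unique name; repeated names are resolved from the table without allocating.
public class SymbolTable {
    private static final class Entry {
        final char[] chars;
        final String symbol;
        final Entry next;

        Entry(char[] chars, String symbol, Entry next) {
            this.chars = chars;
            this.symbol = symbol;
            this.next = next;
        }
    }

    private final Entry[] buckets;  // fixed size for brevity; no resizing
    private int internCalls = 0;

    public SymbolTable(int capacity) {
        buckets = new Entry[capacity];
    }

    public String addSymbol(char[] buf, int offset, int length) {
        // Hash the characters in place, the same way String.hashCode() does.
        int hash = 0;
        for (int i = 0; i < length; i++) {
            hash = 31 * hash + buf[offset + i];
        }
        int index = (hash & 0x7fffffff) % buckets.length;
        for (Entry e = buckets[index]; e != null; e = e.next) {
            if (matches(e.chars, buf, offset, length)) {
                return e.symbol;  // already seen: no allocation, no intern()
            }
        }
        char[] copy = new char[length];
        System.arraycopy(buf, offset, copy, 0, length);
        String symbol = new String(copy).intern();  // one-time cost for this name
        internCalls++;
        buckets[index] = new Entry(copy, symbol, buckets[index]);
        return symbol;
    }

    private static boolean matches(char[] chars, char[] buf, int offset, int length) {
        if (chars.length != length) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (chars[i] != buf[offset + i]) {
                return false;
            }
        }
        return true;
    }

    public int internCalls() {
        return internCalls;
    }

    public static void main(String[] args) {
        SymbolTable table = new SymbolTable(64);
        char[] doc = "item item item name".toCharArray();
        String first = table.addSymbol(doc, 0, 4);    // "item"
        String second = table.addSymbol(doc, 5, 4);   // "item" again
        String third = table.addSymbol(doc, 10, 4);   // "item" again
        table.addSymbol(doc, 15, 4);                  // "name"
        System.out.println(first == second && second == third);  // true
        System.out.println(first == "item");                     // true
        System.out.println(table.internCalls());                 // 2
    }
}
```

Four name occurrences cost two intern() calls, and the symbol returned for "item" is the same object as the literal "item", which is what lets application code compare names with ==.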

I think this discussion has reached the point of arguing for the sake of winning an argument rather than
about the common-sense reasons for SAX presenting names as Java interned strings. I really don't see the
rationale behind the arguments against using interned strings, as most of the claims supporting them are
either untrue or half-baked conjecture.

I guess David will need to either flip a coin or decide on his own what SAX2 will end up doing. Either way,
I don't think anyone will be in an uproar regardless of what is actually chosen. I know I won't.

Regards,

Tyler


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1
Please note: New list subscriptions now closed in preparation for transfer to OASIS.