SAX2: Namespace proposal

Miles Sabin msabin at cromwellmedia.co.uk
Mon Dec 20 17:38:34 GMT 1999


David Megginson wrote,
> if I call java.lang.String.intern the first time I add a 
> string to the intern table, then the cost is proportional to 
> the number of entries in the table, not the number of 
> accesses.
>
> Or, in plainer English, if the XML name "div" appears 20,000 
> times in my document, and I have my own custom intern() 
> routine that calls java.lang.String.intern only when a new 
> name is added to the table, I will still incur the cost of 
> java.lang.String.intern only once, not 20,000 times.  The 
> benefit is that my interned strings will still be == to those 
> from java.lang.String.intern.

That's right.

But my question is: what's the benefit? Sure you have all the
Strings in your table intern'ed. But how do you exploit that?

>From what you've said before, I presume that you'll do a
lookup with a part of a char array as a key, and get a
corresponding intern'ed String back. Now you can compare that 
String using == with another String (so long as that String is 
known to be intern'ed too). Unless you're doing lots of == 
comparisons with the same String (I'm not quite sure why you 
would be), then you won't gain unless the cost of a lookup +
the cost of == is less than the cost of String.equals().

Since you're using a hash table you'll have to examine each
character in the array portion once to compute the hash code,
then you'll have to examine them all over again (at least once,
possibly more than once, depending on your hash table 
implementation) to do the final key comparison which resolves
hash collisions. Add to that all the other overhead involved
in a hash lookup, and I very much doubt that you're faster
than String.equals() ... which just has to examine each
character in the two Strings once (don't forget that the String 
class has direct access to the internal representation char 
arrays, and doesn't have to use charAt()). In the case of a 
mismatch String.equals() might not have to do much work at all 
... "foo".equals("bar") can return false after just comparing 
the 'f' and the 'b'.

The only likely reason I can think of for wanting to do more
than one == comparison is a bad one. If you're doing things 
like,

  // elemChars is the array 'd', 'i', 'v'; len == 3
  String interned = table.lookup(elemChars, 0, len);
  if(interned == DIV)
    // do div processing
  else if(interned == SPAN)
    // do span processing
  else if
    // ... etc ...

then you'd be better off taking a different approach altogether.
Instead of mapping to an intern'ed String you could map to a
handler (a Design Patterns Strategy),

  ElementHandler handler = table.lookup(elemChars, 0, 3);
  if(handler != null)
    handler.doProcessing();

You can even eliminate the test against null if you can arrange 
for your table implementation to return a do-nothing or error-
generating handler on a mismatch.

>From a design point of view this is a win, because you no
longer have to hard wire the list of elements to test for into a 
long sequence of conditionals (which is, in effect, a great big
switch on type ... ugh). And with even a not so good JIT, this 
is likely to be the best bet performance-wise.

Cheers,


Miles

-- 
Miles Sabin                       Cromwell Media
Internet Systems Architect        5/6 Glenthorne Mews
+44 (0)20 8817 4030               London, W6 0LJ, England
msabin at cromwellmedia.com          http://www.cromwellmedia.com/


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)





More information about the Xml-dev mailing list