SAX2: Namespace proposal
Miles Sabin
msabin at cromwellmedia.co.uk
Mon Dec 20 17:38:34 GMT 1999
David Megginson wrote,
> if I call java.lang.String.intern the first time I add a
> string to the intern table, then the cost is proportional to
> the number of entries in the table, not the number of
> accesses.
>
> Or, in plainer English, if the XML name "div" appears 20,000
> times in my document, and I have my own custom intern()
> routine that calls java.lang.String.intern only when a new
> name is added to the table, I will still incur the cost of
> java.lang.String.intern only once, not 20,000 times. The
> benefit is that my interned strings will still be == to those
> from java.lang.String.intern.
That's right.
But my question is: what's the benefit? Sure you have all the
Strings in your table intern'ed. But how do you exploit that?
>From what you've said before, I presume that you'll do a
lookup with a part of a char array as a key, and get a
corresponding intern'ed String back. Now you can compare that
String using == with another String (so long as that String is
known to be intern'ed too). Unless you're doing lots of ==
comparisons with the same String (I'm not quite sure why you
would be), then you won't gain unless the cost of a lookup +
the cost of == is less than the cost of String.equals().
Since you're using a hash table you'll have to examine each
character in the array portion once to compute the hash code,
then you'll have to examine them all over again (at least once,
possibly more than once, depending on your hash table
implementation) to do the final key comparison which resolves
hash collisions. Add to that all the other overhead involved
in a hash lookup, and I very much doubt that you're faster
than String.equals() ... which just has to examine each
character in the two Strings once (don't forget that the String
class has direct access to the internal representation char
arrays, and doesn't have to use charAt()). In the case of a
mismatch String.equals() might not have to do much work at all
... "foo".equals("bar") can return false after just comparing
the 'f' and the 'b'.
The only likely reason I can think of for wanting to do more
than one == comparison is a bad one. If you're doing things
like,
// elemChars is the array 'd', 'i', 'v'; len == 3
String interned = table.lookup(elemChars, 0, len);
if(interned == DIV)
// do div processing
else if(interned == SPAN)
// do span processing
else if
// ... etc ...
then you'd be better off taking a different approach altogether.
Instead of mapping to an intern'ed String you could map to a
handler (a Design Patterns Strategy),
ElementHandler handler = table.lookup(elemChars, 0, 3);
if(handler != null)
handler.doProcessing();
You can even eliminate the test against null if you can arrange
for your table implementation to return a do-nothing or error-
generating handler on a mismatch.
>From a design point of view this is a win, because you no
longer have to hard wire the list of elements to test for into a
long sequence of conditionals (which is, in effect, a great big
switch on type ... ugh). And with even a not so good JIT, this
is likely to be the best bet performance-wise.
Cheers,
Miles
--
Miles Sabin Cromwell Media
Internet Systems Architect 5/6 Glenthorne Mews
+44 (0)20 8817 4030 London, W6 0LJ, England
msabin at cromwellmedia.com http://www.cromwellmedia.com/
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)
More information about the Xml-dev
mailing list