SAX: String Internalisation and a CORBA/DCOM Question
James Clark
jjc at jclark.com
Sun Apr 19 07:45:52 BST 1998
David Megginson wrote:
>
> Here's another last-minute SAX question: should org.xml.sax.Parser
> expose a method for internalising strings?
>
> public abstract String intern (String s);
Absolutely not.
> Most Java-based parsers, at least, already use some type of
> internalisation (but not, usually, the inefficient
> java.lang.String.intern() method) for names -- the SAX driver could
> expose this functionality if support is already there, or do its own
> internalising if support is absent.
That would impose a significant performance hit when SAX is used with
parsers that don't do internalisation. XP does not do this sort of
internalisation because doing so would make the parser slower.
> As someone has already pointed out, internalised strings will make a
> dramatic difference for the speed of applications, since applications
> can use a simple '==' operator (or the local equivalent) to test for
> equality rather than a slow subroutine like java.lang.String.equals().
Doing lots of comparisons on the type of each element, whether using
equals or ==, is not a good way to write an efficient application. It's
typically better to have a hash table that associates each element type
with either an integer (which you can then use in a switch statement) or
an object (which you then make a method call on).
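As a rough illustration of the hash-table approach, here is a minimal sketch of the integer variant. The class name, element names, and integer codes are all invented for the example and are not part of SAX or of any parser; handleStart() merely stands in for the body of a startElement callback.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example: none of these names come from SAX itself.
class ElementDispatch {
    static final int UNKNOWN = 0;
    static final int CHAPTER = 1;
    static final int TITLE = 2;
    static final int PARA = 3;

    // Built once, so each element start costs a single hash lookup
    // instead of a chain of equals() or == comparisons.
    private static final Map<String, Integer> CODES = new HashMap<>();
    static {
        CODES.put("chapter", CHAPTER);
        CODES.put("title", TITLE);
        CODES.put("para", PARA);
    }

    // Map an element type name to its integer code, or UNKNOWN.
    static int codeFor(String name) {
        Integer code = CODES.get(name);
        return code == null ? UNKNOWN : code;
    }

    // Stand-in for the body of a startElement callback.
    static String handleStart(String name) {
        switch (codeFor(name)) {
            case CHAPTER: return "begin chapter";
            case TITLE:   return "begin title";
            case PARA:    return "begin paragraph";
            default:      return "ignored";
        }
    }

    public static void main(String[] args) {
        System.out.println(handleStart("title"));
        System.out.println(handleStart("footnote"));
    }
}
```

The object variant would presumably look much the same, with the map holding handler objects and the switch replaced by a method call on whatever the lookup returns.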
This could be done a little more efficiently with help from the parser.
For example, you could have a method on SAXParser:
    setElementTypeUserData(String elementType, Object userData);
Then startElement() and endElement() in SAXDocumentHandler could have an
additional Object userData argument.
This would allow apps to do something like:
    void startElement(String name, Object userData, SAXAttributeList atts) {
        switch (((Integer)userData).intValue()) {
        ...
        }
    }
or
    void startElement(String name, Object userData, SAXAttributeList atts) {
        ((ElementHandler)userData).start();
    }
I don't think it's worth the complexity.
> By the way, here's the minimum list of what should be internalised in
> the callbacks from the SAX parser:
SAX should not require the internalization of anything.
James
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)