SAX: String Internalisation and a CORBA/DCOM Question

James Clark jjc at jclark.com
Sun Apr 19 07:45:52 BST 1998


David Megginson wrote:
> 
> Here's another last-minute SAX question: should org.xml.sax.Parser
> expose a method for internalising strings?
> 
>   public abstract String intern (String s);

Absolutely not.

> Most Java-based parsers, at least, already use some type of
> internalisation (but not, usually, the inefficient
> java.lang.String.intern() method) for names -- the SAX driver could
> expose this functionality if support is already there, or do its own
> internalising if support is absent.

That would be a significant performance hit on SAX use with parsers that
don't do internalisation.  XP does not do this sort of internalisation
because it would make it slower.

> As someone has already pointed out, internalised strings will make a
> dramatic difference for the speed of applications, since applications
> can use a simple '==' operator (or the local equivalent) to test for
> equality rather than a slow subroutine like java.lang.String.equals().

Doing lots of comparisons on the type of each element, whether with
equals or ==, is not a good way to write an efficient application.  It's
typically better to have a hash-table that associates each element type
with either an integer (which you can then use in a switch statement) or
an object (which you then make a method call on).
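
A minimal sketch of the integer variant of that hash-table approach. The
class name, the element type names ("para", "title") and the integer codes
are made up for illustration and are not part of SAX; handleStart() merely
stands in for the body of a startElement() callback.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example: the element types and codes are illustrative only.
class ElementDispatch {
    static final int UNKNOWN = 0;
    static final int PARA = 1;
    static final int TITLE = 2;

    // Built once, before parsing starts.
    private static final Map<String, Integer> CODES = new HashMap<>();
    static {
        CODES.put("para", PARA);
        CODES.put("title", TITLE);
    }

    // One hash lookup per element, however many element types are known.
    static int codeFor(String elementType) {
        Integer code = CODES.get(elementType);
        return code == null ? UNKNOWN : code;
    }

    // Stand-in for the body of a startElement() callback.
    static String handleStart(String elementType) {
        switch (codeFor(elementType)) {
            case PARA:  return "start paragraph";
            case TITLE: return "start title";
            default:    return "ignored";
        }
    }
}
```

The cost per element is a single lookup followed by a switch, rather than a
chain of equals or == tests that grows with the number of element types.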

This could be done a little more efficiently with help from the parser. 
For example, you could have a method on SAXParser

  setElementTypeUserData(String elementType, Object userData);

Then startElement() and endElement() in SAXDocumentHandler could have an
additional Object userData argument.

This would allow apps to do something like:

  void startElement(String name, Object userData, SAXAttributeList atts) {
    switch (((Integer)userData).intValue()) {
    ...
    }
  }

or

  void startElement(String name, Object userData, SAXAttributeList atts) {
    ((ElementHandler)userData).start();
  }

I don't think it's worth the complexity.
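
For concreteness, here is a rough sketch of what the parser side of that
userData idea might look like. Everything in it is hypothetical: the
UserDataParser and UserDataDocumentHandler types, the fireStartElement()
method and the demo class are invented for illustration, the attribute list
is left out, and a real parser would presumably keep the user data in its
own per-element-type structure rather than in a separate map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interfaces; none of this is part of SAX.
interface ElementHandler {
    void start();
}

interface UserDataDocumentHandler {
    void startElement(String name, Object userData);
}

// Toy stand-in for a parser that supports per-element-type user data.
class UserDataParser {
    private final Map<String, Object> userDataByType = new HashMap<>();
    private UserDataDocumentHandler handler;

    void setElementTypeUserData(String elementType, Object userData) {
        userDataByType.put(elementType, userData);
    }

    void setDocumentHandler(UserDataDocumentHandler handler) {
        this.handler = handler;
    }

    // Would be called by the tokenizer for each start tag; the user data
    // is null for element types that were never registered.
    void fireStartElement(String name) {
        handler.startElement(name, userDataByType.get(name));
    }
}

class UserDataDemo {
    // Counts how often the handler registered for "para" is invoked.
    static int run() {
        final int[] count = {0};
        UserDataParser parser = new UserDataParser();
        parser.setElementTypeUserData("para", (ElementHandler) () -> count[0]++);
        parser.setDocumentHandler((name, userData) -> {
            if (userData != null) {
                ((ElementHandler) userData).start();
            }
        });
        parser.fireStartElement("para");
        parser.fireStartElement("note");
        parser.fireStartElement("para");
        return count[0];
    }
}
```

Even in this toy form the extra method, the extra callback argument and the
null case for unregistered element types all have to be specified.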

> By the way, here's the minimum list of what should be internalised in
> the callbacks from the SAX parser:

SAX should not require the internalization of anything.

James



xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



