String interning (WAS: SAX2/Java: Towards a final form)

David Brownell david-b at pacbell.net
Sun Jan 16 03:06:25 GMT 2000


Tim Bray wrote:
> 
> At 12:13 PM 1/14/00 -0800, David Brownell wrote:
> >The reason not to mandate it is that there are non-parser
> >applications of SAX, where it's unreasonable to demand that
> >the event source guarantee such interning.
> 
> Really?  Could you expand on that?

I was thinking of examples ... keep in mind that the consumer
of these events could be lots of things:  something that writes
out XML text over a socket, something building a DOM, something
that does XSLT transforms, etc.


FIRST example, a type that I think will be common, is one where
it's actually easy to demand that the interning happen.  Namely,
objects that know how to print themselves as XML, most likely as an
element.

That element structure won't be dynamically determined, at least
in my playbook (Keep It Simple, Stupe!), so the names of elements
and attributes, and namespace URIs, will most naturally be string
constants and hence automagically interned.

Some people may use different playbooks, focussed on their favorite
generic framework for object<-->XML conversions, that may work the
other way around and find interning to be extra work.
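To make the FIRST case concrete, here is a minimal sketch, assuming Java and the SAX2 ContentHandler interface that ships with the JDK.  The Greeting class and its element and attribute names are hypothetical.  It relies only on the Java rule that string literals and compile-time constants are interned, so the names reach the handler already interned with no extra work.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical object that knows how to print itself as XML via SAX events.
class Greeting {
    // String literals and compile-time constants are interned by the JVM,
    // so these names reach the ContentHandler already interned.
    static final String NO_NAMESPACE = "";
    static final String GREETING = "greeting";
    static final String LANG = "lang";

    private final String text;
    private final String lang;

    Greeting(String text, String lang) {
        this.text = text;
        this.lang = lang;
    }

    // Emits <greeting lang="...">text</greeting> as SAX2 events.
    void emit(ContentHandler out) throws SAXException {
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute(NO_NAMESPACE, LANG, LANG, "CDATA", lang);
        out.startElement(NO_NAMESPACE, GREETING, GREETING, atts);
        char[] chars = text.toCharArray();
        out.characters(chars, 0, chars.length);
        out.endElement(NO_NAMESPACE, GREETING, GREETING);
    }
}
```

A framework that builds names at runtime (the "other playbooks") would not get this for free.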


SECOND example could reasonably be viewed as a kind of parser:
it's something that walks all or part of a DOM tree.  With such
trees there's no guarantee that interning is done on names or URIs.

Trees built by hand will _sometimes_ use literals (interned), trees
built using SAX parsers will often have interned strings (see the many
previous discussions), but there are also ones built using other tools,
say databases with element names, for which the names/URIs wouldn't
normally be interned.  I've seen all three ways to build DOMs.
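The SECOND case comes down to a plain Java fact: a name assembled at runtime, for example read out of a database, is equal to the literal but is not the same object, so identity comparison fails until someone calls intern().  A small sketch; nameFromDatabase() is a hypothetical stand-in for the real lookup.

```java
// Sketch: names produced at runtime (hypothetically from a database) are not interned.
class RuntimeNames {
    // Hypothetical stand-in for a database read; StringBuilder.toString()
    // always allocates a new String object.
    static String nameFromDatabase() {
        return new StringBuilder("it").append("em").toString();
    }

    // Identity comparison: only safe when both sides are known to be interned.
    static boolean sameByIdentity(String a, String b) {
        return a == b;
    }

    // Value comparison: always safe, but compares characters.
    static boolean sameByValue(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String name = nameFromDatabase();
        System.out.println(sameByIdentity(name, "item"));          // false
        System.out.println(sameByValue(name, "item"));             // true
        System.out.println(sameByIdentity(name.intern(), "item")); // true
    }
}
```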


THIRD example is similar to a combination of the previous two:
someone uses a custom data structure, lighter weight than DOM but
general enough to handle all their data, and then uses that data
to regenerate a stream of SAX events (sent to socket, etc).

This one's intentionally a bit hand-wavey, since the goal is to
optimize for some problem to which XML (and SAX, and DOM) are very
much incidental.  Since the structures are task-optimized, it's
not certain that interning will be desirable.
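A rough sketch of the THIRD case, assuming a hypothetical LiteNode structure (no namespaces, no attributes) that is replayed as SAX2 events.  The point is only that the event source hands along whatever String objects the structure happens to hold; whether those are interned depends entirely on how the structure was filled in.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical task-optimized tree, lighter weight than DOM.
class LiteNode {
    final String name;   // interned or not, depending on who built the tree
    final String text;   // character content, may be null
    final List<LiteNode> children = new ArrayList<>();

    LiteNode(String name, String text) {
        this.name = name;
        this.text = text;
    }

    LiteNode add(LiteNode child) {
        children.add(child);
        return this;
    }

    // Regenerates a SAX event stream; names are passed through unchanged.
    void replay(ContentHandler out) throws SAXException {
        out.startElement("", name, name, new AttributesImpl());
        if (text != null) {
            char[] chars = text.toCharArray();
            out.characters(chars, 0, chars.length);
        }
        for (LiteNode child : children) {
            child.replay(out);
        }
        out.endElement("", name, name);
    }
}
```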


Now in all of those cases one could define a postprocessor that
interns all the strings going through startElement()/PI()/... and
so on, but that can be a lot of extra work that may not be needed.
And extra work is always undesired.
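For illustration, one way such a postprocessor could look, as a sketch built on the SAX2 helper class org.xml.sax.helpers.XMLFilterImpl.  The InterningFilter name is hypothetical, and only a few callbacks are covered.  Note the per-element cost: every name goes through intern(), and the attribute list has to be copied just so its names can be replaced.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical SAX2 filter that interns names and URIs before passing events on.
// Attribute values and character data are left alone.
class InterningFilter extends XMLFilterImpl {
    private static String in(String s) {
        return s == null ? null : s.intern();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // Extra allocation per element: copy attributes so their names can be interned.
        AttributesImpl copy = new AttributesImpl();
        for (int i = 0; i < atts.getLength(); i++) {
            copy.addAttribute(in(atts.getURI(i)), in(atts.getLocalName(i)),
                    in(atts.getQName(i)), atts.getType(i), atts.getValue(i));
        }
        super.startElement(in(uri), in(localName), in(qName), copy);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(in(uri), in(localName), in(qName));
    }

    @Override
    public void startPrefixMapping(String prefix, String uri) throws SAXException {
        super.startPrefixMapping(in(prefix), in(uri));
    }

    @Override
    public void processingInstruction(String target, String data) throws SAXException {
        super.processingInstruction(in(target), data);
    }
}
```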

Ergo my feeling that it's better to just expose whether the event
producer is doing the interning, than to require that it always be done.
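For what it's worth, the final SAX2 release exposes exactly this as the feature flag "http://xml.org/sax/features/string-interning".  A consumer-side sketch, assuming a JAXP-supplied XMLReader; the InterningCheck class and its helper methods are hypothetical.  A reader that doesn't recognize or support the flag is treated as "not interned", and the consumer falls back to equals().

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

class InterningCheck {
    // Feature URI as defined by the final SAX2 release.
    static final String STRING_INTERNING = "http://xml.org/sax/features/string-interning";

    // True only if the reader reports that it interns names and URIs.
    static boolean reportsInterning(XMLReader reader) {
        try {
            return reader.getFeature(STRING_INTERNING);
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return false; // unknown: assume names are not interned
        }
    }

    // Picks the cheap identity test only when the producer says it is safe.
    static boolean sameName(String a, String b, boolean interned) {
        return interned ? a == b : a.equals(b);
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        System.out.println("string-interning: " + reportsInterning(reader));
    }
}
```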

(phew!)


> Mind you, this debate on what is really a fairly minor piece of SAX
> is probably approaching a negative cost-benefit ratio.

You noticed too?  :-)

- Dave

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1
Please note: New list subscriptions now closed in preparation for transfer to OASIS.
