SAX2: Interning names in namespaces

Fri Feb 4 03:47:12 GMT 2000

I submit the following for consideration in the design of the namespace
support for SAX2 and the implementation of name internalization in parsers:

BACKGROUND

Earlier, this list hosted a discussion about whether a SAX parser should
intern strings such as names. Interning strings in the parser provides
benefits such as lower memory use and more efficient comparisons of names
during processing and filtering. Interning strings seems to be commonplace
among XML parsers.

There has also been a discussion about how to encode names with an
associated namespace, typically by merging the two into a string.
Suggestions ranged from placing brackets around the namespace URI to
separating the URI from the actual name with a space.

The current SAX2 beta uses multiple strings to represent a name in its
namespace. This has caused several of the methods in the recent beta of
SAX2 to have a rather cluttered argument list compared to SAX1.

PROPOSAL

I suggest that parsers intern each name as a separate string in the
namespace it belongs to. This should guarantee that two equal names in the
same namespace are identical, and that two equal names in different
namespaces are not identical.

IMPLEMENTATION EXAMPLE

For each namespace, the parser may use a weak hashtable indexed on
equality. When reading a name, the parser looks up the string in the
appropriate namespace hashtable, adding a *copy* of the name if there is
none in the table. The string found or added is returned as the
representation of the name.
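
The lookup described above might be sketched in Java roughly as follows.
This is only an illustration under my own assumptions: the class name
NamespaceInterner and the method intern are hypothetical and not part of
SAX, and the weak table is approximated with java.util.WeakHashMap holding
a WeakReference back to the key.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical helper, not part of any SAX release.
final class NamespaceInterner {
    // One weak table per namespace URI; each table is keyed on string equality.
    private final Map<String, Map<String, WeakReference<String>>> tables =
        new HashMap<>();

    // Returns the canonical instance of localName within namespaceUri.
    String intern(String namespaceUri, String localName) {
        Map<String, WeakReference<String>> table =
            tables.computeIfAbsent(namespaceUri, uri -> new WeakHashMap<>());
        WeakReference<String> ref = table.get(localName);
        String canonical = (ref == null) ? null : ref.get();
        if (canonical == null) {
            // Add a fresh *copy*, so that no two namespaces share an instance.
            canonical = new String(localName);
            table.put(canonical, new WeakReference<>(canonical));
        }
        return canonical;
    }

    public static void main(String[] args) {
        NamespaceInterner interner = new NamespaceInterner();
        String a = interner.intern("urn:example:a", "title");
        String b = interner.intern("urn:example:a", "title");
        String c = interner.intern("urn:example:b", "title");
        System.out.println(a == b);      // true: same name, same namespace
        System.out.println(a == c);      // false: same name, other namespace
        System.out.println(a.equals(c)); // true: still equal as strings
    }
}
```

The explicit copy matters in Java because string literals are already
shared by the JVM; without it, equal names from different namespaces could
end up as the same instance.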

DISCUSSION

With this implementation, each namespace has its own copy of the
name. As a consequence, one can test whether two equal names are in the
same namespace by checking for identity, allowing very efficient
processing. Names that are the same but in different namespaces will be
equal, but not identical.

ADDENDUM

A parser can optionally associate a namespace with a name by using a hash
table indexed on *identity*. When interning a new name, the parser adds a
new item to the hash table using the internalized name as key and the
namespace as value. Looking up a name in the table returns the namespace of
the name. Names in different namespaces can be equal yet return different
namespaces when looking them up in the table.
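
As a rough Java sketch of this optional table, assuming the hypothetical
names NamespaceRegistry, intern and namespaceOf (none of which exist in
SAX): java.util.IdentityHashMap compares keys with == rather than equals.
Note that it holds its keys strongly, so for brevity this sketch does not
use weak tables.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical helper, not part of any SAX release.
final class NamespaceRegistry {
    // Per-namespace intern tables, keyed on string equality.
    private final Map<String, Map<String, String>> tables = new HashMap<>();
    // Interned name -> namespace URI, keyed on *identity*.
    private final Map<String, String> owner = new IdentityHashMap<>();

    String intern(String namespaceUri, String localName) {
        Map<String, String> table =
            tables.computeIfAbsent(namespaceUri, uri -> new HashMap<>());
        String canonical = table.get(localName);
        if (canonical == null) {
            canonical = new String(localName); // one copy per namespace
            table.put(canonical, canonical);
            owner.put(canonical, namespaceUri);
        }
        return canonical;
    }

    // Returns the namespace of an interned name, or null for a string
    // that was not produced by intern().
    String namespaceOf(String internedName) {
        return owner.get(internedName);
    }
}
```

A handler that receives an interned name could then call
namespaceOf(name) to recover the URI without the name string carrying any
encoded prefix.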

CONSEQUENCES FOR SAX

Parts of the namespace handling in SAX can be simplified if it can assume
that parsers intern each name in its own namespace. This can eliminate
the need for passing namespace information as a separate argument to
methods or encoding it in the name string. As a result, SAX2 might
even keep the SAX1 argument lists for several functions instead of
adding new arguments to accommodate namespaces. This may allow namespace
support while maintaining compatibility with SAX1.

FORWARD COMPATIBILITY

The suggested representation of names is forward compatible in that it
allows new features to be associated with a name without breaking
earlier implementations. It also allows alternative representations of
names. A future version of SAX may want to introduce an interface for
names, so that one can extract the namespace URI and the local name in a
standard way. Future implementations may choose to build on this interface
to use a class for names rather than a string. A name object can be
interned in the hash table in the implementation example in the same way
as a name string. Equality and identity testing will still hold with objects
representing names rather than strings.
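
To make the idea of a name interface concrete, here is one possible shape
in Java. Everything in it is an assumption of mine rather than a proposal
for actual SAX signatures: the interface Name, its two accessors, and the
NameTable class are hypothetical. The sketch leaves equals() at its default
(identity), so comparing names across namespaces is done on the local name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface; SAX defines nothing like this today.
interface Name {
    String getNamespaceURI();
    String getLocalName();
}

// Hypothetical table that interns name objects per namespace.
final class NameTable {
    private static final class InternedName implements Name {
        private final String namespaceUri;
        private final String localName;

        InternedName(String namespaceUri, String localName) {
            this.namespaceUri = namespaceUri;
            this.localName = localName;
        }

        @Override public String getNamespaceURI() { return namespaceUri; }
        @Override public String getLocalName() { return localName; }
        @Override public String toString() { return localName; }
    }

    // Namespace URI -> (local name -> the single Name object for that pair).
    private final Map<String, Map<String, Name>> tables = new HashMap<>();

    Name intern(String namespaceUri, String localName) {
        return tables
            .computeIfAbsent(namespaceUri, uri -> new HashMap<>())
            .computeIfAbsent(localName,
                             local -> new InternedName(namespaceUri, local));
    }
}
```

Two calls to intern with the same URI and local name return the same
object, so the identity test from the string-based scheme carries over
unchanged.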

-- Terje <terje at in-progress.com> | Media Design in*Progress

   Software for Mac Web Professionals at <http://www.in-progress.com>
   Take advantage of XML with Emile, the first XML editor for Mac!

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1