SAX and namespaces: an implementation

John Cowan cowan at locke.ccil.org
Fri Jul 24 22:40:50 BST 1998


I have developed a SAX driver that implements
namespace processing.  The current version does not
validate the namespace constraints (namespace PIs can
appear anywhere, unknown prefixes are left alone, and so on).
It is not as efficient as it could be, since I have been
concerned primarily with proof of concept.  All the code shares
the SAX non-license.  Here are the details:

Three new classes are involved: org.xml.sax.ParserFilter,
org.xml.sax.helpers.PseudoAttributeList, and
org.xml.sax.helpers.NamespaceFilter.  (I want this to be
a standard part of the SAX package, but if David objects
I'll change the package names to org.ccil.cowan.sax.)

ParserFilter is a subinterface of Parser that adds the method
"setParser(Parser parser)" to specify the underlying parser.
Semantically, ParserFilters look like Parsers but rely on some
other Parser to do the dirty work.  They can be chained.
(XAF is a ParserFilter in effect, and perhaps could be
modified to implement this interface.)
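
A minimal sketch of what such an interface could look like, written
against the SAX 1 org.xml.sax.Parser interface.  Only the name
ParserFilter and the setParser method come from the description above;
PassThroughFilter is a hypothetical helper added for illustration, and
both types sit in the default package here rather than in org.xml.sax.

```java
import java.io.IOException;
import java.util.Locale;
import org.xml.sax.DTDHandler;
import org.xml.sax.DocumentHandler;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.SAXException;

// A Parser that relies on some other Parser to do the real work.
interface ParserFilter extends Parser {
    void setParser(Parser parser);
}

// Hypothetical do-nothing filter: every call is forwarded unchanged to
// the underlying parser.  A useful filter would override
// setDocumentHandler to interpose its own DocumentHandler.
class PassThroughFilter implements ParserFilter {
    private Parser parent;

    public void setParser(Parser parser) { this.parent = parser; }

    public void setLocale(Locale locale) throws SAXException { parent.setLocale(locale); }
    public void setEntityResolver(EntityResolver resolver) { parent.setEntityResolver(resolver); }
    public void setDTDHandler(DTDHandler handler) { parent.setDTDHandler(handler); }
    public void setDocumentHandler(DocumentHandler handler) { parent.setDocumentHandler(handler); }
    public void setErrorHandler(ErrorHandler handler) { parent.setErrorHandler(handler); }

    public void parse(InputSource source) throws SAXException, IOException { parent.parse(source); }
    public void parse(String systemId) throws SAXException, IOException { parent.parse(systemId); }
}
```

Because a filter is itself a Parser, one filter can be handed to
another filter's setParser, which is what makes chaining possible.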

PseudoAttributeList is an implementation of AttributeList that
knows how to set itself up from the "data" portion of a PI.
I split it out because there probably will be other PIs which
are made to look like they contain attribute lists.  No
entity references are processed within the pseudo-attribute values;
processing character references is a reasonable enhancement.
(XAF does this too, but not in a distinguishable way.)
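
To illustrate the idea, here is a rough sketch of turning the data
portion of a PI into an AttributeList.  It is not the actual
PseudoAttributeList code: the class name PseudoAttributeSketch, the
parse method, and the regular expression are assumptions made for this
example, and every pseudo-attribute is given the type "CDATA".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.xml.sax.AttributeList;
import org.xml.sax.helpers.AttributeListImpl;

class PseudoAttributeSketch {
    // Matches name="value" or name='value'.  Values are taken literally:
    // no entity or character references are expanded.
    private static final Pattern PAIR =
        Pattern.compile("([^\\s=]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    // Builds an AttributeList from the text that follows the PI target.
    static AttributeList parse(String piData) {
        AttributeListImpl list = new AttributeListImpl();
        Matcher m = PAIR.matcher(piData);
        while (m.find()) {
            String value = (m.group(2) != null) ? m.group(2) : m.group(3);
            list.addAttribute(m.group(1), "CDATA", value);
        }
        return list;
    }
}
```

A DocumentHandler could call such a method from processingInstruction,
passing the data argument it receives.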

NamespaceFilter is a ParserFilter that does namespace processing.
To use it, create an instance of the real parser, create a
NamespaceFilter instance, and use setParser to link the two.
Then any SAX application which registers as a DocumentHandler with
the NamespaceFilter instance will receive element names, attribute
names, and PI targets mapped from the "prefix:local" form to the
form "URI + dagger (\u2020) + local".  Unknown prefixes are currently
left alone rather than reported as errors.
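
The name mapping itself can be sketched as follows.  This is only an
illustration of the "URI + dagger + local" form, not the NamespaceFilter
source: the class and method names are hypothetical, the prefix
declarations are supplied as a plain Map, and passing unprefixed names
through unchanged is an assumption.

```java
import java.util.Map;

class UniversalNameSketch {
    static final char DAGGER = '\u2020';

    // Maps "prefix:local" to URI + dagger + local using the declared
    // prefix-to-URI bindings.
    static String toUniversalName(String name, Map<String, String> declared) {
        int colon = name.indexOf(':');
        if (colon < 0) {
            return name;                  // assumption: no prefix, no change
        }
        String uri = declared.get(name.substring(0, colon));
        if (uri == null) {
            return name;                  // unknown prefix: left alone
        }
        return uri + DAGGER + name.substring(colon + 1);
    }
}
```

With "ex" bound to "http://example.org/ns" (a made-up URI), the name
"ex:item" would come out as "http://example.org/ns" followed by the
dagger and "item".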

If you don't want to process this format, call "registerPrefix"
specifying a namespace URI and a prefix your application prefers,
and document prefixes will be mapped to application prefixes
instead of URIs.  (This works only if the document prefix has
been properly declared with a namespace declaration and an
exactly matching URI, of course.)  The colon delimiter is left in
place in that case, unless the application prefix is the null string.
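
A sketch of how the prefix substitution might behave, under the same
caveats: registerPrefix is the method name given above, but its
argument order, the class PrefixMappingSketch, the map method, and the
use of a Map for the document's declarations are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

class PrefixMappingSketch {
    // Namespace URI -> prefix the application prefers.
    private final Map<String, String> appPrefixByUri = new HashMap<>();

    void registerPrefix(String uri, String appPrefix) {
        appPrefixByUri.put(uri, appPrefix);
    }

    // declared holds the document's own prefix -> URI bindings.
    String map(String name, Map<String, String> declared) {
        int colon = name.indexOf(':');
        if (colon < 0) {
            return name;                       // assumption: unprefixed names pass through
        }
        String local = name.substring(colon + 1);
        String uri = declared.get(name.substring(0, colon));
        if (uri == null) {
            return name;                       // undeclared prefix: left alone
        }
        String appPrefix = appPrefixByUri.get(uri);
        if (appPrefix == null) {
            return uri + '\u2020' + local;     // no registration: universal name
        }
        // The colon is kept unless the application prefix is empty.
        return appPrefix.isEmpty() ? local : appPrefix + ":" + local;
    }
}
```

The lookup is by exact string comparison of URIs, so a document that
declares a URI differing even slightly from the registered one falls
back to the dagger form.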

A public method "universalName" allows you to invoke the mapping
mechanism yourself for attribute values or the like.  There are
also two utility (static) methods to split up a universal name
into its URI and local-part.
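
Splitting a universal name amounts to finding the dagger.  The two
static methods below show one way to do it; the names uriOf and
localPartOf are invented for this sketch, as is the choice to return
an empty URI when the name contains no dagger.

```java
class UniversalNameParts {
    private static final char DAGGER = '\u2020';

    // Everything before the dagger, or "" if there is no dagger.
    static String uriOf(String universalName) {
        int i = universalName.indexOf(DAGGER);
        return (i < 0) ? "" : universalName.substring(0, i);
    }

    // Everything after the dagger, or the whole name if there is none.
    static String localPartOf(String universalName) {
        int i = universalName.indexOf(DAGGER);
        return (i < 0) ? universalName : universalName.substring(i + 1);
    }
}
```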

I think this pretty much does everything that people on the list
said they wanted (except namespace-rules validation), and without
burdening SAX parsers or requiring SAX applications to choose
between ns-aware parsers and ns-unaware ones.  Instead,
it is *applications* which are ns-aware or not, and handle their
needs by using NamespaceFilter or not.

Comments?

-- 
John Cowan	http://www.ccil.org/~cowan		cowan at ccil.org
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)




