SAX2 Namespace Support

Thu Dec 30 13:43:34 GMT 1999

David Brownell writes:

 > There's been way too much email on this topic -- I should have
 > weighed in earlier.  In all honesty I'd prefer to see all namespace
 > support be cleanly layered on top of SAX1.  It's easy to do it that
 > way; just add some optional code to postprocess a SAX event stream.

The argument against that is efficiency: I have found that even the
most efficient Namespace post-processor that I can write adds about
25% to parsing time.  The reason, I think, is that there is a high
cost to iterating through every attribute list and examining every
attribute name, and copying or wrapping the attribute lists to give a
Namespace view.  
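To make the cost argument concrete, here is a minimal Java sketch of the kind of layered post-processor under discussion. It uses the helper class org.xml.sax.helpers.NamespaceSupport as shipped in the JDK today. The class name NamespacePostProcessor and the parallel-list stand-in for a SAX1 AttributeList are hypothetical. Note the two passes over every attribute list (one to find xmlns declarations, one to resolve names) and the copy into a new list. That per-element work is roughly where a post-processor's overhead comes from.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.helpers.NamespaceSupport;

// Hypothetical post-processor layered on a SAX1-style event stream:
// the parser reports raw prefixed names, and this class rewrites them.
class NamespacePostProcessor {
    private final NamespaceSupport context = new NamespaceSupport();

    /**
     * Resolve one start tag. attrNames/attrValues are parallel lists in
     * document order (a stand-in for a SAX1 AttributeList). Returns
     * {uri, localName, prefixedName} triples: the element first, then
     * each attribute that is not an xmlns declaration.
     */
    List<String[]> startElement(String name, List<String> attrNames, List<String> attrValues) {
        context.pushContext();
        // Pass 1: examine every attribute name to find xmlns declarations.
        for (int i = 0; i < attrNames.size(); i++) {
            String a = attrNames.get(i);
            if (a.equals("xmlns")) {
                context.declarePrefix("", attrValues.get(i));
            } else if (a.startsWith("xmlns:")) {
                context.declarePrefix(a.substring(6), attrValues.get(i));
            }
        }
        // Pass 2: resolve the element name, then copy each remaining
        // attribute into a new, namespace-aware list.
        List<String[]> out = new ArrayList<>();
        out.add(resolve(name, false));
        for (String a : attrNames) {
            if (a.equals("xmlns") || a.startsWith("xmlns:")) {
                continue;
            }
            out.add(resolve(a, true));
        }
        return out;
    }

    void endElement() {
        context.popContext();
    }

    private String[] resolve(String qName, boolean isAttribute) {
        // processName returns null when the prefix is undeclared.
        String[] parts = context.processName(qName, new String[3], isAttribute);
        if (parts == null) {
            throw new IllegalArgumentException("undeclared prefix in " + qName);
        }
        return parts;
    }
}
```

A parser doing this work internally already has the attribute names in hand while scanning the tag, so it can skip the second traversal and the copy.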

If Namespace processing is done in the parser itself, on the other
hand, the overhead should be relatively close to 0.  Since most new
XML-related standards require Namespace processing, this is an obvious
place to optimize by allowing the parser to pass on information
directly.

 > With respect to this particular proposal, I have several comments.
 > 
 > First, it's unclear to me what's happened to our old friend, the
 > org.xml.sax.DocumentHandler.startElement callback:
 > 
 >     public void startElement (String name, AttributeList attrs)
 >     throws SAXException;
 > 
 > If that call is gone, I anticipate migration problems to SAX2.

There have been so many proposals that I'm starting to lose track.
The idea, I think, is that this would be replaced by

  public void startElement (String namespaceURI, String localName,
                            String prefixedName, AttributeList atts)

or by

  public void startElement (String namespaceURI, String localName,
                            AttributeList atts)

with an option to leave the prefix on the local name if the parser
supports it.
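With the "leave the prefix on the local name" option, an application that needs the prefix can recover it by splitting the name at its first colon. A minimal sketch follows; the class name PrefixedName is hypothetical.

```java
// Hypothetical helper: when the parser leaves the prefix on the local
// name (e.g. "html:p"), split it at the first colon.
final class PrefixedName {
    final String prefix;     // "" when the name carries no prefix
    final String localPart;

    private PrefixedName(String prefix, String localPart) {
        this.prefix = prefix;
        this.localPart = localPart;
    }

    static PrefixedName split(String name) {
        int colon = name.indexOf(':');
        if (colon < 0) {
            return new PrefixedName("", name);
        }
        return new PrefixedName(name.substring(0, colon), name.substring(colon + 1));
    }
}
```

For example, PrefixedName.split("html:p") yields prefix "html" and local part "p", while split("p") yields an empty prefix.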

 > If it's still there, then it must be the application's choice to use
 > the new sax2.DocumentHandler interface or the original ... presumably
 > it would use Configurable.setProperty() with some ID for the new
 > namespace-aware sax2.DocumentHandler to identify its choice.

One option that no one has suggested yet is to create the
NamespaceHandler a little differently:

  public interface NamespaceHandler
  {
    public void startElement (String namespaceURI, String localName,
                              NSAttributeList atts)
      throws SAXException;

    public void endElement (String namespaceURI, String localName)
      throws SAXException;

    // and the original NS decl events as well...
  }

That way, SAX parsers could still use the original DocumentHandler to
report the XML 1.0 view (with prefixed names), and the
NamespaceHandler to report the Namespace view of elements and
attributes, which is the only place the view differs.

We would simply make a rule that, with Namespace support, the NS
startElement event always comes just before (or just after?) the SAX1
event, and that the attributes in the two lists must be in the same
order.
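Here is a rough Java sketch of what this ordering rule would ask of an application, assuming the NS event comes just before the SAX1 event. The interfaces RawView and NamespaceView, the class PairingApp, and the simplified attribute lists are all hypothetical. The sketch shows the brittle parts: the application must buffer and copy one event's data until the other arrives, and it must skip xmlns declarations when walking the two lists in parallel.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the two views of one start tag. Attribute
// lists are simplified to names, or {uri, localName, qName} triples.
interface RawView {
    void startElement(String prefixedName, List<String> attNames);
}

interface NamespaceView {
    void startElement(String namespaceURI, String localName, List<String[]> atts);
}

// Assumes the NS event arrives first and that both attribute lists share
// one order, with xmlns declarations present only in the SAX1 list.
class PairingApp implements RawView, NamespaceView {
    private String pendingURI;
    private String pendingLocal;
    private List<String[]> pendingAtts;   // shallow copy: the parser may reuse its list
    final List<String> log = new ArrayList<>();

    @Override
    public void startElement(String namespaceURI, String localName, List<String[]> atts) {
        pendingURI = namespaceURI;
        pendingLocal = localName;
        pendingAtts = new ArrayList<>(atts);
    }

    @Override
    public void startElement(String prefixedName, List<String> attNames) {
        int j = 0;   // index into the NS list, which has no declarations
        for (String raw : attNames) {
            if (raw.equals("xmlns") || raw.startsWith("xmlns:")) {
                continue;   // NS declaration: no counterpart in the NS view
            }
            String[] ns = pendingAtts.get(j++);
            log.add(raw + " -> {" + ns[0] + "}" + ns[1]);
        }
        log.add(prefixedName + " -> {" + pendingURI + "}" + pendingLocal);
    }
}
```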

Personally, I find this approach a little brittle: I don't like
depending on ordering like that (and, of course, having to allow for
NS decl attributes), and I don't like the fact that the app might have 
to copy either or both of the attribute lists before using them.
Still, I'm surprised that this suggestion hasn't come up.

 > Second, it's unclear how to report violations of namespace conformance.
 > 
 > I'd asked that the namespace spec resolve this issue, by using the
 > same reporting terminology that the XML spec uses ("warning",
 > "error", and of course "fatal error"), but instead it got even more
 > vague.  So I'll have to ask how SAX will address this ... keeping
 > in mind that if W3C gets around to answering those questions, it
 > might pick different answers.

Don't hold your breath -- last I was involved, the W3C groups were
swamped.

 > That is, faced with this document
 > 
 > 	<?xml version="1.0"?>
 > 	<html:p>Hello again! :-)</html:p>
 > 	<?at-end-of-document?>
 > 
 > Two reporting issues arise:  (a) How does one know that namespaces are
 > to be used at all?  It's a legal XML 1.0 document, so inherently there
 > is no error.  

That's a big problem.  My SAX2 proposal is for XML+Namespaces by
default, but an application can attempt to disable Namespace support.
That means that, by default, you would get an error for this document.

 > (b) If one knows that namespaces are to be used, is the undeclared
 > "html" prefix to generate a warning, recoverable error, or fatal
 > error through sax.ErrorHandler?  Is it reported some other way?

I think that it would be wrong to use fatalError to report Namespace
violations, but others may disagree.  I think that OASIS or some other
body should take a stab at this problem -- we shouldn't wait for the
W3C to solve everything.  I enjoyed my time with the W3C XML Activity,
but I'd like to think that XML will outlive the organization that
specified it.

 > I think that using ErrorHandler.error() is the best solution, but then
 > that leads to the issue of how to report namespace URIs that aren't
 > available.  (And as I recall, there were more errors to deal with than
 > just unresolved namespace prefixes.)

Error numbers would be helpful, if someone were willing to invent some.
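The ErrorHandler.error() approach suggested above might look roughly like this in Java. The sketch uses the real org.xml.sax.ErrorHandler, SAXParseException and NamespaceSupport classes. The NamespaceChecker class and its message text are hypothetical, no error numbers are invented, and xmlns attribute handling is omitted. Because the error is recoverable, the application's ErrorHandler still decides whether to throw and stop the parse.

```java
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.NamespaceSupport;

// Hypothetical checker: an undeclared prefix goes to ErrorHandler.error()
// (recoverable) rather than fatalError(). xmlns attributes are ignored
// here to keep the sketch short.
class NamespaceChecker {
    private final NamespaceSupport context = new NamespaceSupport();
    private final ErrorHandler errors;

    NamespaceChecker(ErrorHandler errors) {
        this.errors = errors;
    }

    /** Returns {uri, localName, qName}, or null after reporting the error. */
    String[] startElement(String qName) throws SAXException {
        context.pushContext();
        String[] parts = context.processName(qName, new String[3], false);
        if (parts == null) {
            // A real parser would take publicId, systemId, line and column
            // from its Locator; null and -1 mean "unknown" here.
            errors.error(new SAXParseException(
                "undeclared namespace prefix in " + qName, null, null, -1, -1));
        }
        return parts;
    }

    void endElement() {
        context.popContext();
    }
}
```

With the document above, startElement("html:p") reports one error and returns null, since "html" is never declared.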

 > > This would never be enabled by default, but for the relatively small
 > > class of apps that needed to know the original prefix, the prefix
 > > would be available simply by splitting the name argument.
 > 
 > Clearly that class includes "DOM-using applications", which for better
 > or worse (opinions do vary :-) isn't a small class.
 >
 > DOM L2 applications explicitly have the same option that I noted above:
 > use (or non-use) of namespace information is the choice of the application,
 > not the choice of some version of an XML infrastructure.

Is DOM2 more explicit about processing than DOM1, then?  There's
nothing in DOM1 that says (for example) that you have to include
comments and other stuff from the original XML document, if in fact
there is an original XML document.

Even in DOM2, I wonder whether you'd need the *original* prefixes
or just some prefixes.  After all, the DOM won't always be built from
an XML document; it might be a wrapper around a bunch of DB tables
(for example) where there are no original prefixes available.

 > > I like this approach because it doesn't throw the prefix in the face
 > > of apps that don't need it -- to paraphrase Larry Wall, it makes common
 > > tasks easy and uncommon tasks possible.
 > 
 > A third issue:  building a DOM is quite "common" though, and it needs
 > those prefixes.

I'll have to read the latest DOM2 before I comment in detail, but in
general, I'm not convinced that you have to include everything in a
DOM2 tree that DOM2 happens to support.
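On the DOM point: in DOM Level 2, Document.createElementNS takes a namespace URI and a qualified name, so a builder needs some prefix, though not necessarily the original one. Here is a minimal Java sketch using the standard javax.xml.parsers and org.w3c.dom APIs. The DomNameBuilder class and its "ns0", "ns1", ... naming scheme are hypothetical. The builder uses the original prefix when one is available and otherwise invents one per namespace URI.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical DOM builder fed with Namespace-view names.
class DomNameBuilder {
    private final Document doc;
    private final Map<String, String> synthesized = new HashMap<>();

    DomNameBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        doc = factory.newDocumentBuilder().newDocument();
    }

    /** originalPrefix: null if unknown, "" for the default namespace. */
    Element element(String namespaceURI, String localName, String originalPrefix) {
        if (namespaceURI != null && namespaceURI.isEmpty()) {
            namespaceURI = null;   // DOM uses null for "no namespace"
        }
        String prefix = originalPrefix;
        if (prefix == null && namespaceURI != null) {
            // No original prefix (e.g. data from DB tables): reuse or
            // invent one for this namespace URI.
            prefix = synthesized.computeIfAbsent(namespaceURI, uri -> "ns" + synthesized.size());
        }
        String qualifiedName = (prefix == null || prefix.isEmpty())
            ? localName
            : prefix + ":" + localName;
        return doc.createElementNS(namespaceURI, qualifiedName);
    }
}
```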

All the best,

David

-- 
David Megginson                 david at megginson.com
           http://www.megginson.com/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)