SAX2 RFD: LexicalHandler draft v.1.1

Lars Marius Garshol larsga at ifi.uio.no
Thu Mar 25 10:01:51 GMT 1999


* David Megginson
| 
|     public abstract void startCDATA ()
| 	throws SAXException;
| 
|     public abstract void endCDATA ()
| 	throws SAXException;

This implies that the parser reports the contents of CDATA sections as
separate DocumentHandler.characters events, which is of course the
most natural way to implement things anyway.

However, the 1999-03-12 list of core features contains this:

  http://xml.org/sax/features/normalize-text
    Ensure that all consecutive text is returned in a single callback to
    DocumentHandler.characters or DocumentHandler.ignorableWhitespace
    (true) or explicitly do not require it (false).


This is potentially problematic, since it is unspecified what the
parser should do about CDATA sections in this case. (I suspect we will
see more problems of this kind when we start really using and
stacking filters.) Should they be normalized, or should they be
reported separately? (I.e.: what is consecutive text, exactly?) The same
problem appears with entity boundaries and character references.

I assume most users of normalize-text will want consecutive text to be
interpreted in the logical view of the document, rather than the
lexical view. Otherwise the DocumentHandler will receive different
events in these two cases:

  <desc>
  A problematic case.
  </desc>

and

  <desc>
  A <![CDATA[problematic]]> case.
  </desc>

which is rather fragile, and this behaviour should be avoided, IMHO.
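
To make the two cases concrete, here is a minimal Java sketch that logs
the events a parser delivers for each document. It is an illustration
only and rests on assumptions that are not part of this draft: it uses
the LexicalHandler interface in the form shipped with the JDK
(org.xml.sax.ext, via DefaultHandler2) and the property name
http://xml.org/sax/properties/lexical-handler to register it. The class
and method names (CdataEvents, events, logicalText) are made up for the
example. How the text is split into characters calls is parser-specific,
so the sketch only relies on the concatenated text and on the presence
of the CDATA markers.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper class, not part of SAX.
class CdataEvents {

    /** Parses xml and returns character chunks and CDATA markers in event order. */
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override public void characters(char[] ch, int start, int length) {
                log.add("chars:" + new String(ch, start, length));
            }
            @Override public void startCDATA() { log.add("startCDATA"); }
            @Override public void endCDATA()   { log.add("endCDATA"); }
        };
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            // Assumption: the parser supports the standard lexical-handler property.
            reader.setProperty(
                "http://xml.org/sax/properties/lexical-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return log;
    }

    /** The logical view: all character data joined, CDATA markers ignored. */
    static String logicalText(List<String> events) {
        StringBuilder sb = new StringBuilder();
        for (String e : events) {
            if (e.startsWith("chars:")) sb.append(e.substring("chars:".length()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String plain = "<desc>A problematic case.</desc>";
        String cdata = "<desc>A <![CDATA[problematic]]> case.</desc>";
        System.out.println(events(plain));
        System.out.println(events(cdata));
    }
}
```

Both documents yield the same logical text, but only the second one
produces startCDATA/endCDATA events, and a parser that treats those
markers as text boundaries will chunk the characters calls differently
for the two inputs.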


So basically the problem is that normalize-text and LexicalHandler
don't go well together. You can have one, but not both at the same
time, unless the driver changes its behaviour. In other words, this
seems to require the driver to have explicit knowledge about
normalize-text.

Possible solutions:

 - reject normalize-text true if a LexicalHandler has been registered,
 and reject LexicalHandler registration if normalize-text has been set
 to true
 - make normalize-text have a logical interpretation by default, and
 switch to lexical if a LexicalHandler has been registered
 - make normalize-text always have a lexical interpretation
 - have separate normalize-text-logical and normalize-text-lexical
 features, with reject-behaviour for the first

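The difference between the logical and the lexical interpretation can be
sketched without a real parser. The class below is a hypothetical
stand-in for the buffering a driver or filter would do when
normalize-text is true. It is not an API from the draft, and the names
(TextNormalizer, elementBoundary, replay) are invented for the example.
In logical mode it merges text across CDATA boundaries; in lexical mode
it treats startCDATA and endCDATA as boundaries and flushes there.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of text normalization, not part of SAX.
class TextNormalizer {
    private final boolean lexical;                 // true: CDATA markers split text
    private final StringBuilder pending = new StringBuilder();
    private final List<String> delivered = new ArrayList<>();

    TextNormalizer(boolean lexical) { this.lexical = lexical; }

    // Text is only buffered here; it is delivered on the next boundary.
    void characters(String text) { pending.append(text); }

    // CDATA markers count as boundaries only in the lexical interpretation.
    void startCDATA() { if (lexical) flush(); }
    void endCDATA()   { if (lexical) flush(); }

    // Start and end tags always end a run of consecutive text.
    void elementBoundary() { flush(); }

    private void flush() {
        if (pending.length() > 0) {
            delivered.add(pending.toString());     // one characters callback
            pending.setLength(0);
        }
    }

    List<String> delivered() { return delivered; }

    /** Replays the events for <desc>A <![CDATA[problematic]]> case.</desc>. */
    static List<String> replay(boolean lexical) {
        TextNormalizer n = new TextNormalizer(lexical);
        n.elementBoundary();                       // <desc>
        n.characters("A ");
        n.startCDATA();
        n.characters("problematic");
        n.endCDATA();
        n.characters(" case.");
        n.elementBoundary();                       // </desc>
        return n.delivered();
    }

    public static void main(String[] args) {
        System.out.println("logical: " + replay(false));
        System.out.println("lexical: " + replay(true));
    }
}
```

With the logical interpretation the handler sees one chunk of text, the
same as for the document without the CDATA section. With the lexical
interpretation it sees three. The second solution above would amount to
choosing the flag based on whether a LexicalHandler is registered.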
Thoughts?

--Lars M.





