david-b at pacbell.net
Mon Aug 16 19:05:08 BST 1999
David Megginson wrote:
> arkin writes:
> > A generic SAX parser has two methods of reporting character data, one
> > clearly indicates that such character data is whitespace. What type of
> > whitespace should be reported as whitespace? Can the application simply
> > ignore whatever character data is reported as whitespace?
> The only whitespace reported that way is whitespace in element-only
> content: that means that there has to be a DTD, and the DTD has to say
> that an element can contain only other elements. This is a reporting
> requirement for validating parsers from the XML 1.0 recommendation.
Hmm, the XML spec never quite seemed clear about that to me. It didn't
quite include a definition of the term "ignorable whitespace".
What about an empty element "<EMPTY> <!-- spaces!! --> </EMPTY>" ...
isn't that "ignorable" whitespace as well? It "must be" passed to the
app, and clearly isn't regular character text.
FWIW I concluded "ignorable" whitespace is within elements that have a
content model that's not "ANY" or a mixed content model. That is, it's
wherever normal characters can't appear.
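
A minimal sketch of that rule, in Python purely for illustration. The function name and the content-kind labels ("EMPTY", "ANY", "MIXED", "CHILDREN") are hypothetical, chosen to echo the four kinds of content declaration in an XML 1.0 DTD. It encodes the reading given above, not a definition taken from the spec.

```python
# Whitespace characters as defined by the XML 1.0 "S" production.
XML_WHITESPACE = " \t\r\n"


def is_ignorable(content_kind, text):
    """Return True if `text` counts as "ignorable" whitespace under the
    rule above: the enclosing element's content model is neither ANY nor
    mixed, and the text consists only of XML whitespace.

    `content_kind` is a hypothetical label for the element's declared
    content: "EMPTY", "ANY", "MIXED", or "CHILDREN" (element-only).
    """
    if content_kind in ("ANY", "MIXED"):
        # Normal characters may appear here, so whitespace is real data.
        return False
    # An empty string is not whitespace at all; otherwise every
    # character must be XML whitespace.
    return text != "" and all(ch in XML_WHITESPACE for ch in text)
```

Under this reading, `is_ignorable("EMPTY", " ")` and `is_ignorable("CHILDREN", "\n  ")` are both true, while anything inside an "ANY" or "MIXED" element is ordinary character data.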
> > The XML specification clearly indicates some guidelines for handling
> > white space in a consistent manner that saves the application developer
> > from dealing with it, and will solve all of our problems (maybe except
> > world hunger). Would it be reasonable to define two SAX parser layers,
> > one before and one after the white space stripping?
> You can use the same API for both, but any whitespace stripping must
> be strictly at the application's discretion.
Where "application" is a fuzzy notion: everything above the XML processor,
which could primarily consist of library code that doesn't want to give
such options to its callers.
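
To make the "same API for both layers" idea concrete, here is a rough sketch of a layer that sits between the parser and whatever is above it. It uses the `ContentHandler` class from Python's `xml.sax.handler` only because it mirrors the SAX callbacks; `WhitespacePolicy`, `Recorder`, and the `strip_ignorable` flag are made-up names, and the events are fed in by hand rather than by a real parser.

```python
from xml.sax.handler import ContentHandler


class WhitespacePolicy(ContentHandler):
    """Hypothetical layer: forwards SAX events to `downstream`, dropping
    ignorable whitespace only if the caller asked for that."""

    def __init__(self, downstream, strip_ignorable=True):
        super().__init__()
        self.downstream = downstream
        self.strip_ignorable = strip_ignorable

    def startElement(self, name, attrs):
        self.downstream.startElement(name, attrs)

    def endElement(self, name):
        self.downstream.endElement(name)

    def characters(self, content):
        # Ordinary character data always passes through untouched.
        self.downstream.characters(content)

    def ignorableWhitespace(self, whitespace):
        # Stripping is a policy decision made above the parser.
        if not self.strip_ignorable:
            self.downstream.ignorableWhitespace(whitespace)


class Recorder(ContentHandler):
    """Hypothetical downstream handler that records text events."""

    def __init__(self):
        super().__init__()
        self.events = []

    def characters(self, content):
        self.events.append(("characters", content))

    def ignorableWhitespace(self, whitespace):
        self.events.append(("ignorable", whitespace))
```

A library that hard-codes `strip_ignorable=True` and never exposes the flag is exactly the case described above: from the parser's side it is "the application", but its own callers get no say.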