White Space

David Brownell david-b at pacbell.net
Mon Aug 16 19:05:08 BST 1999


David Megginson wrote:
> 
> arkin writes:
> 
>  > A generic SAX parser has two methods of reporting character data, one
>  > clearly indicates that such character data is whitespace. What type of
>  > whitespace should be reported as whitespace? Can the application simply
>  > ignore whatever character data is reported as whitespace?
> 
> The only whitespace reported that way is whitespace in element-only
> content: that means that there has to be a DTD, and the DTD has to say
> that an element can contain only other elements.  This is a reporting
> requirement for validating parsers from the XML 1.0 recommendation.

Hmm, the XML spec never quite seemed clear about that to me.  It didn't
quite include a definition of the term "ignorable whitespace".

What about an empty element "<EMPTY>  <!-- spaces!! --> </EMPTY>" ...
isn't that "ignorable" whitespace as well?  It "must be" passed to the
app, and clearly isn't regular character text.

FWIW I concluded that "ignorable" whitespace is whitespace within elements
whose content model is neither "ANY" nor mixed.  That is, it's wherever
normal characters can't appear.
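
To make the two reporting paths concrete, here is a rough sketch of a
handler that records what arrives through each callback.  It assumes the
SAX2 DefaultHandler and the JAXP factory classes rather than the SAX1
interfaces discussed in this thread, and the class name WhitespaceReport
is made up for illustration.  Which callback a given parser uses for
whitespace in element-only content can depend on the parser and on
whether it read the DTD, so no particular split is promised here.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects character data by the callback that reported it.
class WhitespaceReport extends DefaultHandler {
    final StringBuilder text = new StringBuilder();      // reported via characters()
    final StringBuilder ignorable = new StringBuilder(); // reported via ignorableWhitespace()

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        ignorable.append(ch, start, length);
    }

    // Parses the given document with the platform's default (non-validating)
    // SAX parser and returns what each callback received.
    static WhitespaceReport parse(String xml) throws Exception {
        WhitespaceReport report = new WhitespaceReport();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), report);
        return report;
    }
}
```

Feeding it a document whose DTD declares an element with element-only
content, and then comparing the "text" and "ignorable" buffers, shows how
a particular parser interprets the rule.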


>  > The XML specification clearly indicates some guidelines for handling
>  > white space in a consistent manner that saves the application developer
>  > from dealing with it, and will solve all of our problems (maybe except
>  > world hunger). Would it be reasonable to define two SAX parser layers,
>  > one before and one after the white space stripping?
> 
> You can use the same API for both, but any whitespace stripping must
> be strictly at the application's discretion.

Where "application" is a fuzzy notion:  everything above the XML processor,
which could primarily consist of library code that doesn't want to give
such options to its callers.
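
One way such a layer could look, as a sketch only: a filter sitting
between the parser and whatever is above it, which drops the
ignorableWhitespace events when asked to and passes everything else
through unchanged.  This assumes the SAX2 XMLFilterImpl helper class; the
class name WhitespaceStrippingFilter and its constructor flag are
invented for this example.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: optionally suppresses ignorable whitespace events.
class WhitespaceStrippingFilter extends XMLFilterImpl {
    private final boolean stripIgnorable;

    // The layer above the parser decides whether stripping happens at all.
    WhitespaceStrippingFilter(boolean stripIgnorable) {
        this.stripIgnorable = stripIgnorable;
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length)
            throws SAXException {
        if (!stripIgnorable) {
            // Forward to the downstream handler exactly as received.
            super.ignorableWhitespace(ch, start, length);
        }
        // Otherwise the event is silently dropped; characters() is untouched.
    }
}
```

The filter would be given a parent XMLReader with setParent() and a
downstream handler with setContentHandler().  Library code that doesn't
want to expose the choice can simply hard-wire the flag.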

- Dave
