White Space

Mon Aug 16 12:49:51 BST 1999

arkin writes:

 > A generic SAX parser has two methods of reporting character data, one
 > of which clearly indicates that such character data is whitespace.
 > What type of whitespace should be reported as whitespace? Can the
 > application simply ignore whatever character data is reported as
 > whitespace?

The only whitespace reported that way (through ignorableWhitespace) is
whitespace in element-only content.  That means there has to be a DTD,
and the DTD has to declare that the element can contain only other
elements.  Reporting it is a requirement that the XML 1.0
recommendation (section 2.10) places on validating parsers.

It's up to the application to decide whether to ignore the
whitespace.  A non-validating processor may report the same characters
as regular character data instead, so the application cannot count on
the distinction being made.
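
To see which callback a given parser actually uses, here is a minimal
sketch in Java.  It assumes the SAX2 classes bundled with the JDK
(DefaultHandler, XMLReader) rather than the SAX 1.0 interfaces; the
class names WhitespaceReport and Counter and the sample document are
made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class WhitespaceReport {
    /** Counts the characters delivered through each of the two callbacks. */
    static class Counter extends DefaultHandler {
        int text = 0;       // characters reported as regular character data
        int ignorable = 0;  // characters reported as ignorable whitespace

        @Override
        public void characters(char[] ch, int start, int length) {
            text += length;
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            ignorable += length;
        }
    }

    /** Parses the document with the JDK's default (non-validating) parser. */
    static Counter parse(String xml) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        Counter counter = new Counter();
        reader.setContentHandler(counter);
        reader.parse(new InputSource(new StringReader(xml)));
        return counter;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample: "list" is declared as element-only content.
        String dtd = "<!DOCTYPE list [<!ELEMENT list (item)*>"
                   + "<!ELEMENT item (#PCDATA)>]>";
        String doc = "<list>\n  <item>a</item>\n</list>";

        Counter plain = parse(doc);
        Counter declared = parse(dtd + doc);
        System.out.println("no DTD:   text=" + plain.text
                + " ignorable=" + plain.ignorable);
        System.out.println("with DTD: text=" + declared.text
                + " ignorable=" + declared.ignorable);
    }
}
```

Without a DTD nothing can be reported as ignorable, so everything
arrives through characters().  With the DTD, how the five characters
are split between the two callbacks depends on the parser, which is
why the application should not assume either behaviour.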

 > The XML specification clearly indicates some guidelines for handling
 > white space in a consistent manner that saves the application developer
 > from dealing with it, and will solve all of our problems (maybe except
 > world hunger). Would it be reasonable to define two SAX parser layers,
 > one before and one after the white space stripping?

You can use the same API for both, but any whitespace stripping must
be strictly at the application's discretion.
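
One way to picture "the same API for both" is a filter that sits
between the parser and the application and speaks the same handler
interface on both sides.  The sketch below is only an illustration: it
assumes SAX2's XMLFilterImpl helper, and the names StrippingLayer,
DropIgnorable, Collector and the strip flag are hypothetical.  The
application opts in to stripping; nothing is dropped otherwise.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class StrippingLayer {
    /** Filter that drops ignorable whitespace and forwards every other event. */
    static class DropIgnorable extends XMLFilterImpl {
        DropIgnorable(XMLReader parent) {
            super(parent);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            // Deliberately not forwarded to the application's handler.
        }
    }

    /** Application-side handler: collects all character data it is given. */
    static class Collector extends DefaultHandler {
        final StringBuilder seen = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            seen.append(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            seen.append(ch, start, length);
        }
    }

    /** Reads the document, stripping only if the application asks for it. */
    static String read(String xml, boolean strip) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        if (strip) {
            reader = new DropIgnorable(reader);  // same XMLReader API as before
        }
        Collector collector = new Collector();
        reader.setContentHandler(collector);
        reader.parse(new InputSource(new StringReader(xml)));
        return collector.seen.toString();
    }

    public static void main(String[] args) throws Exception {
        String doc = "<!DOCTYPE list [<!ELEMENT list (item)*>"
                   + "<!ELEMENT item (#PCDATA)>]>"
                   + "<list>\n  <item>a</item>\n</list>";
        System.out.println("[" + read(doc, false) + "]");
        System.out.println("[" + read(doc, true) + "]");
    }
}
```

The filter only removes what the parser itself flagged as ignorable,
so with a parser that reports that whitespace as ordinary character
data the stripped and unstripped results are the same.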

All the best,

David

-- 
David Megginson                 david at megginson.com
           http://www.megginson.com/
