SAX: ignorable whitespace question

Mon Aug 3 04:12:05 BST 1998

On Sun, 2 Aug 1998, Peter Murray-Rust wrote:

>                    XML (unlike HTML) does not normalise character content
> and all characters that are not markup are passed to the application.
> Ignorable whitespace is a device that SAX provides to help the application
> decide what action it may be able to take. If you are writing a SAX-based
> application you will need to understand this concept.

I think that line ends written as CR, LF, or CR+LF are always normalized to a single LF.
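To make the line-end rule concrete, here is a small sketch using Python's standard-library SAX driver (xml.sax, built on expat). It is a modern stand-in chosen only for illustration, not one of the parsers discussed in this thread, and the names CharacterCollector and reported_text are hypothetical. A parser may deliver one run of text in several characters() calls, so the chunks are joined before looking at the result.

```python
import xml.sax


class CharacterCollector(xml.sax.ContentHandler):
    """Hypothetical handler: collects everything reported through characters()."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # One text node may arrive in several chunks, so keep them all.
        self.chunks.append(content)


def reported_text(doc: bytes) -> str:
    """Parse doc and return the concatenated character data."""
    handler = CharacterCollector()
    xml.sax.parseString(doc, handler)
    return "".join(handler.chunks)


# CR+LF, a lone CR and a lone LF in the source all arrive as a single LF.
print(repr(reported_text(b"<t>a\r\nb\rc\nd</t>")))  # prints 'a\nb\nc\nd'
```

No CR ever reaches the application, whichever of the three line-end conventions the document was saved with.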

Eric Prud'hommeaux wrote:
> In that regard, it would seem that text is handled differently from
> system identifiers and attribute values.

As for attribute values, we do have a different normalization. As
for system identifiers, I do not understand your point.
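As an illustration of that difference, the sketch below feeds the same raw characters to Python's xml.sax once as an attribute value and once as element content. Under XML 1.0 attribute-value normalization, each whitespace character in a CDATA attribute (tab, LF, or a CR+LF pair already reduced to LF) becomes a single space, while element content only gets line-end normalization. The attribute here is undeclared, so a non-validating parser treats it as CDATA. AttrAndText and attribute_and_text are hypothetical names.

```python
import xml.sax


class AttrAndText(xml.sax.ContentHandler):
    """Hypothetical handler: records attribute 'a' of <t> and its character data."""

    def __init__(self):
        super().__init__()
        self.attr = None
        self.text = []

    def startElement(self, name, attrs):
        if name == "t":
            self.attr = attrs.get("a")

    def characters(self, content):
        self.text.append(content)


def attribute_and_text(doc: bytes):
    """Return (value of attribute a, concatenated text) for a single <t> element."""
    handler = AttrAndText()
    xml.sax.parseString(doc, handler)
    return handler.attr, "".join(handler.text)


# The same raw characters (CR+LF and a tab) in an attribute and in content.
attr, text = attribute_and_text(b'<t a="x\r\ny\tz">x\r\ny\tz</t>')
print(repr(attr))  # 'x y z': every whitespace character became a space
print(repr(text))  # 'x\ny\tz': only the line end was normalized
```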

> How about leading and trailing whitespace, or tags with just
> whitespace? For example, is "<tag>some  text\r\n\t</tag>" reported
> completely as characters and not split into characters("some  text")
> and ignorable("\r\n\t")? 

Right.  They are not split.
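Assuming Python's stdlib xml.sax as the SAX driver (a non-validating, expat-based parser used here only for illustration), the sketch below records which callback receives the text of <tag>some  text\r\n\t</tag>. The trailing whitespace stays with the rest of the text in characters(), the CR+LF arrives as LF, and ignorableWhitespace() is never called. EventRecorder is a hypothetical name.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Hypothetical handler: records which SAX callback received which text."""

    def __init__(self):
        super().__init__()
        self.characters_data = []
        self.ignorable_data = []

    def characters(self, content):
        self.characters_data.append(content)

    def ignorableWhitespace(self, whitespace):
        self.ignorable_data.append(whitespace)


handler = EventRecorder()
xml.sax.parseString(b"<tag>some  text\r\n\t</tag>", handler)
print(repr("".join(handler.characters_data)))  # 'some  text\n\t'
print(handler.ignorable_data)                   # []
```

The text may be split across several characters() calls, but never between characters() and ignorableWhitespace().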

>Is the whitespace in "<t1>\n  <t2/>\n</t1>"
> ignorable? 

If 1) the DTD is available, 2) the element type t1 is declared with element content, and
3) the XML processor uses the DTD to distinguish element content from mixed content,
then the whitespace in <t1> is ignorable.
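A non-validating SAX driver such as Python's xml.sax never calls ignorableWhitespace(), so the sketch below uses Python's lower-level expat bindings directly to play the part of a processor that meets all three conditions. It reads element type declarations from the internal DTD subset only, treats a content model that is a sequence, choice, or single child name as element content, and classifies whitespace-only text directly inside such an element as ignorable. split_whitespace is a hypothetical helper that does no validation; it sketches the rule, not how any particular SAX parser implements it.

```python
import xml.parsers.expat as expat

# Content-model types that mean "element content" (child elements only,
# no #PCDATA). MIXED, ANY and EMPTY are deliberately left out.
ELEMENT_CONTENT_TYPES = {
    expat.model.XML_CTYPE_SEQ,
    expat.model.XML_CTYPE_CHOICE,
    expat.model.XML_CTYPE_NAME,
}


def split_whitespace(doc: bytes):
    """Hypothetical helper: return (characters, ignorable) strings for doc."""
    element_content = set()  # element types declared with element content
    open_elements = []       # stack of currently open element names
    characters, ignorable = [], []

    def on_element_decl(name, model):
        # model is a nested tuple; model[0] is the content-model type.
        if model[0] in ELEMENT_CONTENT_TYPES:
            element_content.add(name)

    def on_start(name, attrs):
        open_elements.append(name)

    def on_end(name):
        open_elements.pop()

    def on_chars(data):
        inside = bool(open_elements) and open_elements[-1] in element_content
        # XML whitespace is space, tab, CR and LF.
        if inside and data.strip(" \t\r\n") == "":
            ignorable.append(data)
        else:
            characters.append(data)

    parser = expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_chars
    parser.Parse(doc, True)
    return "".join(characters), "".join(ignorable)


WITH_DTD = (b'<?xml version="1.0"?>\n'
            b'<!DOCTYPE t1 [\n'
            b'  <!ELEMENT t1 (t2)>\n'
            b'  <!ELEMENT t2 EMPTY>\n'
            b']>\n'
            b'<t1>\n  <t2/>\n</t1>')

print(split_whitespace(WITH_DTD))                 # ('', '\n  \n')
print(split_whitespace(b"<t1>\n  <t2/>\n</t1>"))  # ('\n  \n', '')
```

Without the DOCTYPE, condition 1 fails and the same whitespace comes back as ordinary character data; with a (#PCDATA) declaration the element has mixed content and nothing is classified as ignorable.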

>I also assume from the XML spec that SAX is acting in the
> role of XML processor and must translate \r's so it would really be
> characters("some  text\n\t").

Quite.

Makoto

Fuji Xerox Information Systems

Tel: +81-44-812-7230   Fax: +81-44-812-7231
E-mail: murata at apsdc.ksp.fujixerox.co.jp

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)