SAX: Whitespace Handling (question 5 of 10)

Peter Murray-Rust peter at ursus.demon.co.uk
Wed Jan 7 00:31:02 GMT 1998


At 14:16 05/01/98 -0000, Michael Kay wrote:
>>BTW: IMHO, IFF there is going to be a "default implementation" anyway, I
>>would actually prefer an "ignorableWhitespace" method which calls charData
>>by default. This will permit cleaner implementations.
>
>
>I may be simple-minded, but surely the default action with ignorable white
>space should be to ignore it?

Not simple-minded :-)

The whitespace issue is not trivial, but is (I think) consistent. The
*parser* has no option except to pass all characters that are not markup to
the application. This means that in:
<FOO>
  <BAR/>
</FOO>

A parser MUST pass the equivalent of

<FOO>\n\s\s<BAR></BAR>\n</FOO>

to the application.  

In a document that is merely well-formed (no DTD) there is NO indication of
which character data are or are not significant (i.e. which are "ignorable"),
so by default the application will have a tree structure where FOO has 3
children:

FOO
  "\n\s\s"
  BAR
  "\n"
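
A minimal sketch of the three-children point, assuming Python's standard
xml.dom.minidom as a stand-in for "the application's tree" (it is not one of
the parsers discussed in this thread). With no DTD, the whitespace-only text
is kept as ordinary children of FOO:

```python
from xml.dom import minidom

# The same document as above: newline + two spaces before BAR, newline after.
doc = minidom.parseString("<FOO>\n  <BAR/>\n</FOO>")
foo = doc.documentElement

# Record each child as (node name, text data or None for elements).
children = [
    (node.nodeName, node.data if node.nodeType == node.TEXT_NODE else None)
    for node in foo.childNodes
]

for name, data in children:
    print(name, repr(data))
```

Here the first and last children are text nodes holding only whitespace, and
nothing in the document itself says whether they matter.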

If the application is told through
stylesheets/PIs/hardcoded_semantics/telepathy/a_human that all whitespace
is ignorable, fine - but it is NOT part of the XML spec.

If the DTD reads:

<!ELEMENT FOO (BAR)>

the "validating parser" (and we are still struggling with exactly what one
of those is :-) MUST tell the application:

"Hey! Be careful! I've sent you a FOO, but it has element-only content, so
you may wish to ignore all the whitespace-only children of the FOO". The
application should say thank you, and then do whatever it feels like doing
with this information.

HOW the parser tells the application is what we are tackling.  DavidM has
suggested that when the "ignorable whitespace" is emitted from the parser,
it generates a special event. This seems reasonable - I suppose there could
be other methods (even simply announcing which elements had element-only
content should be sufficient).
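
Both ideas can be sketched together, using Python's xml.sax as a stand-in for
the SAX interface under discussion. Assumptions: the ELEMENT_ONLY set is
hypothetical and hardcoded, standing in for what a validating parser would
learn from <!ELEMENT FOO (BAR)>; the class name and its fields are invented
for illustration. As far as I know the expat-based reader that ships with
Python is non-validating and reports all text through characters(), so the
handler below does the rerouting to ignorableWhitespace() itself:

```python
import xml.sax

# Hypothetical: names of elements declared with element-only content.
ELEMENT_ONLY = {"FOO"}
XML_WHITESPACE = " \t\r\n"


class WhitespaceAwareBuilder(xml.sax.ContentHandler):
    """Builds a (name, children) tree and diverts ignorable whitespace."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.stack = []
        self.ignored = []

    def startElement(self, name, attrs):
        node = (name, [])
        if self.stack:
            self.stack[-1][1].append(node)
        else:
            self.root = node
        self.stack.append(node)

    def endElement(self, name):
        self.stack.pop()

    def characters(self, content):
        if not self.stack:
            return
        name, kids = self.stack[-1]
        only_space = all(ch in XML_WHITESPACE for ch in content)
        if name in ELEMENT_ONLY and only_space:
            # The "special event": the application decides what to do.
            self.ignorableWhitespace(content)
        else:
            kids.append(content)

    def ignorableWhitespace(self, whitespace):
        # This application chooses to set it aside; another might keep it.
        self.ignored.append(whitespace)


handler = WhitespaceAwareBuilder()
xml.sax.parseString(b"<FOO>\n  <BAR/>\n</FOO>", handler)
print(handler.root, repr("".join(handler.ignored)))
```

The whitespace is still delivered, only through a different method, so an
application that cares about it loses nothing.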

[Please shoot this down if I've got it wrong :-)].

	P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



