5 Whitespace Rules

Paul Grosso paul at arbortext.com
Sat Aug 9 15:10:17 BST 1997

At 23:13 1997 08 08 +0000, Neil Bradley wrote:
>RULE 3. All other whitespace in element content is discarded.

>Note that only the presence of spaces and tabs in element content, 
>which is not common, will cause discrepancies between validated and 
>non-validated processing.

This is the crux of the problem.  As soon as you say something about
element content, you get different results from the document when you
process the DTD and when you don't.  

You don't say explicitly what happens when you don't process the DTD,
but I assume your Rule 3 doesn't do anything in that case.  Therefore,
your Rule 5 will turn all line-end codes into a space, and it is
extremely common to have line-end codes in element content.  So your
Rule 3 will cause you to end up with lots of spaces when you process
in the absence of a DTD that you wouldn't get when you process in the
presence of the DTD.
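
To make the discrepancy concrete, here is a minimal sketch in Python.
It is only an illustration under assumptions of mine: the sample
document, the function name normalize, and the shortcut of treating
every run of whitespace between two tags as "whitespace in element
content" are all hypothetical, and a line-end is taken to be "\n".

```python
import re

# Hypothetical sample: <list> is assumed to have element content only.
DOC = "<list>\n<item>one</item>\n<item>two</item>\n</list>"

def normalize(doc, dtd_processed):
    """Apply the quoted rules to whitespace between tags (a sketch)."""
    def fix(match):
        whitespace = match.group(1)
        if dtd_processed:
            # The DTD tells us this is element content, so Rule 3
            # discards the whitespace entirely.
            return "><"
        # Without the DTD, Rule 3 cannot apply, so each line-end
        # is turned into a space instead.
        return ">" + whitespace.replace("\n", " ") + "<"
    return re.sub(r">(\s+)<", fix, doc)

with_dtd = normalize(DOC, dtd_processed=True)
without_dtd = normalize(DOC, dtd_processed=False)
print(with_dtd)     # <list><item>one</item><item>two</item></list>
print(without_dtd)  # <list> <item>one</item> <item>two</item> </list>
```

The same document yields three extra spaces when the DTD is not
processed, which is the kind of divergence I mean.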

>RULE 4.  Line-end codes are discarded when preceded by a hard 
>or soft ('°') hyphen (and a soft hyphen is also discarded).
>Remaining line-end codes are treated as spaces.

This might be a nice heuristic for incoming WP files, but it doesn't
agree with SGML.  If I had "a - b" in my document and a line-end
happened to occur after the -, you'd turn my file into "a -b".
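
A rough sketch of Rule 4 as quoted shows the effect. The function name
apply_rule_4 is hypothetical, and I am assuming that a line-end is
"\n" and that the soft hyphen is the character U+00AD.

```python
SOFT_HYPHEN = "\u00ad"  # assumed code point for the soft hyphen

def apply_rule_4(text):
    """Apply the quoted Rule 4 to a string (a sketch, not a parser)."""
    # Soft hyphen followed by a line-end: discard both.
    text = text.replace(SOFT_HYPHEN + "\n", "")
    # Hard hyphen followed by a line-end: discard only the line-end.
    text = text.replace("-\n", "-")
    # Remaining line-ends are treated as spaces.
    return text.replace("\n", " ")

print(apply_rule_4("a -\nb"))  # a -b
print(apply_rule_4("a\nb"))    # a b
```

The first call is the case above: the space that the line-end stood
for is lost, so "a - b" broken after the hyphen comes back as "a -b".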


