5 Whitespace Rules

Neil Bradley neil at bradley.co.uk
Sat Aug 9 15:59:40 BST 1997



> Reply-to:      Paul Grosso <paul at arbortext.com>

> At 23:13 1997 08 08 +0000, Neil Bradley wrote:
> >RULE 3. All other whitespace in element content is discarded.
> >
> >Note that only the presence of spaces and tabs in element content,
> >which is not common, will cause discrepancies between validated and
> >non-validated processing.
> 
> This is the crux of the problem.  As soon as you say something about
> element content, you get different results from the document when
> you process the DTD and when you don't.  

Yes, but as I say, the problem only arises if people put spaces or
tabs in element content, which in my experience is very unusual.

> You don't say explicitly what happens when you don't process the
> DTD, but I assume your Rule 3 doesn't do anything in that case. 
> Therefore, your Rule 5 will turn all line-end codes into a space,
> and it is extremely common to have line-end codes in element
> content.  So your Rule 3 will cause you to end up with lots of
> spaces when you process in the absence of a DTD that you wouldn't
> get when you process in the presence of the DTD.

No, Rule 2 has already dispensed with these CR and LF codes. I 
should have made it clear that this rule applies to non-validated
input.  So...

 <chapter>[CR]
 <note>[CR]
 <p>[CR]
 This is a para in a note[CR]
 </p>[CR]
 </note>[CR]
 ...

becomes

 <chapter><note><p>This is a para in a note</p></note>...

...before Rules 3 and 5 are applied.

This was my whole point about separating line-end code processing from
spacing character processing.
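
As a rough illustration of that separation, here is a minimal Python sketch of the line-end step on its own. Rule 2 is not quoted in this message, so the sketch assumes it means "discard any line-end code that directly follows or directly precedes a tag", which is only inferred from the example above. The function name strip_line_ends_next_to_tags is hypothetical, and the regular expressions are naive (they treat every ">" and "<" as a tag delimiter).

```python
import re

# One line-end code: CR LF, a lone CR, or a lone LF.
LINE_END = r"(?:\r\n|\r|\n)"

def strip_line_ends_next_to_tags(text: str) -> str:
    """Assumed reading of Rule 2: drop line-end codes adjacent to tags.

    Spaces and tabs are deliberately left alone; they belong to the
    later spacing-character rules.
    """
    # Line-end immediately after the '>' that closes a tag.
    text = re.sub(">" + LINE_END, ">", text)
    # Line-end immediately before the '<' that opens a tag.
    text = re.sub(LINE_END + "<", "<", text)
    return text

sample = "<chapter>\r<note>\r<p>\rThis is a para in a note\r</p>\r</note>\r"
result = strip_line_ends_next_to_tags(sample)
print(result)
```

Because this step needs no DTD, a validating and a non-validating processor would see the same string before any rule about spaces and tabs is applied.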

> >
> >RULE 4.  Line-end codes are discarded when preceded by a hard or
> >soft ('&#173;') hyphen (and a soft hyphen is also discarded).
> >Remaining line-end codes are treated as spaces.
> 
> This might be a nice heuristic for incoming WP files, but it doesn't
> agree with SGML.  If I had "a - b" in my document and a line-end
> happened to occur after the -, you'd turn my file into "a -b".

Yes, well, I can only suggest this is unlikely to happen, and in any
case Rule 4 is only a suggestion for paginating applications. I am
open to suggestions here, but for now I am far more concerned about
Rules 1 to 3.
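
For anyone who wants to experiment with Rule 4 as quoted above, a small Python sketch follows. It assumes the soft hyphen is U+00AD and that a line-end code is CR LF, CR, or LF; the function name apply_rule_4 is hypothetical. It also reproduces Paul's "a - b" case, so the objection is easy to see.

```python
import re

SOFT_HYPHEN = "\u00ad"            # assumption: soft hyphen is U+00AD
LINE_END = r"(?:\r\n|\r|\n)"      # assumption: CR LF, CR, or LF

def apply_rule_4(text: str) -> str:
    """Sketch of Rule 4 as quoted in this thread."""
    # Soft hyphen followed by a line-end: discard both.
    text = re.sub(SOFT_HYPHEN + LINE_END, "", text)
    # Hard hyphen followed by a line-end: keep the hyphen, drop the line-end.
    text = re.sub("-" + LINE_END, "-", text)
    # Every remaining line-end is treated as a space.
    return re.sub(LINE_END, " ", text)

print(apply_rule_4("well-\nknown"))   # hard hyphen at a line break
print(apply_rule_4("a -\nb"))         # Paul's case: the space before "b" is lost
```

The second call shows why the rule is only a heuristic: the hyphen in "a - b" is not a word-break hyphen, yet the line-end after it is still discarded.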

> paul

Neil.

-----------------------------------------------
Neil Bradley - Author of The Concise SGML Companion.
neil at bradley.co.uk
www.bradley.co.uk

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)



