5 Whitespace Rules
Neil Bradley
neil at bradley.co.uk
Sat Aug 9 15:59:40 BST 1997
> Reply-to: Paul Grosso <paul at arbortext.com>
> At 23:13 1997 08 08 +0000, Neil Bradley wrote:
> >RULE 3. All other whitespace in element content is discarded.
> >
> >Note that only the presence of spaces and tabs in element content,
> >which is not common, will cause discrepancies between validated and
> >non-validated processing.
>
> This is the crux of the problem. As soon as you say something about
> element content, you get different results from the document when
> you process the DTD and when you don't.
Yes, but as I say, the problem only arises if people put spaces or
tabs in element content, which in my experience is very unusual.
> You don't say explicitly what happens when you don't process the
> DTD, but I assume your Rule 3 doesn't do anything in that case.
> Therefore, your Rule 5 will turn all line-end codes into a space,
> and it is extremely common to have line-end codes in element
> content. So your Rule 3 will cause you to end up with lots of
> spaces when you process in the absence of a DTD that you wouldn't
> get when you process in the presence of the DTD.
No, Rule 2 has already dispensed with these CR and LF codes. I
should have made it clear that Rule 2 applies to non-validated
input. So...
<chapter>[CR]
<note>[CR]
<p>[CR]
This is[CR]
a para in a note[CR]
</p>[CR]
</note>[CR]
...
becomes
<chapter><note><p>This is
a para in a note</p></note>...
...before Rules 3 and 5 are applied.
This was my whole point about separating line-end code processing from
spacing character processing.
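
As a rough illustration of that separation, here is a minimal Python sketch of the first pass. Rule 2 is not quoted in this message, so the sketch assumes it means "discard any line-end code that immediately follows or immediately precedes a tag". The function name is hypothetical, and the tag detection is deliberately naive (any '>' or '<' is taken to be markup), so treat it as a sketch of the idea, not a parser.

```python
import re

def strip_markup_line_ends(text):
    """Assumed reading of Rule 2: drop line-ends that touch markup."""
    # Treat CR, LF and CRLF alike by normalising them to "\n" first.
    text = re.sub(r"\r\n|\r", "\n", text)
    # Discard line-ends directly after the '>' that closes a tag.
    text = re.sub(r"(?<=>)\n+", "", text)
    # Discard line-ends directly before the '<' that opens a tag.
    text = re.sub(r"\n+(?=<)", "", text)
    return text

source = (
    "<chapter>\n<note>\n<p>\n"
    "This is\na para in a note\n"
    "</p>\n</note>\n"
)
print(strip_markup_line_ends(source))
```

Under that assumption only the line-end between "This is" and "a para in a note" survives, and it is left for the later rules to deal with. No DTD is consulted at any point, which is why the result would be the same for validated and non-validated input.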
> >
> >RULE 4. Line-end codes are discarded when preceded by a hard or
> >soft ('°') hyphen (and a soft hyphen is also discarded).
> >Remaining line-end codes are treated as spaces.
>
> This might be a nice heuristic for incoming WP files, but it doesn't
> agree with SGML. If I had "a - b" in my document and a line-end
> happened to occur after the -, you'd turn my file into "a -b".
Yes, well, I can only suggest this is unlikely to happen, and in any
case Rule 4 is only a suggestion for paginating applications. I am
open to suggestions here, but for now I am far more concerned about
Rules 1 to 3.
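
For reference, a minimal Python sketch of Rule 4 as quoted above, which also reproduces the "a - b" case. It assumes the soft hyphen is U+00AD and that the hard hyphen is the plain ASCII '-'. The function name is hypothetical.

```python
import re

SOFT_HYPHEN = "\u00ad"  # assumption: the soft hyphen is U+00AD

def apply_rule4(text):
    """Sketch of Rule 4: hyphen-aware handling of line-end codes."""
    # Treat CR, LF and CRLF alike by normalising them to "\n" first.
    text = re.sub(r"\r\n|\r", "\n", text)
    # Soft hyphen + line-end: discard both.
    text = text.replace(SOFT_HYPHEN + "\n", "")
    # Hard hyphen + line-end: keep the hyphen, discard the line-end.
    text = text.replace("-\n", "-")
    # Remaining line-end codes are treated as spaces.
    return text.replace("\n", " ")

print(apply_rule4("a -\nb"))
```

With a line-end falling right after the hyphen in "a - b", this prints "a -b", which is exactly the case objected to above. A word broken as "well-" at the end of one line and "known" at the start of the next comes back as "well-known".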
> paul
Neil.
-----------------------------------------------
Neil Bradley - Author of The Concise SGML Companion.
neil at bradley.co.uk
www.bradley.co.uk
xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)