Whitespace

Neil Bradley neil at bradley.co.uk
Sun Sep 14 10:26:11 BST 1997


> Reply-to:      Arnaud Le Taillanter <arnaud21 at club-internet.fr>

> Neil Bradley proposed some simple rules (this is "version 1", a second
> version, a little more complex, but simple enough, was proposed). I
> really like
> the approach, even if it doesn't work for the moment.

I agree they are inadequate, but I think my second attempt was more 
accurate than my first, so I am surprised that you now dissect the
first attempt. Still, I am happy to see this issue continue to be 
aired.

 
> *Rule 1*: standardization of input from different OSs.
>  CR, LF, CRLF are translated to a line end code.
> OBVIOUS!!!!!

Absolutely, but perhaps not to some programmers unfamiliar with, for 
example, the Mac line-end conventions.
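
For anyone unsure what Rule 1 amounts to, here is a minimal sketch in
Python. The function name normalize_line_ends is my own invention for
illustration, and I am assuming that "\n" stands in for the single
"line end code". It is not taken from any existing parser.

```python
import re

# CRLF is listed first in the alternation so that a DOS/Windows pair
# becomes one line end rather than two.
_ANY_LINE_END = re.compile(r"\r\n|\r|\n")


def normalize_line_ends(text: str) -> str:
    """Translate CR (Mac), LF (Unix) and CRLF (DOS) to a single "\n"."""
    return _ANY_LINE_END.sub("\n", text)


if __name__ == "__main__":
    sample = "mac line\rdos line\r\nunix line\n"
    print(repr(normalize_line_ends(sample)))
```

The one point that is easy to get wrong is the ordering: testing for a
lone CR before CRLF would turn every DOS line end into two.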
 
> *Rule 2*: line end codes after a start tag or before an end tag are
> discarded. A simple rule. For usual elements, it is exactly what you
> expect :

> <P><EM>Two
> </EM>words</P>
> becomes
> <P><EM>Two</EM>words</P>
> The space between "Two" and "words" evaporated.
> Same thing with:
> <P><EM>
> Two
> </EM>words</P>
> I don't think this particular problem is important: the encoding
> is not natural. It should be an error!
>  I think everybody would write:
> <P><EM>Two</EM> words</P>, or
> <P>
> <EM>Two</EM> words
> </P>, etc...

I have long thought that 'some' formatting options should simply be 
made illegal, and that we should then ensure widespread knowledge of 
restrictions to future document authors. This is the main example I 
had already considered.
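
To make the effect of Rule 2 concrete, here is a rough sketch in
Python. It assumes line ends have already been normalized to "\n", and
it uses naive regular expressions rather than a real parser (an
attribute value containing ">" would confuse it). The name
apply_rule_2 and both patterns are hypothetical, for illustration only.

```python
import re

# A start tag: "<" followed by a name, not closed with "/>"
# (so empty-element tags are left alone), then a line end.
_START_TAG_THEN_EOL = re.compile(r"(<[^/!?<>](?:[^<>]*[^/<>])?>)\n")

# A line end immediately followed by an end tag.
_EOL_THEN_END_TAG = re.compile(r"\n(</[^<>]+>)")


def apply_rule_2(text: str) -> str:
    """Discard a line end right after a start tag or right before an end tag."""
    text = _START_TAG_THEN_EOL.sub(r"\1", text)
    return _EOL_THEN_END_TAG.sub(r"\1", text)


if __name__ == "__main__":
    print(apply_rule_2("<P><EM>Two\n</EM>words</P>"))
    print(apply_rule_2("<P>\n<EM>Two</EM> words\n</P>"))
```

Note that this sketch knows nothing about preserved elements, so it
would strip line ends there as well.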

> Inside a preserved element, line end codes are wrongly discarded
> after element start tags and before element end tags:
> <PRE XML-SPACE="PRESERVE">
>          blabla <EM>
>          bloblo</EM>
>          blublu
> </PRE>

Again, I think this coding is very unnatural. 



> *Rule 4*: Except in preserved elements (elements
> with a space attribute set to "PRESERVE") line end codes are
> discarded when preceded by a hard or
> soft hyphen (in the process, a soft hyphen is also discarded) and
> remaining line end codes are treated as space. 
> 
> The rule concerning hyphens is not necessary. If it's a hard hyphen,
> don't put it at line end (who would do that?)

It is in fact a very natural action, which I have seen many times.

> Moreover, there is no use in an XML source file to put a soft
> hyphen at line end. Who would do that? In my poor life, I have no occa-
> sion to see some text with hyphens at line end.

I have. Many times.
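
For reference, Rule 4 as quoted above could be sketched like this in
Python. I am assuming the text is not inside a preserved element, that
line ends are already normalized to "\n", that the hard hyphen is the
ordinary "-" and that the soft hyphen is U+00AD. The name apply_rule_4
is hypothetical.

```python
SOFT_HYPHEN = "\u00ad"  # assumption: soft hyphen encoded as U+00AD
HARD_HYPHEN = "-"


def apply_rule_4(text: str) -> str:
    """Apply the hyphen and line-end handling of Rule 4 to non-preserved text."""
    # Soft hyphen at line end: discard both the hyphen and the line end.
    text = text.replace(SOFT_HYPHEN + "\n", "")
    # Hard hyphen at line end: keep the hyphen, discard the line end.
    text = text.replace(HARD_HYPHEN + "\n", HARD_HYPHEN)
    # Every remaining line end is treated as a space.
    return text.replace("\n", " ")


if __name__ == "__main__":
    print(apply_rule_4("a well-\nknown rule"))
    print(apply_rule_4("an occa\u00ad\nsion to see\nsome text"))
```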
 
> *Rule 5*: except in preserved elements, consecutive WS characters
> are reduced to a single space.
> 
> I don't like this rule. If I put two spaces after a point, I mean two
> spaces.
> It's a typographic decision.
> Rule 5 is meant to allow some indentation:
> 
> <P>
> He said:
>      <QUOTE>
>            I need some
>            indentation.SPSPIndentation is needed.
>      </QUOTE>
> </P>

NO IT WAS NOT! I have never said this, and I did not intend to imply 
this. The reason for this rule was purely to remove surplus spaces 
generated by the effect of previous rules.
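
A minimal sketch of Rule 5 in Python, assuming "WS characters" means
space, tab and the normalized line end "\n", and that the text is not
inside a preserved element. The name apply_rule_5 is hypothetical.

```python
import re

# Assumption: whitespace here means space, tab and line end only.
_WS_RUN = re.compile(r"[ \t\n]+")


def apply_rule_5(text: str) -> str:
    """Reduce each run of consecutive whitespace characters to one space."""
    return _WS_RUN.sub(" ", text)


if __name__ == "__main__":
    # Surplus spaces left behind by earlier rules collapse to one.
    print(apply_rule_5("He said:  \n     I need some"))
```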
 
> Arnaud

I am more than happy for people to pull apart my proposed rules. That
is what I put them here for. But please refer to the second attempt, 
not the first.

Neil.

-----------------------------------------------
Neil Bradley - Author of The Concise SGML Companion.
neil at bradley.co.uk
www.bradley.co.uk

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)



