Whitespace rules (v2)

Neil Bradley neil at bradley.co.uk
Sun Aug 17 09:43:33 BST 1997

Peter Murray-Rust wrote:

> If a set of rules *does* emerge, then how can we generally inform an application
> that it should take them as DEFAULT?  I assume this is through a PI:

I was hoping that relevant applications (mainly browsers and
typesetting systems) would ALWAYS assume the rules that are finally
agreed, except where preserved content (or some other set of
rules) is explicitly requested.

> I agree with Liam - I didn't understand 'blockness'.  I also think that whatever
> is done here has to be independent of stylesheets and DTDs.  The average hacker
> like me simply won't understand the subtleties.

I am merely trying to distinguish in-line elements from other
elements. An in-line element implies no line-breaks above or below
it. A 'block' element therefore DOES imply such a break. I do not use
the terms 'element content' and 'mixed content' here, because that
distinction is not quite the same thing. As I have said before, a
Para element is a 'block' element, and has mixed content, but an Emph
element is an 'in-line' element, yet also has mixed content. All
style sheets, including CSS, understand the concept of in-line and
block elements. Any whitespace surrounding a block element MUST be
irrelevant.
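
As a rough illustration of that last point, the sketch below discards
whitespace on either side of a block element and leaves whitespace
around in-line elements alone. It rests on assumptions that are not
part of the rules: the BLOCK_ELEMENTS set and the Chapter element are
invented for the example, and Python's xml.etree.ElementTree stands in
for whatever tree the application really builds. How an application
learns which elements are blocks is left open.

```python
import xml.etree.ElementTree as ET

# Hypothetical classification; a real application would take this from
# a style sheet or some other agreed source.
BLOCK_ELEMENTS = {"Chapter", "Para"}


def drop_whitespace_around_blocks(parent):
    """Strip whitespace that touches the outside of any block element."""
    previous = None
    for child in parent:
        if child.tag in BLOCK_ELEMENTS:
            # Whitespace immediately before the block's start-tag lives in
            # the parent's text or in the previous sibling's tail.
            if previous is None:
                parent.text = (parent.text or "").rstrip()
            else:
                previous.tail = (previous.tail or "").rstrip()
            # Whitespace immediately after the block's end-tag.
            child.tail = (child.tail or "").lstrip()
        drop_whitespace_around_blocks(child)
        previous = child
    return parent


doc = ET.fromstring(
    "<Chapter>\n"
    "  <Para>Some <Emph>in-line</Emph> text.</Para>\n"
    "  <Para>More.</Para>\n"
    "</Chapter>"
)
print(ET.tostring(drop_whitespace_around_blocks(doc), encoding="unicode"))
```

The spaces on either side of Emph survive because Emph is not listed
as a block element; only the line-ends and indentation between the
Para elements are dropped.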

Liam raised the issue of a half-way element type, such as a header 
which implies a line-break before it, but not after, so that 
following text will appear on the same line. This one is tricky. 
Suggestions anybody?

> I would assume that this processing takes place in the application, not the
> parser.  How/whether comments are passed to the application is part of the
> parser API.  I assume that at this stage the comment is recognised as a single
> chunk which can be deleted with/out surrounding whitespace as required.

As I say at the top of the rules, ALL these rules are applied by the 
application, not the XML processor.

> This one is tough.  Please criticise my current view :-).  SGML documents seem
> to use markup as structure in some places (e.g. OL/LI in HTML) or
> event streams (e.g. EM, B in HTML). Authors/readers expect different processing
> modes from these types. The example above is best treated as structuring
> markup (P) containing an event stream (#PCDATA|EM)* [sorry for abbreviations].
> So we have to indicate to the processor that P is structuring and that 
> whitespace after <P> or before </P> is irrelevant, and that its content is an 
> event stream where all whitespace is normalised to a single space (cf HTML.)
> Therefore can we have something like this:
> <Paragraph>
> This is<Emphasis>very</Emphasis>strange.
> </Paragraph>

I think that, ultimately, some combinations of markup will always 
break whatever rules we come up with. We must ensure that only 
obscure, non-intuitive combinations do this, then just shout from 
the rooftops that these combinations are not to be used.
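
To make the quoted example concrete, here is a minimal sketch of the
treatment Peter describes: whitespace just inside the block's tags is
dropped, and every other run of whitespace collapses to a single
space. The function name normalised_text is made up for the example,
and flattening the element to plain text with ElementTree is a
simplification rather than a proposal.

```python
import re
import xml.etree.ElementTree as ET


def normalised_text(block):
    """Return the block's text with whitespace runs collapsed to one space."""
    text = "".join(block.itertext())          # text of the block and its in-line children
    return re.sub(r"\s+", " ", text).strip()  # collapse runs, trim both ends


quoted = ET.fromstring(
    "<Paragraph>\nThis is<Emphasis>very</Emphasis>strange.\n</Paragraph>"
)
spaced = ET.fromstring(
    "<Paragraph>\nThis is\n<Emphasis>very</Emphasis>\nstrange.\n</Paragraph>"
)
print(normalised_text(quoted))  # This isverystrange.
print(normalised_text(spaced))  # This is very strange.
```

The first result runs the words together because the source has no
whitespace around the Emphasis tags; no normalisation rule can supply
spaces the author left out.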
> > 
> > > RULE 4.  A remaining line-end code is converted into a space, except when it is 
> > > preceded by a normal (hard) hyphen, or by a soft hyphen ('&#173;'), 
> > > in which case it is removed (a soft hyphen is also then removed). 
> > > ---
> I have to argue against this :-(.  A hyphen is indistinguishable from a minus
> to lots of people. There are also many cases where people may wish to end
> a line with a minus:
> <MOL>
> <ATOMS>
> CL-
> H+
> </ATOMS>
> </MOL>
> Since we are normalising whitespace, then lines can always be arranged so that
> hyphens are unnecessary.

My concern was to address existing text files, where hyphens are 
often used in this way. Maybe I am over-estimating this problem.
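
For reference, Rule 4 as quoted above can be sketched in a few lines.
The sketch assumes that the earlier rules have already removed the
line-ends they cover, that a line-end is a single "\n", and that the
soft hyphen is U+00AD (character 173); the name join_line_ends is
invented for the example.

```python
SOFT_HYPHEN = "\u00ad"  # character 173 in ISO 8859-1


def join_line_ends(text):
    """Apply Rule 4 to every remaining line-end in text."""
    lines = text.split("\n")
    pieces = []
    for line in lines[:-1]:              # every line followed by a line-end
        if line.endswith(SOFT_HYPHEN):
            pieces.append(line[:-1])     # remove the soft hyphen and the line-end
        elif line.endswith("-"):
            pieces.append(line)          # keep the hard hyphen, remove the line-end
        else:
            pieces.append(line + " ")    # convert the line-end into a space
    pieces.append(lines[-1])             # the last line has no line-end
    return "".join(pieces)


print(join_line_ends("a well-\nknown rule"))   # a well-known rule
print(join_line_ends("type\u00ad\nsetting"))   # typesetting
print(join_line_ends("one\ntwo"))              # one two
print(join_line_ends("CL-\nH+"))               # CL-H+
```

The last call is Peter's case: the rule cannot tell a trailing minus
from a hyphen, so the two atoms are run together.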


Neil Bradley - Author of The Concise SGML Companion.
neil at bradley.co.uk

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)
