Whitespace

Bill Donoghoe bdonoghoe at spin.net.au
Tue Aug 19 15:46:42 BST 1997


>Sean Mc Grath wrote:
>>> Peter Murray-Rust's post removed to conserve space
>
>**Warning:** Rush of blood to the head follows. Get those flame throwers
>ready...
>
>I know this whole white space thing was thrashed out at length some time ago but
>it worries me greatly that on XML-DEV the whole issue seems to be as problematic
>as it was before XML-Lang's rulings on whitespace handling were decided upon.
>It seems that the problem was not really solved - just pushed up a layer:-)
>
>It just sounds wrong to me that white space handling is to be the subject of
>application conventions rather than part of the core XML parsing activity.
>
>Anyway, I think everyone should be allowed to over-simplify the "White Space
>Problem" once in their lives! Here is my contribution:-
>
>
>Ban mixed content. Mixed content is a markup minimization feature.
>
>If you want a chunk of PCDATA in an XML doc, use the <PCDATA>
>reserved element name.
>
><foo>
>   <pcdata>I am data 1</pcdata>
>   <pcdata>I am data 2</pcdata>
></foo>
>
>Becomes
><foo><pcdata>I am data 1</pcdata><pcdata>I am data 2</pcdata></foo>
>
>If you need whitespace to be something other than whitespace- i.e. a
>newline to be a real newline to be passed on to the application, use an
>empty element type to represent it.
>
><foo>
>   <pcdata>I am data 1</pcdata><newline/>
>   <pcdata>I am data 2</pcdata>
></foo>
>
>
>Give me five minutes to put on the asbestos suit and then you flame
>away....
>
Instead of flaming you I will hop onto the bandwagon (can I borrow the 
asbestos suit for a while?).

Firstly, to paraphrase some earlier comments, the "whitespace problem" 
results from the dual personality of whitespace.

Personality 1.  The programmer's whitespace ("pretty printing") is used as a 
layout tool for visual editing of the markup and content.  Besides, lots of 
editing applications won't allow lines over 250 characters.

Personality 2.  The whitespace is part of the content used because the 
author either wanted it that way or he/she could not see any other easy way 
to encode the information correctly.
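
To make the two personalities concrete, here is a small Python sketch using the standard library's xml.etree.ElementTree. The element names (foo, bar, verse) are made up for illustration. It only shows that a parser hands both kinds of whitespace to the application as ordinary character data, with nothing to tell them apart.

```python
import xml.etree.ElementTree as ET

# Personality 1: whitespace used only to lay out the markup.
pretty = ET.fromstring("<foo>\n   <bar>x</bar>\n</foo>")

# Personality 2: whitespace the author meant as content (a line break
# inside a hypothetical <verse> element).
verse = ET.fromstring("<verse>line one\n   line two</verse>")

layout_ws = pretty.text  # the character data before <bar>
content = verse.text     # the character data of <verse>

# Both arrive as plain strings containing the same newline-plus-indent run.
print(repr(layout_ws))
print(repr(content))
```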

SGML tried to cater for both personalities and it succeeded in a moderate 
fashion.  The downside was that it is not an easy task to maintain and 
process SGML documents.

Now for some personal opinion on what I thought XML was all about.  XML is 
an attempt to either simplify SGML (get rid of or change the bits which make 
it hard to understand/use/process) or extend HTML to deal with information 
content as well as presentation.  I lean towards the former view: "SGML for 
the Web".

IMHO the current XML "whitespace handling" has not simplified the SGML 
situation significantly.

Here are some comments and slight variations on Sean's suggestion.

I believe that Sean's suggestion has plenty of merit.

What is wrong with having some standard elements 
(<PCDATA>, <CDATA>, <NEWLINE>) which are part of every XML DTD?

If you didn't want users to have to author these tags then "normalisation" 
applications could be developed which could convert "raw" XML into the 
"normalised" version.

Example:

<foo>
   I am data 1
   I am <emph>data</emph> 2
</foo>

could be normalised to:

<foo>
   <pcdata>I am data 1</pcdata><newline/>
   <pcdata>I am data 2</pcdata>
</foo>

or

<foo><pcdata>I am data 1 I am <emph>data</emph> 2</pcdata>
</foo>

depending on the DTD declarations for the elements or a style sheet (?!!)
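
As a rough sketch of what such a "normalisation" application might do, the Python below produces the first form shown above. It is an illustration under stated assumptions, not a definitive design: the function name normalise is hypothetical, inline markup such as <emph> is flattened to plain text (as in the first normalised form), and every non-blank source line is treated as a significant line. A real tool would take those decisions from the DTD or a style sheet.

```python
import xml.etree.ElementTree as ET


def normalise(raw_xml: str) -> str:
    """Rewrite an element's character data as <pcdata> children.

    Hypothetical helper: inline markup is flattened to plain text, each
    non-blank source line becomes one <pcdata> element, and a <newline/>
    element is placed between consecutive lines.
    """
    source = ET.fromstring(raw_xml)
    # itertext() yields the text of the element and all its descendants
    # in document order, so joining it flattens mixed content.
    flat = "".join(source.itertext())
    lines = [line.strip() for line in flat.splitlines() if line.strip()]

    result = ET.Element(source.tag, source.attrib)
    for index, line in enumerate(lines):
        if index > 0:
            # The line break is now explicit markup, not whitespace.
            ET.SubElement(result, "newline")
        ET.SubElement(result, "pcdata").text = line
    return ET.tostring(result, encoding="unicode")


raw = """<foo>
   I am data 1
   I am <emph>data</emph> 2
</foo>"""
print(normalise(raw))
```

The output carries no pretty-printing whitespace at all, so anything added later for layout cannot be mistaken for content.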

However, normalisation is not needed if the authors can be given tools which 
can produce the desired markup.

Thus, all whitespace in the "normalised" documents could be collapsed to a 
single space (because we removed personality 2 we are only left with pretty 
printing).
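
The collapsing step might look something like the Python sketch below. The name collapse_whitespace is made up, the element names follow the examples in this thread, and the sketch assumes the document has already been normalised so that no remaining whitespace is significant. Whitespace is taken to mean the XML characters space, tab, carriage return and line feed.

```python
import re
import xml.etree.ElementTree as ET

# XML white space: space, tab, carriage return, line feed.
XML_WS = re.compile(r"[ \t\r\n]+")


def collapse_whitespace(normalised_xml: str) -> str:
    """Collapse every run of whitespace in the document to one space.

    Assumes the input is already normalised, so all whitespace left in
    it is pretty printing.
    """
    root = ET.fromstring(normalised_xml)
    for element in root.iter():
        # Character data lives in .text (before the first child) and
        # .tail (after the element's end tag).
        if element.text is not None:
            element.text = XML_WS.sub(" ", element.text)
        if element.tail is not None:
            element.tail = XML_WS.sub(" ", element.tail)
    return ET.tostring(root, encoding="unicode")


pretty_printed = """<foo>
   <pcdata>I am data 1</pcdata><newline/>
   <pcdata>I am   data 2</pcdata>
</foo>"""
collapsed = collapse_whitespace(pretty_printed)
print(collapsed)
```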

I will stop rambling now.

IMHO the solution lies in removing the dual personalities of whitespace at 
document authoring time (or at its interface to XML tools for documents 
tagged by human hand).

Regards,
Bill


Bill Donoghoe              bdonoghoe at acslink.net.au
InfoTech (NSW) Pty Ltd     mobile: 014 625 397 (in Australia)
SGML/HyTime/DSSSL/XML Consultancy and Development


xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)
