bdonoghoe at spin.net.au
Tue Aug 19 15:46:42 BST 1997
>Sean Mc Grath wrote:
>>> Peter Murray-Rust's post removed to conserve space
>**Warning:** Rush of blood to the head follows. Get those flame throwers
>I know this whole white space thing was thrashed out at length some time ago but
>it worries me greatly that on XML-DEV the whole issue seems to be as
>as it was before XML-Lang's rulings on whitespace handling were decided upon.
>It seems that the problem was not really solved - just pushed up a layer:-)
>It just sounds wrong to me that white space handling is to be the subject of
>application conventions rather than part of the core XML parsing activity.
>Anyway, I think everyone should be allowed over-simplify the "White Space
>once in their lives! Here is my contribution:-
>Ban mixed content. Mixed content is a markup minimization feature.
>If you want a chunk of PCDATA in an XML doc, use the <PCDATA>
>reserved element name.
> <pcdata>I am data 1</pcdata>
> <pcdata>I am data 2</pcdata>
><foo><pcdata>I am line 1</pcdata><pcdata>I am line 2</pcdata></foo>
>If you need whitespace to be something other than whitespace- i.e. a
>newline to be a real newline to be passed on to the application, use an
>empty element type to represent it.
> <pcdata>I am data 1</pcdata><newline/>
> <pcdata>I am data 2</pcdata>
>Give me five minutes to put on the asbestos suit and then you flame
Instead of flaming you I will hop onto the bandwagon (can I borrow the
asbestos suit for a while?).
Firstly, to paraphrase some earlier comments, the "whitespace problem"
results from its dual personality.
Personality 1. The programmer's whitespace ("pretty printing") is used as a
layout tool for visual editing of the markup and content. Besides, lots of
editing applications won't allow lines over 250 characters.
Personality 2. The whitespace is part of the content used because the
author either wanted it that way or he/she could not see any other easy way
to encode the information correctly.
SGML tried to cater for both personalities and it succeeded in a moderate
fashion. The downside was that it is not an easy task to maintain and
process SGML documents.
Now for some personal opinion on what I thought XML was all about. XML is
an attempt to either simplify SGML (get rid of or change the bits which make
it hard to understand/use/process) or extend HTML to deal with information
content as well as presentation. I lean towards the former view "SGML for
IMHO the current XML "whitespace handling" has not simplified the SGML
Here are some comments and slight variations on Sean's suggestion.
I believe that Sean's suggestion has plenty of merit.
What is wrong with having some standard elements
(<PCDATA>, <CDATA>, <NEWLINE>) which are part of every XML DTD?
If you didn't want users to have to author these tags then "normalisation"
applications could be developed which could convert "raw" XML into the
I am data 1
I am <emph>data</emph> 2
could be normalised to:
<pcdata>I am data 1</pcdata><newline/>
<pcdata>I am data 2</pcdata>
<foo><pcdata>I am data 1 I am <emph>data</emph> 2</pcdata></foo>
depending on the DTD declarations for the elements or a style sheet (?!!)
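
A normalisation application of this kind might look roughly like the
following Python sketch. It is only an illustration: the function name
normalise and the keep_newlines flag are hypothetical, the flag stands in
for whatever the DTD declarations or a style sheet would decide, and the
sketch assumes that inline elements such as <emph> are kept inside the
<pcdata> wrapper.

```python
import copy
import re
import xml.etree.ElementTree as ET


def _lines(elem):
    """Split elem's mixed content into lines of strings and inline elements."""
    lines = [[]]

    def add_text(text):
        first, *rest = text.split("\n")
        lines[-1].append(first)
        for segment in rest:
            lines.append([segment])  # every newline starts a new line

    add_text(elem.text or "")
    for child in elem:
        inline = copy.deepcopy(child)
        inline.tail = None
        lines[-1].append(inline)
        add_text(child.tail or "")
    # Drop lines holding nothing but whitespace (pretty printing).
    return [ln for ln in lines
            if any(not isinstance(p, str) or p.strip() for p in ln)]


def _pcdata(parts):
    """Wrap one line's parts in a <pcdata> element, collapsing whitespace."""
    pc = ET.Element("pcdata")
    pc.text = ""
    last = None
    for part in parts:
        if isinstance(part, str):
            if last is None:
                pc.text += part
            else:
                last.tail = (last.tail or "") + part
        else:
            pc.append(part)
            last = part
    pc.text = re.sub(r"\s+", " ", pc.text).lstrip()
    for child in pc:
        child.tail = re.sub(r"\s+", " ", child.tail or "")
    if last is None:
        pc.text = pc.text.rstrip()
    else:
        last.tail = last.tail.rstrip()
    return pc


def normalise(elem, keep_newlines):
    """Return a copy of elem with its mixed content made explicit.

    keep_newlines=True  -> one <pcdata> per line, <newline/> in between
    keep_newlines=False -> a single <pcdata>, newlines treated as spaces
    """
    out = ET.Element(elem.tag, dict(elem.attrib))
    lines = _lines(elem)
    if not keep_newlines and lines:
        merged = []
        for line in lines:
            merged += [" "] + line
        lines = [merged]
    for i, line in enumerate(lines):
        if i:
            ET.SubElement(out, "newline")
        out.append(_pcdata(line))
    return out
```

Calling normalise(ET.fromstring(raw), True) on the raw <foo> fragment gives
the <newline/> form, and passing False gives the single <pcdata> form.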
However, normalisation is not needed if the authors can be given tools which
can produce the desired markup.
Thus, all whitespace in the "normalised" documents could be collapsed to a
single space (because we removed personality 2 we are only left with pretty
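
On the consuming side, the collapsing rule could be sketched as below. This
is a minimal Python illustration, assuming the <pcdata> and <newline/>
element names from Sean's proposal; the function name text_for_application
is hypothetical. Whitespace between elements is ignored, whitespace inside
<pcdata> collapses to a single space, and only <newline/> yields a real
newline.

```python
import re
import xml.etree.ElementTree as ET


def text_for_application(elem):
    """Render a normalised element as the text an application would see."""
    pieces = []
    for node in elem.iter():
        if node.tag == "newline":
            # The only source of a real newline in a normalised document.
            pieces.append("\n")
        elif node.tag == "pcdata":
            # itertext() gathers the text of <pcdata> and any inline children.
            content = "".join(node.itertext())
            pieces.append(re.sub(r"\s+", " ", content).strip())
    return "".join(pieces)
```

With this rule a document can be pretty-printed freely without changing
what the application receives.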
I will stop rambling now.
IMHO the solution lies in removing the dual personalities of whitespace at
document authoring time (or at its interface to XML tools for documents
tagged by human hand).
Bill Donoghoe bdonoghoe at acslink.net.au
InfoTech (NSW) Pty Ltd mobile: 014 625 397 (in Australia)
SGML/HyTime/DSSSL/XML Consultancy and Development
xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/