whitespace
David G. Durand
dgd at cs.bu.edu
Tue Sep 16 19:16:44 BST 1997
At 11:22 AM -0500 9/14/97, Arnaud Le Taillanter wrote:
>Inside the XML WG mailing list the WS issue was surely
>extensively discussed, but I don't have access to
>the archive of this discussion. I know it's already
>a favor that the XML draft is made public (all drafts
>and standards of W3C are public, I think this
>helps) and that XML WG members are participating
>in the xml-dev mailing list (they could avoid it).
I agree that it's rather unfair of me to make a reference to a discussion
that I can't produce.
>Well, I ask for another favor: could you please make the
>discussion about WS that led to the WG decision
>available on line? After such a reading, everybody
>could become convinced of the appropriate nature
>of the WG decision. Please!
Well, it's up to the W3C, not me -- as a member of the SIG (not even the
decision-making part of the working group) I have no power to do this.
There were some public archives of some parts of the discussion -- I think
this is no longer allowed for the current discussions, under the W3C's
confidentiality rules.
You could try an Altavista search for my name -- it used to come up with a
WWW archive of the old mailing list, and the URL may still work.
I do doubt that people will want to re-read that discussion, however, once
they have seen it. I was not exaggerating when I put the count at hundreds
of messages. Most of these were repetitive, because the total list of
factors involved, in the end, is the short list in my mail. The desire for
simple rules, the need to work without DTDs the same way as with DTDs, and
the desire for SGML compatibility all needed to be balanced. In fact, they
were incompatible -- SGML as it stands has complicated rules, that we
finally asked the ISO to relax. And _any_ solution that differentiates
element content from mixed content requires a DTD or other declaration
(under SGML rules or even new ones). The proposal to add a new declaration
for element content was abandoned because it's redundant with a DTD, and
confusing without -- a likely source of errors rather than a convenience.
>> Every variation
>> you discussed has been gone over and they all were either:
>> 1. unworkably complex (like the current SGML rules, which few
>> remember and even fewer remember correctly).
>
>Agreed.
So we have point 1 nailed down.
>> 2. Not compatible with SGML, or unworkably ugly like the proposal to
>> quote all literal text.
>
>If SGML rules concerning WS are to be discarded, any
>other rule adopted is incompatible, including the draft rule.
Yes, but the ISO was willing to add the pass-all-whitespace rule to SGML,
and it will be official in a few months. No other proposal also solved the
very real problems of SGML->SGML transformation caused by parsers hiding
whitespace, and so there was little independent reason to add them into
SGML.
That nails down point 2.
>
>> 3. Failed to work without a DTD. This is the kicker, and it's
>> required by XML because you don't always have the DTD, and different
>> results in the has-DTD/doesn't-have-DTD cases are unacceptable.
>
>I agree.
So that nails down point 3. And we really agree! :)
.... oh:
> The tree structures must be exactly the same in either case.
>Some constraint regarding WS is necessary on the way to input an
>XML text I assume.
I'm not sure what you mean, here. Any method for ignoring whitespace must
enable:
1. explicit whitespace to be possible wherever it is wanted (including
near element boundaries).
2. Line-breaks to be preserved for some (verbatim, or <pre>-style) elements.
3. Operation that doesn't depend on the DTD or other declarations to control it.
The simplest proposal that does this is to pass all whitespace.
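
To make "pass all whitespace" concrete, here is a minimal sketch in Python. It assumes the standard library's xml.etree.ElementTree as a convenient non-validating parser (not something from the WG discussion), and the sample document and the name text_chunks are invented for illustration. It only shows that every whitespace chunk, including the ones next to element boundaries, reaches the application.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: indentation between elements, and
# deliberate leading/trailing spaces inside the second <item>.
doc = "<list>\n  <item>one</item>\n  <item> two </item>\n</list>"
root = ET.fromstring(doc)

def text_chunks(elem):
    """Return every character-data chunk in document order, whitespace included."""
    return list(elem.itertext())

chunks = text_chunks(root)
for chunk in chunks:
    print(repr(chunk))
```

Nothing here consults a DTD: the parser hands over the indentation and the spaces around "two" alike, and deciding which of them matter is left to the application.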
The only real drawback is that _some_ applications (like table formatters)
may have to explicitly ignore whitespace in _some_ contexts where a
traditional SGML parser would have been able to do it for them. Linking
applications must deal with (that is, count), and cannot ignore, whitespace
chunks that in some cases may have little meaning to a user.
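
As a rough illustration of what "explicitly ignore whitespace in some contexts" could look like for something like a table formatter, here is a sketch under stated assumptions: the function name drop_ignorable_whitespace, the element names, and the use of Python's xml.etree.ElementTree are all hypothetical choices, and the application itself supplies the set of elements it treats as containing only other elements.

```python
import xml.etree.ElementTree as ET

XML_WS = " \t\r\n"  # the whitespace characters this sketch treats as ignorable

def drop_ignorable_whitespace(elem, element_only):
    """Remove whitespace-only text directly inside elements named in element_only.

    The application, not a DTD, decides which element names go in the set.
    """
    if elem.tag in element_only:
        if elem.text is not None and not elem.text.strip(XML_WS):
            elem.text = None
        for child in elem:
            if child.tail is not None and not child.tail.strip(XML_WS):
                child.tail = None
    for child in elem:
        drop_ignorable_whitespace(child, element_only)

table = ET.fromstring(
    "<table>\n  <row>\n    <cell> a b </cell>\n  </row>\n</table>"
)
drop_ignorable_whitespace(table, {"table", "row"})
result = ET.tostring(table, encoding="unicode")
print(result)
```

The whitespace inside <cell> is untouched because "cell" is not in the set; only the indentation between the structural elements is discarded.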
The benefits are "simplest possible rule", easy XML->XML transduction that
preserves the original formatting, a dependable way to count character data
in documents that contain whitespace, regardless of whether you have a DTD.
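
The counting benefit can be sketched as follows, again assuming Python's xml.etree.ElementTree as a stand-in non-validating parser; the sample document, its internal DTD subset, and the helper name char_data are made up for the example. The point is only that the character data, and therefore any offset into it, comes out the same whether or not the declarations are present.

```python
import xml.etree.ElementTree as ET

BODY = "<list>\n  <item>one</item>\n  <item>two</item>\n</list>"
# Hypothetical internal DTD subset declaring <list> as element content.
DTD = "<!DOCTYPE list [<!ELEMENT list (item*)><!ELEMENT item (#PCDATA)>]>\n"

def char_data(xml_text):
    """Concatenate all character data in document order, whitespace included."""
    return "".join(ET.fromstring(xml_text).itertext())

without_dtd = char_data(BODY)
with_dtd = char_data(DTD + BODY)

# A linking application could address "two" by its character offset.
offset_of_two = without_dtd.index("two")
print(len(without_dtd), len(with_dtd), offset_of_two)
```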
>> The recent change (to normalize all line ends) fills the one hole the
>> previous proposal had -- because it was nearly certain that some
>> processes would blindly change CRLF and their ilk anyhow.
Note that this is the only data normalization permitted in XML, and that it
only warrants processes like the changing of line-ending conventions (e.g.
from PC to Mac) -- that we all know would have taken place anyway, causing
errors, even if they were explicitly prohibited by the standard.
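
For illustration, line-end normalization of this kind might be sketched as below. The message does not spell out the exact rule, so this assumes the mapping of CR LF pairs and lone CRs to a single LF; the function name normalize_line_ends and the sample strings are invented for the example.

```python
def normalize_line_ends(text):
    """Map CRLF pairs and lone CRs to a single LF.

    CRLF is replaced first so that a pair does not turn into two line ends.
    """
    return text.replace("\r\n", "\n").replace("\r", "\n")

pc = "line one\r\nline two\r\n"    # PC convention
mac = "line one\rline two\r"       # classic Mac convention
unix = "line one\nline two\n"      # Unix convention

normalized = [normalize_line_ends(s) for s in (pc, mac, unix)]
print(normalized[0] == normalized[1] == normalized[2])
```

After normalization the three variants are indistinguishable, so a file that was converted between platforms in transit yields the same character data.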
>> My advice: don't waste your bytes complaining about this -- we've
>> heard it _all_ before -- and the solution that works best is to leave
>> it to the application.
>
>I am sure I will get
>convinced when I read the WG discussion :-)
>Or I fear the WG members will have to hear it all (and more)
>again :-))
My advice was just advice about what expectations you could have of
_results_ from whatever discussion ensues. Feel free to discuss whitespace
to your heart's content. But don't expect XML to change.
I'll see if there's any way the archives of the whitespace debate can be
made available, but I can honestly say that they're painful rather than
enlightening reading. Expect to devote several days to the reading, too, if
they do become public.
I was a chief proponent of the current approach, even at the beginning,
when most in the group did not want to do anything so radical, so I agree
that explanations of the decision are worthwhile -- and I've tried to
contribute such -- but I'm certainly not going to read an extended rehash
on the issue. I've devoted my pound(s) of flesh to whitespace already.
-- David
RE delenda est!
David Durand dgd at cs.bu.edu \ david at dynamicDiagrams.com
Boston University Computer Science \ Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/ \ Dynamic Diagrams
--------------------------------------------\ http://dynamicDiagrams.com/
MAPA: mapping for the WWW \__________________________
xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)