Whitespace

David G. Durand dgd at cs.bu.edu
Tue Aug 26 15:52:49 BST 1997


At 5:51 PM -0500 8/25/97, Neil Bradley wrote:
>I want to
>be able to use XML editor A, and allow people to view the
>output on browser B and C, publish it on DTP system D,
>send the data to someone else using editor E,
>and let people search for pseudo-elements using extended pointers
>in products E and F, and all without extra spaces appearing or
>vital spaces disappearing at any point.


Vital spaces will never disappear in _XML parsing_ because all whitespace
is literally passed along. This means that the safe thing is just to leave
it in, and define stylesheets so they can strip any excess space.
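
A small illustration of "all whitespace is literally passed along". It uses
Python's standard xml.etree.ElementTree purely as a present-day stand-in for a
generic XML processor (it is not one of the tools discussed in this thread),
and the sample document is made up:

```python
import xml.etree.ElementTree as ET

# Mixed content: a "vital" space between two inline elements, plus
# newline/indent whitespace around them.
doc = "<p>\n  <b>bold</b> <i>italic</i>\n</p>"
root = ET.fromstring(doc)

# The parser hands every whitespace character to the application unchanged.
print(repr(root.text))     # '\n  '
print(repr(root[0].tail))  # ' '   (the vital space after <b>)
print(repr(root[1].tail))  # '\n'

# Joining all character data gives back exactly what was in the document;
# deciding which of these spaces to fold is left to a later stage.
print(repr("".join(root.itertext())))  # '\n  bold italic\n'
```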

They'll only be disappearing if applications have bugs (which can be dealt
with app-by-app), or if XML processors start "doing favors" for applications
by "pre-normalizing" the data.

>I cannot understand why some people think this will not be problem.

I don't understand how it _can_ be a problem (in general, rather than due
to particular bugs).

>We are getting extreme views here, from let the XML processor handle
>it, to let every application do its own thing. Neither position is
>acceptable.
>OK, let's rule out special cases. I can accept that CML and CDF etc
>will have their own strict rules, perhaps, but I am far more
>concerned with general document editing and publishing (the sort of
>things HTML and SGML have been primarily used for).

In general document editing, you still have DTDs and will still have
conventions for whitespace. In particular, any formatting application _must
have_ a stylesheet or other formatting spec. That is the correct place for
formatting information about whitespace collapse to be specified.
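
A minimal sketch of whitespace collapse specified in a stylesheet rather
than in the parser. It assumes a hypothetical stylesheet that is nothing
more than a table mapping element names to a "collapse" or "preserve"
policy; the STYLESHEET table and the fold() and render() functions are
invented for illustration and are not part of any real stylesheet language:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical stylesheet: per-element whitespace policy. "collapse" folds
# runs of whitespace to one space (roughly what HTML browsers do);
# "preserve" leaves the characters alone (like <pre>).
STYLESHEET = {"p": "collapse", "pre": "preserve"}

# The four XML whitespace characters: space, tab, CR, LF.
_WS_RUN = re.compile(r"[ \t\r\n]+")

def fold(text, policy):
    """Apply one element's policy to a piece of its own character data."""
    return _WS_RUN.sub(" ", text) if policy == "collapse" else text

def render(elem, inherited="collapse"):
    """Format elem's content; elements without an entry inherit the policy."""
    policy = STYLESHEET.get(elem.tag, inherited)
    out = fold(elem.text or "", policy)
    for child in elem:
        out += render(child, policy)
        # A child's tail is part of the parent's content, so it uses the
        # parent's policy, not the child's.
        out += fold(child.tail or "", policy)
    return out

doc = "<doc><p>Some   text\n  with <b>bold</b>  words.</p><pre>  keep\n   this</pre></doc>"
root = ET.fromstring(doc)

print(repr(render(root.find("p"))))    # 'Some text with bold words.'
print(repr(render(root.find("pre"))))  # '  keep\n   this'
```

The parsed tree itself is untouched (root.find("p").text still holds the
original spaces and newline); only the formatting step folds space, and it
does so per element. Real CSS also trims space at the start and end of
lines, which this sketch does not attempt.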

>Do we want XML to gain a reputation as an unreliable
>data exchange and publishing format?

Then we'd better not start dropping data in the parser!

>We should not have to burden document authors with processing codes,
>etc. People want the ease of use of HTML (and, dare I say it, SGML
>too, in this respect at least). I still think this is unnecessary.

>Others have recently proposed the style sheet as the answer, and I
>agree. My original proposal to base some of the rules on in-line/block
>definitions
>assumed this approach. It is more reliable than
>element content versus mixed content. I do not, however, think we
>need to go as far as waiting for the official DSSSL based style sheet
>to be completed. I for one do not believe all XML-aware applications
>will use it, and certainly not in the short term. Any config file or
>style sheet will suffice.

Personally, despite the slight nausea engendered by the thought, I expect
that some CSS variation will be the one in common use -- and that CSS will
usually fold space like HTML does now.

>People are also proposing all kind of Unicode special characters to
>perform vital tasks. Let's remember here that few people even have
>the specification, let alone use this set extensively. I am sure its
>time will come, but let us be realistic. XML is going to be in
>widespread use first, and needs to be workable with 7-bit ASCII, if
>possible, and ISO 8859 if not.

XML is _defined_ to be Unicode, and the only way to do simple 8-bit
processors is to use UTF-8 -- but of course, that just makes special
Unicode chars look like "escape sequences". Not so bad, really.
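
To make the "escape sequence" point concrete: in UTF-8 every non-ASCII
character becomes a run of bytes that all have the high bit set, so 7-bit
ASCII text passes through unchanged and a simple 8-bit processor never
mistakes part of a special character for a space or for markup. The
characters below are arbitrary examples, not ones named in this thread:

```python
# Arbitrary example characters (illustration only).
samples = {
    "NO-BREAK SPACE": "\u00a0",
    "LINE SEPARATOR": "\u2028",
    "ZERO WIDTH SPACE": "\u200b",
}

for name, ch in samples.items():
    encoded = ch.encode("utf-8")
    # Every byte of a multi-byte UTF-8 sequence is >= 0x80, so none of them
    # can be confused with an ASCII space, '<', or '&'.
    assert all(b >= 0x80 for b in encoded)
    # bytes.hex(sep) needs Python 3.8 or later.
    print(f"U+{ord(ch):04X} {name}: {encoded.hex(' ')}")

# Prints:
# U+00A0 NO-BREAK SPACE: c2 a0
# U+2028 LINE SEPARATOR: e2 80 a8
# U+200B ZERO WIDTH SPACE: e2 80 8b

# Plain 7-bit ASCII is byte-for-byte identical in UTF-8.
assert "plain ASCII".encode("utf-8") == b"plain ASCII"
```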

>I did not expect the rules I (nervously and tentatively) proposed to be
>acceptable.
>But I did hope they could form the basis of detail discussion, from
>which a better set of rules would emerge. Unfortunately, we seem to
>be getting nowhere. I am trying not to despair. But it's hard.

All I care about is that XML-dev not give the impression that generic XML
processors should start folding whitespace, since we explicitly removed
whitespace processing from XML to avoid the "vanishing space problem".

If we can find any applications other than formatting that don't
depend on knowing the meanings of the tags, then we need to consider using
PIs to declare special whitespace folding in a document. I don't currently
believe that such applications exist -- because I can't come up with any.
When I thought they _might_ exist, I thought that this kind of spec. would
be a good idea. Now it just seems to add confusion where we had made
simplicity.

I still think "all whitespace is significant" is the simplest rule we can
use that allows everything that we can do today.

  -- David

_________________________________________
David Durand              dgd at cs.bu.edu  \  david at dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________



xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)



