White Space

Tim Bray tbray at textuality.com
Mon Aug 16 23:39:34 BST 1999


At 02:16 PM 8/16/99 -0700, arkin wrote:
>>  An XML processor must always pass all characters in a document that are
>>  not markup through to the application. A validating XML processor must 
>>  also inform the application which of these characters constitute
>>  white space appearing in element content.
>>
>> Element content is a hyperlink to the formal definition of an element
>> where the DTD says you can have only other elements.
>>
>> What's not clear here? -Tim
>1. How does that relate to the xml:space attribute and to the
>default/preserve values?

No interaction.  The xml:space attribute is a message from the author to 
downstream applications, that's all.
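
To illustrate the "no interaction" point, here is a small sketch using
Python's standard xml.etree.ElementTree (my choice of tool, not anything
the spec mandates). The helper name text_and_space is made up for this
example. It only shows that the parser hands over the same white space
whether or not xml:space is present; the attribute is just another value
the application can look at.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the reserved XML namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

def text_and_space(doc):
    """Return (tag, xml:space value or None, raw text) for each child of the root.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(doc)
    return [(child.tag, child.get(XML_SPACE), child.text) for child in root]

doc = '<doc><a xml:space="preserve">  x  </a><b>  y  </b></doc>'
for tag, space, text in text_and_space(doc):
    # The white space around x and y arrives intact in both cases.
    print(tag, space, repr(text))
```

What to do with "preserve" or "default" is left entirely to the code that
reads these values.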

>2. How does that relate to new line in the beginning/end of text
>content?

All newline combinations come to the app as a single LF character, which
is white space and thus the spec paragraph quoted above applies.
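
The end-of-line rule in XML 1.0 (section 2.11) can be sketched in a few
lines. This is only an illustration of the rule, not how any particular
parser implements it; the function name normalize_newlines is my own.

```python
import re

def normalize_newlines(text):
    """Turn each CR LF pair, and each CR not followed by LF, into a single LF."""
    return re.sub(r"\r\n?", "\n", text)

# CR LF, a lone CR, and a plain LF all come out as LF.
print(repr(normalize_newlines("a\r\nb\rc\nd")))
```

After this step a leading or trailing line break in text content is simply
an LF, and it is reported like any other white space character.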

>3. What is the application and what is an XML processor? Is it possible
>that a generic XML "parser" generating SAX/DOM can be both processor and
>application?

Uh, have you read the XML spec?  The spec defines in some detail what
an XML processor is.  The application is any other software that's not
the processor. The spec says *nothing* about what an application can
or can't or must do.
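
One way to picture the split, assuming Python's standard xml.sax module as
the processor: the parser drives the callbacks, and the handler class is
the application. Collector and collect_text are names invented for this
sketch. The expat-based parser behind xml.sax is non-validating, so I would
expect all white space to arrive through characters() here.

```python
import xml.sax

class Collector(xml.sax.ContentHandler):
    """Application side: records whatever the processor reports."""

    def __init__(self):
        super().__init__()
        self.chunks = []      # character data, possibly delivered in pieces
        self.ignorable = []   # white space a validating processor could flag

    def characters(self, content):
        self.chunks.append(content)

    def ignorableWhitespace(self, whitespace):
        self.ignorable.append(whitespace)

def collect_text(doc):
    """Processor side: parse the bytes and feed events to a Collector."""
    handler = Collector()
    xml.sax.parseString(doc, handler)
    return handler

h = collect_text(b"<doc>\n  <item>a</item>\n</doc>")
print(repr("".join(h.chunks)))
```

A generic SAX or DOM builder sits on the processor side of this line;
whatever consumes the events or the tree is the application.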

>4. Many applications do not care about whitespace, they only care about
>the meaningful content. And many developers do not have the expertise to
>strip away whitespaces properly. Is there anything we can do about it?

If the answer was easy, it would be in the XML spec.  It turns out to
be very nearly impossible to write a set of rules that are useful
across different application spaces and are still comprehensible to
human beings.  SGML fell apart completely in this area, and the people
who wrote that were smart.  For text that's going to be presented to 
humans, I think HTML browsers get it about right; but go try to write down 
the rules describing what they actually do and you'll see it ain't simple.  
And that's just one class of application. -Tim
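
For the narrow case where an application already knows which of its
elements hold only other elements, the stripping can be sketched as below.
This is an assumption-heavy example: element_only is a set the application
supplies by hand (standing in for what a DTD would declare), and the
function names are mine. It deliberately leaves mixed content and text-only
elements alone, which is where the hard cases live.

```python
import xml.etree.ElementTree as ET

# XML 1.0 white space: space, tab, carriage return, line feed.
XML_WS = " \t\r\n"

def is_xml_whitespace(s):
    """True if s is a string made up only of XML white space characters."""
    return s is not None and s.strip(XML_WS) == ""

def drop_element_content_whitespace(elem, element_only):
    """Remove white-space-only text directly inside elements named in element_only."""
    if elem.tag in element_only:
        if is_xml_whitespace(elem.text):
            elem.text = None
        for child in elem:
            # In ElementTree, text following a child lives in child.tail.
            if is_xml_whitespace(child.tail):
                child.tail = None
    for child in elem:
        drop_element_content_whitespace(child, element_only)

root = ET.fromstring("<list>\n  <item> a </item>\n  <item>b</item>\n</list>")
drop_element_content_whitespace(root, {"list"})
print(ET.tostring(root, encoding="unicode"))
```

Note that the spaces inside the first item are untouched; deciding whether
those matter is exactly the application-specific part.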






