xml:space

Peter Murray-Rust peter at ursus.demon.co.uk
Fri Feb 20 23:39:55 GMT 1998


I am considering how to treat xml:space in JUMBO and ask for help and
comments. <NOTE>I am NOT re-opening the whitespace debate; I am asking
those who understand xml:space if what I do/intend to do is reasonable.
xml:space is a formal part of the language and I feel I have to address
it.</NOTE>

1. Are there any documents which actually use xml:space? rec.xml does not.

2. Is there anyone on this list intending to use it? If so, what do they
expect "applications' default white-space processing modes" to be?

[Quotations are from rec.xml]

>An XML processor must always pass all characters in a document that are
>not markup through to the application. A validating XML
>processor must also inform the application which of these characters
>constitute white space appearing in element content.

My philosophy in JUMBO (which is a generic application) is to accept all
whitespace from the parser/SAX, whether labelled 'ignorable' or not. All
PCDATA is stored in child nodes of elements. Those with ignorable
whitespace can be specially labelled.  IOW I do not discard any character
data on input.
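A minimal sketch of this "keep everything" policy, using Python's standard `xml.sax` module. The `TextNode`/`ElementNode` classes and the handler name are illustrative assumptions, not JUMBO's actual (Java) classes. Text reported through `ignorableWhitespace()` is kept but labelled; everything else arrives through `characters()`.

```python
import xml.sax
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class TextNode:
    # Hypothetical node type: one run of character data.
    text: str
    ignorable: bool = False  # True if reported via ignorableWhitespace()


@dataclass
class ElementNode:
    # Hypothetical node type: an element with ordered children.
    name: str
    attrs: dict
    children: List[Union["ElementNode", TextNode]] = field(default_factory=list)


class KeepAllWhitespace(xml.sax.ContentHandler):
    """Builds a tree that keeps every character the parser reports."""

    def __init__(self):
        super().__init__()
        self.root: Optional[ElementNode] = None
        self._stack: List[ElementNode] = []

    def startElement(self, name, attrs):
        node = ElementNode(name, dict(attrs))
        if self._stack:
            self._stack[-1].children.append(node)
        else:
            self.root = node
        self._stack.append(node)

    def endElement(self, name):
        self._stack.pop()

    def _add_text(self, content, ignorable):
        if not self._stack:
            return  # no open element to attach the text to
        children = self._stack[-1].children
        # SAX may split one text run across several calls; merge
        # consecutive runs that carry the same label into one node.
        last = children[-1] if children else None
        if isinstance(last, TextNode) and last.ignorable == ignorable:
            last.text += content
        else:
            children.append(TextNode(content, ignorable))

    def characters(self, content):
        self._add_text(content, ignorable=False)

    def ignorableWhitespace(self, whitespace):
        self._add_text(whitespace, ignorable=True)
```

Python's bundled expat reader does not validate, so in practice it never calls `ignorableWhitespace()`; all whitespace then arrives unlabelled through `characters()`. Only a validating parser, which knows the element content models, can supply the label.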

>
>A special attribute named xml:space may be attached to an element to
>signal an intention that in that element, white space should be
>preserved by applications. In valid documents, this attribute, like any
>other, must be declared if it is used. When declared, it must be
>given as an enumerated type whose only possible values are "default" and
>"preserve". For example:
>
>      <!ATTLIST poem   xml:space (default|preserve) 'preserve'>
>

OK. If xml:space="preserve" I have no problems.
If xml:space="default" I am asking for help. Note that xml:space="default"
could apply either to ignorable whitespace or to non-ignorable w/s.
If xml:space is absent, I suggest the options below...
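One way to read the attribute, sketched in Python under the assumption that attributes arrive as a plain name-to-value mapping (as from a non-namespace-aware SAX parser). The function names are hypothetical. Rejecting values other than "default" and "preserve" is a choice made here; strictly, that check belongs to a validating parser working from the declaration.

```python
from typing import Mapping, Optional

XML_SPACE = "xml:space"  # name as seen without namespace processing
ALLOWED = ("default", "preserve")


def space_mode(attrs: Mapping[str, str]) -> Optional[str]:
    """Return 'default', 'preserve', or None when xml:space is absent.

    Assumption: any other value is treated as an error, mirroring the
    required enumerated declaration (default|preserve).
    """
    value = attrs.get(XML_SPACE)
    if value is None:
        return None
    if value not in ALLOWED:
        raise ValueError(f"invalid xml:space value: {value!r}")
    return value


def must_preserve(mode: Optional[str]) -> bool:
    # Only "preserve" states an explicit intent; "default" and absence
    # both leave the decision to the application's own policy.
    return mode == "preserve"
```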

>
>The value "default" signals that applications' default white-space
>processing modes are acceptable for this element; the value
>"preserve" indicates the intent that applications preserve all the white
>space. This declared intent is considered to apply to all
>elements within the content of the element where it is specified, unless
>overriden with another instance of the xml:space attribute.

This causes me slight concern. It means I have to write code that
automatically tracks which elements have an xml:space attribute. This is
possible, but yet another thing that has to be done. I might be motivated
to do it if I am shown some use for it...

This means, effectively, that every node in a document has to carry an
xml:space flag [unless it is worked out dynamically every time the
document is rendered].
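The inheritance rule amounts to "nearest ancestor wins". A sketch that precomputes a flag for every element in one pass, using Python's `xml.etree.ElementTree` (which expands the reserved `xml:` prefix to the namespace `http://www.w3.org/XML/1998/namespace`). The function name is hypothetical. The alternative of working the value out at render time would need parent pointers, which ElementTree elements do not have.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def effective_space(root: ET.Element,
                    inherited: Optional[str] = None) -> Dict[ET.Element, Optional[str]]:
    """Map every element to its effective xml:space value.

    An element inherits the value of its nearest ancestor that has the
    attribute; an xml:space on the element itself overrides it. None
    means nothing on the path from the root specifies xml:space.
    """
    result: Dict[ET.Element, Optional[str]] = {}
    stack = [(root, inherited)]
    while stack:
        elem, parent_mode = stack.pop()
        mode = elem.get(XML_SPACE, parent_mode)  # own value, else inherited
        result[elem] = mode
        stack.extend((child, mode) for child in elem)
    return result
```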

--------

Without xml:space, and without a DTD, I can see the following *generic*
possibilities:
	- element is empty. [BTW the spec (and SAX) discards all knowledge of
whether this was created by <FOO></FOO> or <FOO/>. I approve of this.]
Children are not displayed because there aren't any.
	- element contains non-w/s characters. This is displayed either as a
string or as a title-value pair (at user option). The title is determined
by simple heuristics.
	- element contains element content. This is displayed as a tree. I am
also considering allowing the user to display this as a tagged/untagged
event stream, but the tree is the default.
	- element contains element content and (some) non-w/s PCDATA children.
This is displayed as an untagged event stream (or, at user option, a tagged one).
Unless the semantics of the tags are known or a stylesheet is provided, no
other rendering is possible.

Now the two w/s options...
	- element contains element content and (only) w/s children. This is
displayed by default as ignoring the w/s. Note that this is *display*, not
processing. Since the default is a tree, the w/s nodes aren't much use.
	- element contains a single w/s child. This does not display anything by
default.
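The cases in the two lists above can be distinguished mechanically. A sketch using Python's `xml.etree.ElementTree`; note that ElementTree does not keep separate PCDATA child nodes but stores character data in each element's `.text` and in its children's `.tail`, and it merges adjacent runs, so "a single w/s child" corresponds to a whitespace-only `.text` with no child elements. The category names are invented for illustration. Whitespace is tested against the four XML whitespace characters (space, tab, CR, LF) rather than Python's broader notion of whitespace.

```python
import xml.etree.ElementTree as ET
from typing import List

XML_WS = " \t\r\n"  # the XML whitespace characters


def _texts(elem: ET.Element) -> List[str]:
    """Non-empty character-data runs directly inside elem."""
    runs = [elem.text] + [child.tail for child in elem]
    return [t for t in runs if t]


def classify(elem: ET.Element) -> str:
    """Return one of the (hypothetical) labels for the content cases."""
    texts = _texts(elem)
    has_non_ws = any(t.strip(XML_WS) for t in texts)
    if len(elem) == 0:                 # no child elements
        if not texts:
            return "empty"             # <FOO/> and <FOO></FOO> alike
        return "text" if has_non_ws else "whitespace-only"
    if has_non_ws:
        return "mixed"                 # elements plus non-w/s PCDATA
    return "elements+whitespace" if texts else "elements"
```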

The user can switch to display/hide PCDATA children in the tree display.
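Hiding whitespace is a display decision rather than a processing one, so a viewer can skip whitespace-only text while walking the tree and leave the tree untouched. A sketch of such a toggle as a plain-text tree renderer over `xml.etree.ElementTree`; the function name and the output layout are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import List

XML_WS = " \t\r\n"  # the XML whitespace characters


def tree_lines(elem: ET.Element, show_ws: bool = False, depth: int = 0) -> List[str]:
    """Render elem as an indented tree, one line per node.

    Whitespace-only text is hidden unless show_ws is True. Only the
    display is affected; the element tree itself is never modified.
    """
    pad = "  " * depth
    lines = [f"{pad}<{elem.tag}>"]

    def text_line(t):
        # Show a text run if it has content, or if whitespace is switched on.
        if t and (show_ws or t.strip(XML_WS)):
            lines.append(f"{pad}  {t!r}")

    text_line(elem.text)
    for child in elem:
        lines.extend(tree_lines(child, show_ws, depth + 1))
        text_line(child.tail)  # text after a child belongs to this element
    return lines
```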

For *outputting* it is possible to delete the w/s nodes if required. Once
deleted they are gone ...
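By contrast with hiding, deleting the whitespace changes the tree. A sketch of an output-time pass over `xml.etree.ElementTree` that removes whitespace-only runs; letting `xml:space="preserve"` (inherited by descendants) protect a subtree is an assumption added here, not a behaviour the message describes. A run stored in a child's `.tail` belongs to the parent's content, so it follows the parent's setting.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
XML_WS = " \t\r\n"  # the XML whitespace characters


def strip_ws_nodes(elem: ET.Element, inherited: Optional[str] = None) -> None:
    """Delete whitespace-only text runs in place, before output.

    Assumption: runs inside elements whose effective xml:space is
    "preserve" are kept. Deletion is destructive; once removed, the
    whitespace cannot be recovered from the tree.
    """
    mode = elem.get(XML_SPACE, inherited)
    if mode != "preserve":
        if elem.text is not None and not elem.text.strip(XML_WS):
            elem.text = None
        for child in elem:
            # A child's tail is part of *this* element's content.
            if child.tail is not None and not child.tail.strip(XML_WS):
                child.tail = None
    for child in elem:
        strip_ws_nodes(child, mode)
```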

I would be interested in comments as to whether this is reasonable default
behaviour or whether there are other things that should be considered.

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



