Problems with whitespace and msxml

Peter Murray-Rust peter at
Thu Jan 1 14:48:20 GMT 1998

Whitespace has been (and I suspect will continue to be) a frequent topic on
XML-DEV :-) It can be a confusing topic and long-term members of XML-DEV
are sympathetic and helpful when it is raised.  

(A) There is no simple one-groks-all solution to the problem. If there
were, we should be using it :-)

(B) A lot of material about whitespace has been written on this list,
including 5 paragraphs from David Durand. You will find references to some
of the discussion on XML-DEV jewels:

At 08:43 01/01/98 -0500, David Megginson wrote:
>Alexander Hinds writes:
>[on xml:space]
> > Moreover, no matter what I set it to, I always get back whitespace
> > in my tree, even without a mixed content model (for example, for
> > element book, its first sib is always whitespace).
> >  My question, basically is: how do I eliminate whitespace from my
> > tree entirely?  Or failing that how do I get the current value of
By not including it in your document :-)

> > xml-space in my ElementImpl subclass?  It appears that nameXMLSPACE

I have not managed to get msxml working yet, but assuming that you can
retrieve attribute values, xml:space is a potential attribute on any
element. The rules for how its value is inherited from the root element
downwards are given in the spec.
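For illustration only, here is a minimal sketch of that inheritance rule. It assumes Python's standard ElementTree as a stand-in for msxml (it says nothing about msxml's own API), and the helper name effective_xml_space is made up for this example. It checks the element itself, then each ancestor up to the root, and returns the first xml:space value found; None means nothing in the chain states an intention.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:space under its expanded, namespace-qualified name.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def effective_xml_space(root, elem):
    """Hypothetical helper: return the xml:space value in force for elem.

    Looks at elem, then each ancestor up to root, and returns the first
    value found. None means no ancestor-or-self specifies xml:space.
    """
    # ElementTree elements have no parent pointers, so build child -> parent.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = elem
    while node is not None:
        value = node.get(XML_SPACE)
        if value is not None:
            return value
        node = parents.get(node)  # None once we step past the root
    return None


doc = ET.fromstring(
    '<book xml:space="preserve"><title> T </title>'
    '<body xml:space="default"><p/></body></book>'
)
print(effective_xml_space(doc, doc.find("title")))   # preserve (inherited)
print(effective_xml_space(doc, doc.find("body/p")))  # default (overridden)
```

Rebuilding the parent map on every call is wasteful for large trees; an application would more likely build it once, or pass the inherited value down during a single traversal.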

> > is private, not protected (why?) so a subclass can't really search
> > it.  But even when I change the visibility, it's always null
> > anyway.
>I have not used msxml recently, so I do not know what it does, but the
>PR is very clear that the 'xml:space' attribute is strictly
>informative (from 2.10, "White Space Handling"):
>   An XML processor must always pass all characters in a document that
>   are not markup through to the application. A validating XML processor

I find the phrase "validating XML processor" a confusing one because it
refers to a piece of software.  Validation requires:
	- enough information in the document to *allow* it to be validated (e.g.
	  enough ELEMENT and ATTLIST declarations to cover all elements found in
	  the document).
	- a decision that the document *should* be validated. This may come from:
		- the author (implicit in the inclusion of a DTD and some PIs)
		- the client software (e.g. it makes decisions as to when to validate)
		- the human user ("press the validate button").
	- software sufficiently powerful to map the content of an element onto
	  its contentSpec.

IOW the identification of ignorable whitespace (which is *mandatory* for a
validating parser) depends on an unclear combination of the above.
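A rough sketch of why the declarations matter, again assuming Python's ElementTree, which does not validate and never reads content models. The caller therefore has to supply them: element_content is a hypothetical set of element names assumed to be declared with element-only content, and ignorable_whitespace is an illustrative name. Whitespace sitting directly inside those elements is reported as ignorable; with no declarations, nothing can be identified.

```python
import xml.etree.ElementTree as ET

XML_S = " \t\r\n"  # XML's white space characters (the S production)


def ignorable_whitespace(root, element_content):
    """Hypothetical helper: list whitespace in element content.

    element_content is assumed to hold the names of elements whose
    <!ELEMENT ...> declarations allow child elements only. Returns
    (element name, whitespace text) pairs in document order.
    """
    found = []
    for elem in root.iter():
        if elem.tag not in element_content:
            continue  # no declaration: we cannot tell, so report nothing
        # Character data directly inside elem: the text before the first
        # child, then the tail following each child.
        for chunk in [elem.text] + [child.tail for child in elem]:
            if chunk and chunk.strip(XML_S) == "":
                found.append((elem.tag, chunk))
    return found


doc = ET.fromstring(
    "<book>\n  <title>Hi</title>\n  <para>a <em>b</em> c</para>\n</book>"
)
print(ignorable_whitespace(doc, {"book"}))  # three whitespace chunks inside <book>
print(ignorable_whitespace(doc, set()))     # [] since nothing is declared
```

Stripping only XML's four white space characters, rather than using Python's broader default for str.strip(), keeps the test aligned with the spec's definition.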

>   must distinguish white space in element content from other non-markup
It can only do this if the document allows it to...

>   characters and signal to the application that white space in element
>   content is not significant.
>   A special attribute named "xml:space" may be inserted in documents to
>   signal an intention that the element to which this attribute applies
>   requires all white space to be treated as significant by applications.
>In other words, the value of xml:space should _not_ affect the
>information that msxml returns to your application; instead, it is up
>to your application to read the value, if present, and to take
>appropriate action.  Msxml should return all whitespace, no matter

And - assuming it calls itself a validating parser - *must* identify which
of that whitespace is significant and signal that to the application.
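To make that division of labour concrete, here is a hypothetical application-side policy, not anything msxml or the spec prescribes: the parser hands over all whitespace, and the application removes whitespace-only text except where xml:space="preserve" is in force. It again assumes Python's ElementTree, and strip_whitespace is an illustrative name. The policy is deliberately crude: it would also remove a meaningful space between two inline elements in mixed content, which is exactly where element-content information from a validating parser would help.

```python
import xml.etree.ElementTree as ET

XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
XML_S = " \t\r\n"  # XML's white space characters


def strip_whitespace(elem, inherited="default"):
    """Hypothetical policy: drop whitespace-only text unless xml:space="preserve".

    The xml:space value is inherited downwards, and the tree is modified
    in place.
    """
    mode = elem.get(XML_SPACE, inherited)
    if mode != "preserve":
        # Both elem.text and each child's tail are character data in elem,
        # so elem's mode decides their fate.
        if elem.text is not None and elem.text.strip(XML_S) == "":
            elem.text = None
        for child in elem:
            if child.tail is not None and child.tail.strip(XML_S) == "":
                child.tail = None
    for child in elem:
        strip_whitespace(child, mode)


doc = ET.fromstring(
    "<book>\n  <title>Hi</title>\n"
    '  <pre xml:space="preserve">\n  <b>x</b>\n</pre>\n</book>'
)
strip_whitespace(doc)
print(ET.tostring(doc, encoding="unicode"))
```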

>I have heard rumours that xml:space may some day be removed from the
>core XML spec and put into a separate "XML Conventions" spec -- that
>would be a very good idea.

We should be careful not to act on rumours on XML-DEV. There is a carefully
controlled process which requires discipline from those wishing to use XML.
Some of the deliberations are confidential (e.g. XML-SIG - and as a member
of that I cannot confirm or deny any speculations about what is discussed
there). XML relies on the community adhering to the spec as closely as they
can - this in itself is not easy.

OTOH I have publicly made it clear that I think that conventions are going
to be essential for the implementation of XML systems (and whitespace would
be a strong candidate).  This is why I have raised the idea of XDEV (an
informal set of conventions aired on the list) and shall continue to pursue
this. IFF the XML process formally wishes to set up a conventions WG or
similar I shall be very happy, but until they announce something like that
we cannot and should not assume it.


>All the best,
>David Megginson                 ak117 at
>Microstar Software Ltd.         dmeggins at
>xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at
>Archived as:
>To (un)subscribe, mailto:majordomo at the following message;
>(un)subscribe xml-dev
>To subscribe to the digests, mailto:majordomo at the following message;
>subscribe xml-dev-digest
>List coordinator, Henry Rzepa (mailto:rzepa at

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS, Virtual Hyperglossary
