Problems with whitespace and msxml
Peter Murray-Rust
peter at ursus.demon.co.uk
Thu Jan 1 14:48:20 GMT 1998
Whitespace has been (and I suspect will continue to be) a frequent topic on
XML-DEV :-) It can be a confusing topic and long-term members of XML-DEV
are sympathetic and helpful when it is raised.
(A) There is no simple one-groks-all solution to the problem. If there
were, we should be using it :-)
(B) A lot of material about whitespace has been written on this list,
including 5 paragraphs from David Durand. You will find references to some
of the discussion on XML-DEV jewels:
(http://ala.vsms.nottingham.ac.uk/vsms/xml/jewels.html)
At 08:43 01/01/98 -0500, David Megginson wrote:
>Alexander Hinds writes:
>
>[on xml:space]
>
> > Moreover, no matter what I set it to, I always get back whitespace
> > in my tree, even without a mixed content model (for example, for
> > element book, it's first sib is always whitespace).
> > My question, basically is: how do I eliminate whitespace from my
> > tree entirely? Or failing that how do I get the current value of
^^^^^^^^^^^^^
By not including it in your document :-)
> > xml-space in my ElementImpl subclass? It appears that nameXMLSPACE
I have not managed to get msxml working yet, but assuming that you can
retrieve attribute values, xml:space is a potential attribute for any
element. The rules for its inheritance from root are given in the spec.
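As a rough illustration of that inheritance (not msxml, which I cannot
run): a minimal sketch in Python using the standard ElementTree module.
The function name effective_xml_space is my own invention, and I am
assuming the usual reading of the rule, namely that a value set on an
element applies to its descendants until one of them overrides it.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the reserved xml: prefix using this namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def effective_xml_space(root):
    """Return (tag, value) pairs in document order.

    value is the xml:space setting in force for that element: its own
    attribute if present, otherwise the nearest ancestor's, otherwise
    None (meaning nothing was declared and the application decides).
    """
    pairs = []

    def walk(elem, inherited):
        value = elem.get(XML_SPACE, inherited)
        pairs.append((elem.tag, value))
        for child in elem:
            walk(child, value)

    walk(root, None)
    return pairs


doc = ET.fromstring(
    '<book xml:space="preserve"><title>T</title>'
    '<note xml:space="default"><p>x</p></note></book>'
)
for tag, value in effective_xml_space(doc):
    print(tag, value)
```

Here title picks up "preserve" from book, while p picks up "default"
from note. The parser hands over the same characters either way; the
value is only information for the application.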
> > is private, not protected (why?) so a subclass can't really search
> > it. But even when I change the visibility, it's always null
> > anyway.
>
>I have not used msxml recently, so I do not know what it does, but the
>PR is very clear that the 'xml:space' attribute is strictly
>informative (from 2.10, "White Space Handling"):
>
> An XML processor must always pass all characters in a document that
> are not markup through to the application. A validating XML processor
I find the phrase "validating XML processor" a confusing one because it
refers to a piece of software. Validation requires:
- enough information in the document to *allow* it to be validated (e.g.
  enough ELEMENT and ATTLISTs to cover all elements found in the document.)
- a decision that the document *should* be validated. This may come from:
  - the author (implicit in the inclusion of a DTD and some PIs)
  - the client software (e.g. it makes decisions as to when to validate)
  - the human user ("press the validate button").
- software sufficiently powerful to map the content of an element on to
  its contentSpec.
IOW the identification of ignorable whitespace (which is *mandatory* for a
validating parser) depends on an unclear combination of the above.
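To make the first requirement concrete: whitespace can only be called
ignorable inside an element whose contentSpec is pure element content,
i.e. not EMPTY, not ANY and with no #PCDATA. The following is a
deliberately naive sketch in Python that picks such elements out of a
string of ELEMENT declarations. The function name and the regular
expression are my own assumptions; it ignores parameter entities,
comments and everything else a real DTD parser must handle.

```python
import re

# Matches <!ELEMENT name contentSpec> in plain DTD text.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\S+)\s+([^>]+?)\s*>")


def element_content_names(dtd_text):
    """Return the names of elements declared with element content only."""
    names = set()
    for name, spec in ELEMENT_DECL.findall(dtd_text):
        if spec in ("EMPTY", "ANY") or "#PCDATA" in spec:
            continue  # empty, unconstrained or mixed: whitespace may matter
        names.add(name)
    return names


dtd = """
<!ELEMENT book (title, para+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT para (#PCDATA | em)*>
<!ELEMENT em (#PCDATA)>
<!ELEMENT hr EMPTY>
"""
print(element_content_names(dtd))
```

Only book qualifies here. If the declaration for book is missing, the
software has nothing to go on, which is the point made above.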
> must distinguish white space in element content from other non-markup
^^^^
It can only do this if the document allows it to...
> characters and signal to the application that white space in element
> content is not significant.
>
> A special attribute named "xml:space" may be inserted in documents to
> signal an intention that the element to which this attribute applies
> requires all white space to be treated as significant by applications.
>
>In other words, the value of xml:space should _not_ affect the
>information that msxml returns to your application; instead, it is up
>to your application to read the value, if present, and to take
>appropriate action. Msxml should return all whitespace, no matter
>what.
And - assuming it calls itself a validating parser - *must* identify which
of that whitespace is significant and signal that to the application.
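If the parser does not give you that signal, the application can apply
the same rule itself once it knows which elements have element content.
A minimal sketch using Python's xml.dom.minidom (not msxml); the
function name and the element_content set are hypothetical and would
have to come from the DTD or from your own knowledge of the document
type.

```python
from xml.dom import minidom

XML_WHITESPACE = " \t\r\n"  # the four whitespace characters XML recognises


def drop_element_content_whitespace(node, element_content):
    """Remove whitespace-only text nodes from the listed elements, in place.

    element_content is a set of element names assumed to be declared with
    element content. Text in every other element is left alone.
    """
    for child in list(node.childNodes):  # copy: we remove while iterating
        if child.nodeType == child.TEXT_NODE:
            if (
                node.nodeType == node.ELEMENT_NODE
                and node.tagName in element_content
                and child.data.strip(XML_WHITESPACE) == ""
            ):
                node.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            drop_element_content_whitespace(child, element_content)


doc = minidom.parseString(
    "<book>\n  <title> A  Title </title>\n  <para>text <em>x</em> </para>\n</book>"
)
drop_element_content_whitespace(doc.documentElement, {"book"})
print([c.nodeName for c in doc.documentElement.childNodes])
```

The newlines and indentation between the children of book disappear,
while the spaces inside title and para survive because those elements
are not in the set.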
>
>I have heard rumours that xml:space may some day be removed from the
>core XML spec and put into a separate "XML Conventions" spec -- that
>would be a very good idea.
We should be careful not to act on rumours on XML-DEV. There is a carefully
controlled process which requires discipline from those wishing to use XML.
Some of the deliberations are confidential (e.g. XML-SIG - and as a member
of that I cannot confirm or deny any speculations about what is discussed
there). XML relies on the community adhering to the spec as closely as they
can - this in itself is not easy.
OTOH I have publicly made it clear that I think that conventions are going
to be essential for the implementation of XML systems (and whitespace would
be a strong candidate). This is why I have raised the idea of XDEV (an
informal set of conventions aired on the list) and shall continue to pursue
this. IFF the XML process formally wishes to set up a conventions WG or
similar I shall be very happy, but until they announce something like that
we cannot and should not assume it.
P.
>
>
>All the best,
>
>
>David
>
>--
>David Megginson ak117 at freenet.carleton.ca
>Microstar Software Ltd. dmeggins at microstar.com
> http://home.sprynet.com/sprynet/dmeggins/
>
>xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
>Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
>To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
>(un)subscribe xml-dev
>To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
>subscribe xml-dev-digest
>List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)
>
>
Peter Murray-Rust, Director Virtual School of Molecular Sciences,
domestic net connection
VSMS http://www.nottingham.ac.uk/vsms,
Virtual Hyperglossary http://www.venus.co.uk/vhg