XSchema Spec Section 2.2, Draft 1

David G. Durand dgd at cs.bu.edu
Wed Jun 17 19:29:11 BST 1998


At 1:02 PM -0400 6/17/98, Paul Prescod wrote:
>On Wed, 10 Jun 1998, John Cowan wrote:
>> What "festering hole"?  XML's solution is simple: all whitespace is
>> meaningful, just as all non-whitespace is meaningful.
>
>That is not true. Some whitespace is "insignificant." You can only find
>out which whitespace falls into this category with a validating parser,
>however.
>
>http://www.w3.org/TR/REC-xml#sec-white-space

However, John's interpretation is a legitimate one for an _application_ like
an Xschema validator to take. Applications are _allowed_ to ignore
"ignorable" whitespace, but are not required to.

>This is a mess, because the validator's idea of what is ignorable is
>necessarily different from that of applications built on top of
>non-validating parsers. Whitespace which is NOT considered PCDATA in one
>program (especially validating parsers) will be considered PCDATA in
>another.

Which is one reason that I expect very few parsers to pay any attention to
the "ignorable" information. It's there if you want it, and if you are
willing to require a validating parser so that your application can use it.

>Although whitespace in XML is gross, I don't claim that there is a better
>solution. Merely removing the concept of significance may be simpler, but
>in my opinion that "solution" has its own problems.

I've always advocated the solution of ignoring significance. Since the
standard allows application writers to adopt it, _if they choose_, I suggest
that Xschema implementations feel free to do so, if it simplifies their
implementation and processing model.

>> I don't claim familiarity with SGML theory or practice, but I note
>> the existence of something called the "SGML mixed content problem".
>> As far as I can see, the standard solutions to this problem are
>> exactly what XML mandates: the simple (#PCDATA | foo | bar)* content
>> models.
>
>The mixed content problem is related to the grossness of whitespace.
>Since our role is merely to verify markup, and not to interpret it for an
>application, I think that we can get around it easily. Whitespace
>verifies as #PCDATA if #PCDATA is allowed at a particular point.
>If #PCDATA is NOT allowed at a particular point, the whitespace is
>ignored for the purposes of verifying.

The SGML "mixed content problem" is the fact that SGML validating parsers
are _required_ to report an error if whitespace is found in an element
whose content model contains #PCDATA, at a location where #PCDATA is
not allowed. It's been solved in XML by the restriction on
#PCDATA-containing content models: they must take the form
(#PCDATA | foo | bar)*, so character data is allowed anywhere in such an
element.

XML has a slightly different problem: In the absence of a DTD, you can't
tell if whitespace is "ignorable" because #PCDATA is not allowed in the
element, or "significant" because it _is_ allowed in the element.
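To make that concrete, here is a minimal sketch, assuming Python's standard
xml.dom.minidom as a stand-in for a non-validating parser. The sample
document and the helper name whitespace_only_children are made up for
illustration. With no DTD, the indentation between the child elements
arrives as ordinary text nodes, and nothing marks it as ignorable:

```python
from xml.dom import minidom

# Hypothetical sample document: element-only content, indented for readability.
DOC = "<list>\n  <item>one</item>\n  <item>two</item>\n</list>"


def whitespace_only_children(xml_text):
    """Return the data of whitespace-only text nodes directly under the root."""
    root = minidom.parseString(xml_text).documentElement
    return [
        node.data
        for node in root.childNodes
        if node.nodeType == node.TEXT_NODE and node.data.strip() == ""
    ]


runs = whitespace_only_children(DOC)
# Each entry is a run of indentation that the parser reported as character data.
print(len(runs), [repr(r) for r in runs])
```

An application built on this parser sees those runs exactly as it would see
real character data; only a DTD could say otherwise.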

I suggest that the "significant"/"insignificant" distinction is _itself_
insignificant, and should be treated as such by everyone except writers of
validating parsers, who are required to report it. Hopefully applications
that care, and thus produce different results when validating and when not
validating, will be very thin on the ground.
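As a rough sketch of what that could mean for a verifier (an illustration
only, not anything the Xschema draft specifies): the function name
text_is_acceptable and the pcdata_allowed flag are hypothetical, with the
flag standing in for whatever the schema says about the element. The rule
is the one quoted above: whitespace-only text never causes an error, and
any other text is accepted only where #PCDATA is allowed.

```python
from xml.dom import minidom


def text_is_acceptable(element, pcdata_allowed):
    """Check the direct text children of element against one uniform rule.

    pcdata_allowed is a hypothetical flag: True if the schema permits
    #PCDATA in this element, False otherwise.
    """
    for node in element.childNodes:
        if node.nodeType != node.TEXT_NODE:
            continue  # child elements are checked elsewhere
        if node.data.strip() == "":
            continue  # whitespace-only text is never an error
        if not pcdata_allowed:
            return False  # real character data where none is allowed
    return True


indented = minidom.parseString("<a>\n  <b/>\n</a>").documentElement
with_text = minidom.parseString("<a>hello<b/></a>").documentElement

print(text_is_acceptable(indented, pcdata_allowed=False))
print(text_is_acceptable(with_text, pcdata_allowed=False))
print(text_is_acceptable(with_text, pcdata_allowed=True))
```

The point of the design is that the check never needs to know whether a
given run of whitespace was "ignorable", so it gives the same answer on top
of a validating or a non-validating parser.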

  -- David

_________________________________________
David Durand              dgd at cs.bu.edu  \  david at dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________






