PCDATA vs CDATA

Tue Jul 7 20:36:36 BST 1998

Tom Otvos wrote:

> Hmm, is that the only case where an XML parser might do the "wrong thing" if
> it came across a document without a supporting DTD?

By definition, of course, an XML parser can't do the "wrong thing",
since XML is already defined.  I assume, though, that you want to
know what a hypothetical "simplified-SGML" parser might do incorrectly
without a DTD.

> It seems to me that if
> a document comes through without a DTD, and an element contained data not
> explicitly escaped, then it would not be unreasonable to assume PCDATA and
> try to parse it.  However, if a DTD is there to provide more info, then use
> it.  I am not sure I see how it is significantly different than validating
> that an element may, or may not, be a child of another element.

Because in SGML, an element declared as CDATA may contain text strings
that look like tags, but don't meet the rules for tags.  A parser that
assumed PCDATA in the absence of a DTD would choke on them, so such
documents would be valid but not well-formed, which is generally agreed
to be a Bad Thing.  Checking child elements is a different matter:
since XML has no tag omission or minimization, elements can be parsed
correctly even without content models available.
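
To illustrate that last point, here is a minimal sketch using Python's
standard xml.etree.ElementTree (my choice of tool for the example; the
sample document and the outline() helper are made up).  Every start and
end tag is explicit, so the element structure comes out with no DTD in
sight:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: no DOCTYPE, no content models.
doc = "<memo><to>Tom</to><body>See <em>section 2</em> first.</body></memo>"

root = ET.fromstring(doc)

def outline(elem, depth=0):
    """Return the element tree as a list of indented tag names."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(root)))
```

Whether <em> is *allowed* inside <body> is a validity question that only
a DTD can answer, but finding where each element begins and ends is not.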

XML actually goes further: except within "<![CDATA[" and "]]>" brackets,
every "<" character is guaranteed to be markup, including ones
within attribute values (which are errors, of course).
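
A rough way to see this for yourself, again assuming Python's standard
xml.etree.ElementTree (which sits on top of expat); is_well_formed() is
just a helper name invented for this sketch:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# A bare "<" in character data is taken as the start of markup.
bare_in_content = is_well_formed("<a>1 < 2</a>")

# Inside a CDATA section the same characters are plain text.
inside_cdata = is_well_formed("<a><![CDATA[1 < 2 <b>]]></a>")

# A literal "<" in an attribute value is an error as well.
bare_in_attribute = is_well_formed('<a x="1 < 2"/>')

# Escaped as &lt; it is accepted.
escaped_in_attribute = is_well_formed('<a x="1 &lt; 2"/>')

print(bare_in_content, inside_cdata, bare_in_attribute, escaped_in_attribute)
```

Only the CDATA and the escaped cases should get through; the parser
never has to guess whether a "<" is data or markup.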

-- 
John Cowan	http://www.ccil.org/~cowan		cowan at ccil.org
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)
