PCDATA vs CDATA

Wed Jul 1 00:35:47 BST 1998

> Hmm, is that the only case where an XML parser might do the "wrong thing" if
> it came across a document without a supporting DTD?  

Yes.  There are some things an XML parser can't do without a DTD:

- validating (obviously)
- determining which whitespace is ignorable
- normalising attributes and inserting default values
- expanding entity references (other than the predefined ones such as
  &amp; and &lt;)

but despite those constraints it can parse the document and determine
whether it is well-formed.
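
As a rough illustration of the last point, here is a minimal sketch using
Python's standard xml.etree.ElementTree module as a stand-in for a
non-validating parser that has no DTD. The helper name is_well_formed and
the sample documents are made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text without needing any DTD."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# No DTD anywhere, yet well-formedness can still be decided.
no_dtd_ok = is_well_formed("<doc><a>text</a><b/></doc>")

# Mismatched tags are caught without knowing any content models.
bad_nesting = is_well_formed("<doc><a>text</b></doc>")

# An entity that would have to be declared in a DTD cannot be expanded.
undeclared_entity = is_well_formed("<doc>&copyright;</doc>")

# The predefined entities need no declaration.
predefined_entity = is_well_formed("<doc>&amp;</doc>")

print(no_dtd_ok, bad_nesting, undeclared_entity, predefined_entity)
```

Only the well-formedness answer is available here; nothing in this sketch
validates, supplies attribute defaults or classifies whitespace.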

> It seems to me that if
> a document comes through without a DTD, and an element contained data not
> explicitly escaped, then it would not be unreasonable to assume PCDATA and
> try to parse it.  However, if a DTD is there to provide more info, then use
> it.  I am not sure I see how it is significantly different than validating
> that an element may, or may not, be a child of another element.

If the parser doesn't know that the content of an element is CDATA it
will very likely parse a correct document wrongly.  That doesn't
happen if it merely doesn't know which children are allowed: the parse
comes out the same, and only the validity check is lost.

For example, if c were declared CDATA and the parser didn't have the
DTD, it would report a syntax error for

  <c><</c>
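
To see what a DTD-less XML parser makes of this kind of content, here is
a small sketch with Python's xml.etree.ElementTree (the helper try_parse
is a name invented for the example).  A bare "<" inside c is rejected,
while the two explicitly escaped forms come through as the same text.

```python
import xml.etree.ElementTree as ET


def try_parse(text):
    """Return the text content of the root element, or None on a syntax error."""
    try:
        return ET.fromstring(text).text
    except ET.ParseError:
        return None


# Unescaped "<" in content: the parser has to treat it as the start of markup.
bare = try_parse("<c><</c>")

# The same character escaped with a predefined entity reference.
escaped = try_parse("<c>&lt;</c>")

# The same character inside an explicit CDATA section.
cdata_section = try_parse("<c><![CDATA[<]]></c>")

print(bare, escaped, cdata_section)
```

Both escapes are visible in the document itself, so the parser needs no
declaration to handle them.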

Various other features of SGML have been omitted for the same reason,
in particular start- and end-tag omission.  Similarly a new syntax has
been created for empty elements, because without the DTD a parser
can't tell that an element must be empty.
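
The empty-element case can be sketched the same way, again using
Python's xml.etree.ElementTree with no DTD; the element names p and br
are just examples.  With the new syntax the parser knows br is empty;
with the SGML-style bare start-tag it has to assume br is still open
when it meets the end-tag for p.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if text is accepted by a parser that has no DTD."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# XML empty-element syntax: emptiness is stated in the tag itself.
xml_style = parses("<p>one<br/>two</p>")

# SGML-style start-tag with no end-tag: only a DTD could say br is EMPTY.
sgml_style = parses("<p>one<br>two</p>")

# With the XML syntax, br is a child of p and has no content of its own.
root = ET.fromstring("<p>one<br/>two</p>")
br_children = len(root[0])

print(xml_style, sgml_style, br_children)
```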

-- Richard
