CDATA by any other name... (was The raw and the cooked)

David Megginson david at megginson.com
Sat Oct 31 12:24:28 GMT 1998


Rick Jelliffe writes:

 > If you take the view that CDATA section labels the data as
 > character data (i.e. not ignorable whitespace) then <![CDATA[ ]]>
 > is clearly invalid in Henry's example: because the " " is marked as
 > data and data is not allowed.  But that is ephemera: what does the
 > spec say?

 > I think the answer is clear from the spec: [43] content ::=
 > (element | CharData | Reference | CDSect | PI | Comment)* so a
 > CDSect is not CharData. Therefore a CDSect is only valid in mixed
 > content, even though it is well-formed to have it in element
 > content.

You're pumping the spec too hard.  As I've pointed out in a previous
posting, elsewhere the spec says that all non-markup-characters are
character data, and in the CDATA section clause, it refers to the
contents of the marked section as "character data," but I would attach
no more weight to these than I would to your excerpts: if the spec had
meant to make a clear statement about CDATA sections in element
content, I trust that Tim and the other editors would have made the
statement explicitly rather than hiding it for amateur exegetes like
us to dig up.

 > I think this is doubly clear from the discussion of "white-space"
 > in [XML 2.10]: white-space for xml:space considerations (in element
 > content) is space added for "greater readability". <![CDATA[ ]]>
 > does not do this!! It disrupts readability. So from the purpose of
 > valid whitespace in element content it is clear that <![CDATA[ ]]>
 > is not legitimate. The text is just as important as the
 > productions.

This is a useful statement of intent, but again, it does not answer
the fundamental question -- it also relies on a subjective
interpretation of readability (though one that many of us would
probably share).  Note, too, that by your reading, comments might
equally be forbidden in element content.

Consider, again, the following:

  <!DOCTYPE a [
    <!ELEMENT a (b,c)>
    <!ELEMENT b EMPTY>
    <!ELEMENT c EMPTY>
  ]>
  <a><![CDATA[  ]]><b/><c/></a>
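
The well-formedness half of the question, at least, is easy to check.
Here is a minimal sketch, assuming Python's standard
xml.etree.ElementTree module (a non-validating parser, so it says
nothing about validity) and assuming the internal subset is closed
with "]>".  The variable names are mine, purely for illustration:

```python
import xml.etree.ElementTree as ET

# The example document; the DTD declares element content (b,c) for <a>.
DOC = """<!DOCTYPE a [
  <!ELEMENT a (b,c)>
  <!ELEMENT b EMPTY>
  <!ELEMENT c EMPTY>
]>
<a><![CDATA[  ]]><b/><c/></a>"""

# A non-validating parse: succeeds only if the document is well-formed.
root = ET.fromstring(DOC)

# The contents of the CDATA section arrive as ordinary text before <b/>.
cdata_text = root.text
child_names = [child.tag for child in root]

print(repr(cdata_text), child_names)
```

The parse succeeds and the two spaces are delivered as the text of
"a", so the only open question is what a *validating* parser should
do with them.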

Is a validating parser that reports an error conformant to the letter
(if not necessarily the spirit) of the XML 1.0 REC?  Yes.  

Is a validating parser that does *not* report an error conformant to
the letter (if not necessarily the spirit) of the XML 1.0 REC?  Yes.

Does this need to be fixed?  YES!
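
To make the two readings concrete, here is a rough sketch of how a
validator might check the children of "a" against (b,c).  Everything
in it is hypothetical: the event tuples, the function name, and the
cdata_counts_as_data flag are my own illustration, not anything
taken from the REC or from a real parser.

```python
WHITESPACE = " \t\r\n"

def valid_for_b_then_c(events, cdata_counts_as_data):
    """Check hypothetical child events of <a> against the model (b,c).

    cdata_counts_as_data=True  : any CDATA section is character data,
                                 hence not allowed in element content.
    cdata_counts_as_data=False : a whitespace-only CDATA section is
                                 treated like ordinary whitespace.
    """
    seen = []
    for kind, value in events:
        if kind == "element":
            seen.append(value)
        elif kind == "text":
            # Only whitespace may appear between child elements.
            if value.strip(WHITESPACE):
                return False
        elif kind == "cdata":
            if cdata_counts_as_data or value.strip(WHITESPACE):
                return False
    return seen == ["b", "c"]

# Events a parser might report for <a><![CDATA[  ]]><b/><c/></a>
EVENTS = [("cdata", "  "), ("element", "b"), ("element", "c")]

strict = valid_for_b_then_c(EVENTS, cdata_counts_as_data=True)
lenient = valid_for_b_then_c(EVENTS, cdata_counts_as_data=False)
print(strict, lenient)
```

The same document is rejected under the first reading and accepted
under the second, and as far as I can tell neither choice contradicts
the letter of the REC.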


All the best,


David

-- 
David Megginson                 david at megginson.com
           http://www.megginson.com/




