CDATA by any other name... (was The raw and the cooked)

John Cowan cowan at locke.ccil.org
Tue Nov 3 16:38:38 GMT 1998


Rick Jelliffe wrote:

> A CDATA marked section is not only a way to prevent delimiter recognition.
> It is also a way to declare that the characters in that section are limited
> to ones available in the direct document encoding of the originating system.

True.  However, since the standard encodings of XML include all the
characters there are (and if they don't include yours, just you
wait, 'Enry 'Iggins), that isn't as much of an issue.

> (SGML has a CDATA keyword you can use instead of content models: XML was
> felt not to need it because you could use <![CDATA[, however that perhaps
> shows the mind of the XML WG at that time, in that they were down-playing
> the need for schemas.)

CDATA elements are eeeeeevil.  They terminate at any ETAGO followed by
a name-start character, and they make it impossible to change your
mind later, if you decide you need an entity or two.  See the excellent
articles at.  CDATA elements were rightly discarded from XML.
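
To make the termination rule concrete, here is a rough sketch in Python
of how a scanner might find the end of a CDATA element's content.  The
function name is made up for illustration, and it assumes name-start
characters are just the ASCII letters (as in the reference concrete
syntax); a real SGML declaration can change that.

```python
import string

# Assumption: name-start characters are only ASCII letters. A real SGML
# declaration may add more, so treat this set as illustrative.
NAME_START = set(string.ascii_letters)


def cdata_element_end(text, start=0):
    """Return the index of the first ETAGO ("</") that is followed by a
    name-start character, or -1 if the content never terminates."""
    pos = text.find("</", start)
    while pos != -1:
        following = text[pos + 2:pos + 3]   # "" when "</" ends the text
        if following in NAME_START:
            return pos
        pos = text.find("</", pos + 1)
    return -1


# The "</ 1" is harmless, but the "</b>" inside the string literal ends
# the element, long before the intended "</script>".
content = 'x = "</ 1"; y = "</b>"; </script>'
end = cdata_element_end(content)
```

Note that the scanner has no idea which element it is in: any end-tag
lookalike stops it, which is exactly the trouble.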

> For example, I cannot see why a smart editor could not use the CDATA section
> to confine editing to whatever the repertoire of the character set of the
> encoding attribute of the XML header says.

IMHO, a *smart* editor would realize that a CDATA section cannot cope,
and would terminate it around the problem character.  For example,
an attempt to insert a dagger (U+2020) into a CDATA section within
an 8859-1 document would produce this:

	... ]]>&#x2020;<![CDATA[ ...

This specific case comes up when people using code page 1252 try to exploit
the non-8859-1 characters it supplies.
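
A minimal sketch of that behavior, in Python.  The function name is
hypothetical, and the sketch leans on Python's codec machinery to decide
what the document encoding can represent.  It does not deal with a
literal "]]>" in the text, which would need separate handling.

```python
def cdata_with_escapes(text, encoding="iso-8859-1"):
    """Wrap text in CDATA sections, stepping outside the section for any
    character the document encoding cannot represent and emitting a
    numeric character reference there instead."""
    out = []
    run = []    # characters that can stay inside the current section
    for ch in text:
        try:
            ch.encode(encoding)
            run.append(ch)
        except UnicodeEncodeError:
            if run:
                out.append("<![CDATA[" + "".join(run) + "]]>")
                run = []
            out.append("&#x%X;" % ord(ch))
    if run:
        out.append("<![CDATA[" + "".join(run) + "]]>")
    return "".join(out)


# Byte 0x86 in code page 1252 is the dagger, which 8859-1 lacks.
dagger = b"\x86".decode("cp1252")
result = cdata_with_escapes("a < b " + dagger + " c")
# result == "<![CDATA[a < b ]]>&#x2020;<![CDATA[ c]]>"
```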

> In the case of editing the XML
> specification, for example, when there is a CDATA marked section being
> edited, and the editor types "<", a smart section should know not to replace
> it with "&lt;" or expect it to be a STAGO.

XED indeed has this property, although it just feeps if you attempt
to type a character that would cause "]]>" to appear in a CDATA
section, rather than splitting the section (which admittedly would
be painful to undo after a Backspace character).
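
For illustration only (this is not XED's code, and both function names
are invented), the check and the splitting alternative might look
something like this in Python.  The check assumes the section content
does not already contain "]]>" and that one character is typed at a
time.

```python
def would_close_section(content, pos, ch):
    """True if inserting the single character ch at index pos of the
    CDATA content would make "]]>" appear."""
    # Any new "]]>" must include ch, so only the two characters on
    # either side of the insertion point matter.
    window = content[max(0, pos - 2):pos] + ch + content[pos:pos + 2]
    return "]]>" in window


def split_section(content):
    """The alternative to beeping: close the section after "]]" and
    reopen it before ">", so no single section contains "]]>"."""
    return "<![CDATA[" + content.replace("]]>", "]]]]><![CDATA[>") + "]]>"


beep = would_close_section("a]]", 3, ">")   # True: the editor would beep
markup = split_section("a]]>b")
# markup == "<![CDATA[a]]]]><![CDATA[>b]]>"
```

A parser reading the split form sees two adjacent sections, "a]]" and
">b", whose content concatenates back to the original text.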

> It would be nice
> if W3C allowed this, but the less that a PI can be treated (by XLL or DOM or
> SAX or whatever) as a kind of element,

The current XPointer draft allows PIs to be referred to on equal terms with
elements (except for not having a GI or attributes or sub-elements).

The DOM has a ProcessingInstruction node, though pseudo-attribute parsing
is not performed.
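
As a rough illustration, here is what that looks like with Python's
xml.dom.minidom, one DOM implementation among many.  The "editor" PI
and its pseudo-attributes are made up for the example, and the regular
expression is only a simplified approximation of attribute syntax; the
application has to do this parsing itself.

```python
import re
from xml.dom import minidom

# Hypothetical PI, invented for this example.
doc = minidom.parseString('<?editor mode="cdata" beep="yes"?><doc/>')
pi = doc.firstChild

# The DOM hands back only the target and one undivided data string.
target = pi.target    # "editor"
data = pi.data        # 'mode="cdata" beep="yes"'

# Simplified pattern: name, "=", then a double- or single-quoted value.
PSEUDO_ATTR = re.compile(
    r"""([A-Za-z_:][\w.:-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')"""
)


def pseudo_attrs(pi_data):
    """Pick pseudo-attributes out of PI data by hand."""
    attrs = {}
    for m in PSEUDO_ATTR.finditer(pi_data):
        value = m.group(2) if m.group(2) is not None else m.group(3)
        attrs[m.group(1)] = value
    return attrs


attrs = pseudo_attrs(data)
```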

-- 
John Cowan	http://www.ccil.org/~cowan		cowan at ccil.org
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



