SAX needs from our point of view
David Megginson
ak117 at freenet.carleton.ca
Fri Apr 24 03:06:38 BST 1998
Michael Amster writes:
> In our case, having embedded XML languages with our own language
> controlling flow of execution, we have a real need for an accurate
> reproduction of the XML elements parsed so they can be rewritten
> correctly.
SAX reports all elements, together with character data, ignorable
whitespace, and processing instructions, so you won't lose anything
there.
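
A rough sketch of what that event stream looks like, using Python's xml.sax module as a stand-in for a SAX parser (an assumption for illustration; the EventLog class and log_events helper are hypothetical names, not part of SAX itself):

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EventLog(ContentHandler):
    """Collects every callback as a tuple, in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs)))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Character data may arrive in more than one chunk.
        self.events.append(("chars", content))

    def ignorableWhitespace(self, whitespace):
        # A non-validating parser may report all whitespace through
        # characters() instead and never call this method.
        self.events.append(("ignorable", whitespace))

    def processingInstruction(self, target, data):
        self.events.append(("pi", target, data))


def log_events(xml_text):
    """Parse a string and return the recorded list of events."""
    handler = EventLog()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


events = log_events('<doc><?app hint?><item id="1">text</item></doc>')
```

Replaying the recorded events in order should be enough to rebuild the elements, text, and processing instructions of the input.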
> Specifically, the issue is important in distinguishing between text and
> CDATA. Let me illustrate with a simple example:
>
> <WEIF COND="true">
> <WETHEN>
> <ARBITRARYXML/>
> <![CDATA[
> This is data with &references; which should not be parsed!
> ]]>
> <MOREXML>
> This is just text
> </MOREXML>
> </WETHEN>
> </WEIF>
>
> When this is reported up from a SAX parser, we do not differentiate between
> text and the CDATA, but let's say that we want to output the subset of
> arbitrary XML back out from our DOM or other object structure:
>
> <ARBITRARYXML/>
> This is data with &references; which should not be parsed!
> <MOREXML>
> This is just text
> </MOREXML>
Your output routine is wrong: it should automatically escape all
instances of '&', '<', and '>':
<ARBITRARYXML/>
This is data with &amp;references; which should not be parsed!
<MOREXML>
This is just text
</MOREXML>
or even
<ARBITRARYXML/>
This is data with &#38;references; which should not be parsed!
<MOREXML>
This is just text
</MOREXML>
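
A minimal sketch of such an output routine, assuming Python and its standard xml.sax.saxutils.escape function (write_text is a hypothetical helper name chosen for illustration):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def write_text(chars):
    """Escape character data before writing it back out as XML."""
    # escape() replaces '&', '<' and '>' with &amp;, &lt; and &gt;.
    return escape(chars)


line = "This is data with &references; which should not be parsed!"
escaped = write_text(line)

# Reparsing the escaped text gives back the original characters, so
# the reference-like string is never expanded.
round_trip = ET.fromstring("<x>" + escaped + "</x>").text
```

Because the escaping happens at output time, the routine does not need to know whether the characters originally came from a CDATA section or from ordinary text.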
> Now you see that the CDATA will have all references made when it is
> reparsed. We really do want to preserve CDATA as different from
> text in SAX.
If there's a semantic attached to your use of CDATA, you should
represent it with an element (which is guaranteed to make it through
processing):
<listing><![CDATA[
Here is a listing: 1 < 2
]]></listing>
<listing>
Here is a listing: 1 &lt; 2
</listing>
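
To check that the two forms are equivalent to a parser, here is a small sketch, again assuming Python's xml.sax as the parser (TextCollector and character_data are hypothetical names):

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Gathers everything reported through characters()."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def character_data(xml_text):
    """Return all character data in the document as one string."""
    handler = TextCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return "".join(handler.chunks)


with_cdata = "<listing><![CDATA[\nHere is a listing: 1 < 2\n]]></listing>"
with_escape = "<listing>\nHere is a listing: 1 &lt; 2\n</listing>"

# Both spellings deliver the same characters to the application.
same = character_data(with_cdata) == character_data(with_escape)
```

The listing element survives in both cases, so any semantics attached to it are still available after parsing.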
There is no need for general XML processing tools _ever_ to know about
CDATA sections; authoring and repository tools (including tools for
authoring transforms) might want to preserve them, but those fall out of
the target audience for SAX level 1.
Think of the analogy of C: the preprocessor takes care of surface
things like macros and hides them from the compiler, which produces
exactly the same object code for
#define FOO 1
printf("%d", FOO + FOO);
and
printf("%d", 1 + 1);
All the best, and thanks for the comments,
David
--
David Megginson ak117 at freenet.carleton.ca
Microstar Software Ltd. dmeggins at microstar.com
http://home.sprynet.com/sprynet/dmeggins/
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)