SAX needs from our point of view

Tyler Baker tyler at infinet.com
Fri Apr 24 03:22:41 BST 1998


Michael Amster wrote:

> Quoting Ray Cromwell:
>
> >Ok, now that I've started a flame war and gotten that off my chest :),
> >I'd like to nominate the three biggest features I'd like in SAX Level 2
> >(or SAX2.0), in order of importance.
> >1) access to DTD information
> >2) comments, CDATA, and location information for Attributes
> >3) sax.util classes that take an ElementFactory (which return DOM
> >interfaces), and build a tree. (maybe Don Park would like to contribute
> >this). IBM's XML for Java is a starting point, but it has the fatal flaw
> >that the return values of the ElementFactory are not the DOM interfaces
> >(such as Element or PI) but IBM base classes, like TXElement or PI,
> >which means you are forced to inherit from TXElement instead of just
> >implementing Element.
>
> In our case, having embedded XML languages with our own language
> controlling flow of execution, we have a real need for an accurate
> reproduction of the XML elements parsed so they can be rewritten correctly.
>  Specifically, the issue is important in distinguishing between text and
> CDATA.  Let me illustrate with a simple example:
>
> <WEIF COND="true">
>         <WETHEN>
>                 <ARBITRARYXML/>
>                 <![CDATA[
>                         This is data with &references; which should not be parsed!
>                 ]]>
>                 <MOREXML>
>                         This is just text
>                 </MOREXML>
>         </WETHEN>
> </WEIF>
>
> When this is reported up from a SAX parser, we cannot differentiate between
> text and the CDATA.  Now let's say that we want to write the subset of
> arbitrary XML back out from our DOM or other object structure:
>
>                 <ARBITRARYXML/>
>                         This is data with &references; which should not be parsed!
>                 <MOREXML>
>                         This is just text
>                 </MOREXML>
>
> Now you see that the former CDATA content will have all of its references
> expanded when it is reparsed.  We really do want to preserve CDATA as
> different from text in
> SAX.  I can live without comments and to some degree, I can even reduce the
> amount of DTD info available to me, but I hope that CDATA and text are
> reported differently through the interface.  It should not substantially
> complicate things for parser writers or application developers if it is
> just a Document handler event.
>
> -MA

The solution I have found for the XMLReader (formatter) I have been working on is to
scan each string of character content for characters that would otherwise need to be
escaped (such as '<' and '&') and, if any are found, to write that content out inside
a CDATA section.  The scan is somewhat expensive, since every character has to be
examined, but for the content I have had to format, formatting is still 5-10 times
faster than parsing.
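
A minimal sketch of that scan in Java follows. The class and method names
(CdataWriter, needsEscaping, format) are hypothetical and written only for
illustration; this is not the actual formatter code. The sketch assumes that
'<', '&', and the sequence "]]>" are the things worth checking for, and it
splits any "]]>" across two CDATA sections, since a CDATA section cannot
contain that sequence.

```java
// Hypothetical helper, for illustration only: wraps character content in a
// CDATA section when it contains characters that would need escaping.
final class CdataWriter {
    private CdataWriter() {}

    // Returns true if the text contains '<' or '&', or the sequence "]]>",
    // none of which may appear literally in ordinary character data.
    public static boolean needsEscaping(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '<' || c == '&') {
                return true;
            }
        }
        return text.contains("]]>");
    }

    // Returns the text unchanged if it is safe to write as-is; otherwise
    // returns it wrapped in one or more CDATA sections.
    public static String format(String text) {
        if (!needsEscaping(text)) {
            return text;
        }
        // "]]>" would end the section early, so close the section after "]]"
        // and reopen a new one before ">".
        String safe = text.replace("]]>", "]]]]><![CDATA[>");
        return "<![CDATA[" + safe + "]]>";
    }
}
```

With this sketch, CdataWriter.format("This is just text") returns the string
unchanged, while CdataWriter.format("This is data with &references;") returns
"<![CDATA[This is data with &references;]]>". Note that a formatter working
this way decides on CDATA from the content alone, so it does not reproduce
where the CDATA sections were in the original document.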

Tyler


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



