SAX, non-XML Documents, and Legal Characters

Don Park donpark at
Mon Apr 13 20:42:43 BST 1998

>The question is, of course, moot for XML parsers, since they will have
>to report a fatal error anyway if they find non-XML characters.  It
>would be interesting, though, to build an RTF parser with a SAX driver
>and then hook it up to Don Park's SAXDOM.


IMHO, the power of XML lies in its being the hub (some would say bottleneck).
I think it would be far more flexible to have converters that translate LaTeX
and RTF into XML, which can then be processed by any SAX parser.
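
To make the hub idea concrete, here is a minimal sketch in Python.  It is only an illustration under stated assumptions: the "legacy" input is a toy "key: value" format standing in for RTF or LaTeX, and legacy_to_xml, FieldCollector and parse_fields are hypothetical names, not part of SAX or SAXDOM.  The point is that once the converter has produced XML, an ordinary SAX handler can consume it without knowing anything about the original format.

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr


def legacy_to_xml(text):
    """Hypothetical converter: turn toy 'key: value' lines into an XML string."""
    parts = ["<doc>"]
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines in the toy format
        key, _, value = line.partition(":")
        # quoteattr/escape keep the output well-formed for any input text
        parts.append("<field name=%s>%s</field>"
                     % (quoteattr(key.strip()), escape(value.strip())))
    parts.append("</doc>")
    return "".join(parts)


class FieldCollector(xml.sax.ContentHandler):
    """Plain SAX handler; it sees only XML, never the legacy format."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._name = None
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "field":
            self._name = attrs["name"]
            self._chunks = []

    def characters(self, content):
        # character data may arrive in several pieces, so collect them
        if self._name is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "field":
            self.fields[self._name] = "".join(self._chunks)
            self._name = None


def parse_fields(xml_text):
    """Run the standard SAX parser over the converted document."""
    handler = FieldCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.fields
```

Supporting another legacy format would then mean writing another converter; the handler and everything downstream of it stay as they are.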

Ideal processing of legacy documents in the XML realm involves four phases:

1. The conversion phase converts legacy documents into XML documents, with
emphasis on lossless capture of the original information.  Little
emphasis is placed on how the information will be used.  This step is
typically done by the content owner.

2. The distillation phase extracts the useful components of XML documents
into one or more ready-for-processing XML documents, with emphasis on
providing the most useful and flexible form of the information.  This step
is typically done by value-added information vendors as well as content
owners.

3. The distribution phase transmits the processed XML documents to the
clients, with the most emphasis placed on catering to the consumption phase.
This step is done by the application servers.

4. The consumption phase involves client software converting XML documents
into consumable formats such as HTML, RTF, LaTeX, etc.  The emphasis in the
consumption phase is on user preference.  This step is done by the client.
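
The four phases can be sketched as a chain of small functions.  This is a rough illustration only: the element names (legacy, line, headings, heading), the "Heading:" line convention and the function names are all assumptions made up for the example, and the distribution step is reduced to handing over bytes where a real system would have an application server.

```python
import html
import xml.etree.ElementTree as ET


def convert(legacy_lines):
    """Phase 1: capture every line, whether or not anyone will use it."""
    root = ET.Element("legacy")
    for number, line in enumerate(legacy_lines, start=1):
        ET.SubElement(root, "line", {"n": str(number)}).text = line
    return ET.tostring(root, encoding="unicode")


def distill(xml_text):
    """Phase 2: extract only the useful parts (here, 'Heading:' lines)."""
    out = ET.Element("headings")
    for item in ET.fromstring(xml_text).iter("line"):
        text = item.text or ""
        if text.startswith("Heading:"):
            ET.SubElement(out, "heading").text = text[len("Heading:"):].strip()
    return ET.tostring(out, encoding="unicode")


def distribute(xml_text):
    """Phase 3: stand-in for the server; just encodes the document as bytes."""
    return xml_text.encode("utf-8")


def consume(payload, as_html=True):
    """Phase 4: the client renders according to user preference."""
    titles = [h.text or "" for h in ET.fromstring(payload).iter("heading")]
    if as_html:
        items = "".join("<li>%s</li>" % html.escape(t) for t in titles)
        return "<ul>" + items + "</ul>"
    return "\n".join(titles)
```

A client that prefers HTML would call consume(distribute(distill(convert(lines)))), while one that prefers plain text passes as_html=False.  Only the last step changes with user preference; the first three never need to know what the client wants to see.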

So, my vote is no.


Don Park

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at