SML: The size of the processor is not the issue

Thu Nov 18 11:33:34 GMT 1999

Sean Mc Grath <digitome at iol.ie> writes:

> It is in this space, transforming XML to XML, that the cost
> of all XML 1.0 features is paid. Take the most trivial XML
> to XML transformation -- the null transformation. Think about
> how hard it is to do this for arbitrary XML 1.0 documents.

That's simply a problem of underspecification: XML 1.0 provides a
syntax, but for the most part, it doesn't say what in that syntax is
signal (such as character data) and what is noise (such as whitespace
in a start tag).

APIs make the signal/noise distinction for you de facto.  If you're
accessing an XML document through SAX, all you have to worry about in
the transform is the following:

1. notation decls
2. unparsed entity decls
3. elements
4. attributes and their values
5. character data
6. whitespace in element content (which you can treat as character
   data)
7. processing instructions

That's pretty good, really.  Writing an identity transform with SAX in
Java takes only a few screens of code and can be managed in 15 minutes 
by any competent, intermediate-level programmer who knows the domain
(a good coder can do it in 5).
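
As a rough illustration of the point above, here is a minimal sketch of
such an identity transform.  It assumes the later SAX2 interfaces
(DefaultHandler, Attributes) and the JAXP SAXParserFactory rather than
the SAX 1.0 API that was current when this was written; the class name
IdentityTransform and its helper methods are made up for the example.
It handles only the seven kinds of information listed above and makes
no attempt to preserve anything else.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class: writes each SAX event back out as XML text.
final class IdentityTransform extends DefaultHandler {
    private final Writer out;
    // Items 1 and 2 arrive before the root element, so hold them until its name is known.
    private final StringBuilder decls = new StringBuilder();
    private boolean rootSeen = false;

    IdentityTransform(Writer out) { this.out = out; }

    private void write(String s) throws SAXException {
        try { out.write(s); } catch (IOException e) { throw new SAXException(e); }
    }

    // Escape markup characters; double quotes only matter inside attribute values.
    private static String escape(String s, boolean inAttribute) {
        StringBuilder b = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '&') b.append("&amp;");
            else if (c == '<') b.append("&lt;");
            else if (c == '>') b.append("&gt;");
            else if (c == '"' && inAttribute) b.append("&quot;");
            else b.append(c);
        }
        return b.toString();
    }

    // Note: the parser may hand back system identifiers already resolved to absolute URIs.
    private static String externalId(String publicId, String systemId) {
        if (publicId == null) return "SYSTEM \"" + systemId + "\"";
        return "PUBLIC \"" + publicId + "\"" + (systemId == null ? "" : " \"" + systemId + "\"");
    }

    @Override public void notationDecl(String name, String publicId, String systemId) {
        decls.append("<!NOTATION ").append(name).append(' ')
             .append(externalId(publicId, systemId)).append(">\n");
    }

    @Override public void unparsedEntityDecl(String name, String publicId,
                                             String systemId, String notationName) {
        decls.append("<!ENTITY ").append(name).append(' ')
             .append(externalId(publicId, systemId))
             .append(" NDATA ").append(notationName).append(">\n");
    }

    @Override public void startElement(String uri, String localName, String qName,
                                       Attributes atts) throws SAXException {
        if (!rootSeen) {
            rootSeen = true;
            if (decls.length() > 0) write("<!DOCTYPE " + qName + " [\n" + decls + "]>\n");
        }
        write("<" + qName);
        for (int i = 0; i < atts.getLength(); i++) {
            write(" " + atts.getQName(i) + "=\"" + escape(atts.getValue(i), true) + "\"");
        }
        write(">");
    }

    @Override public void endElement(String uri, String localName, String qName)
            throws SAXException {
        write("</" + qName + ">");
    }

    @Override public void characters(char[] ch, int start, int length) throws SAXException {
        write(escape(new String(ch, start, length), false));
    }

    // Whitespace in element content is treated exactly like character data.
    @Override public void ignorableWhitespace(char[] ch, int start, int length)
            throws SAXException {
        write(new String(ch, start, length));
    }

    @Override public void processingInstruction(String target, String data)
            throws SAXException {
        write("<?" + target + (data == null || data.isEmpty() ? "" : " " + data) + "?>");
    }

    // Convenience wrapper for the example: parse a string, return the re-serialized text.
    static String transform(String xml) {
        try {
            StringWriter sink = new StringWriter();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new IdentityTransform(sink));
            return sink.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<doc id=\"1\"><?app go?><p>a &amp; b</p></doc>"));
    }
}
```

Everything SAX does not report (comments, CDATA section boundaries,
the XML declaration, whitespace inside tags) simply disappears in this
sketch, which is the signal/noise line the API draws for you.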

On a more formal level, the Infoset takes great care to make the
signal/noise distinction that's missing from XML 1.0, and Canonical
XML builds further on the Infoset.

All the best,

David

-- 
David Megginson                 david at megginson.com
           http://www.megginson.com/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1