Feeler for SML (Simple Markup Language)

David Megginson david at megginson.com
Mon Nov 15 18:20:08 GMT 1999

Sean Mc Grath <digitome at iol.ie> writes:

> I would argue they do get actively in the way.
> Sure, the EDI stuff probably does not need external parsed entities
> and who knows what else. But if EDI software is going to
> truthfully "use XML" and "be fully XML compliant" then it
> cannot just barf when these things appear in XML docs.

Whoa!  Let's imagine an engineer (let's say, the Desperate Palmtop
Hacker, or DPH for short) who's implementing XML support on a small
device.  She's probably not even *looking* at the XML 1.0 grammatical
productions; instead, she's choosing a library, possibly in Java with
a SAX interface, and getting at the XML through that.

Now, this engineer cares only about elements, attributes, and data --
she doesn't need to know anything about comments, notations, external
unparsed entities, processing instructions, whitespace in element
content, or a bunch of other stuff.  So, what extra work does she have 
to do with full XML vs. a subset?

a. She has to *not* provide a DTDHandler to the SAX parser.

b. She has to *not* override the processingInstruction and
   ignorableWhitespace callbacks from HandlerBase.

c. er, that's it.
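
To make that concrete, here is a minimal sketch of what the DPH's code might look like.  It assumes the later SAX2 API as bundled with the JDK, where DefaultHandler plays the role of HandlerBase; the class name ElementsOnlyHandler and the summarize helper are made up for illustration.  Only the element and character-data callbacks are overridden, so everything else falls through to the inherited do-nothing methods.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records only elements, attributes, and character data.
class ElementsOnlyHandler extends DefaultHandler {
    final StringBuilder log = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        log.append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            log.append(' ').append(atts.getQName(i)).append('=').append(atts.getValue(i));
        }
        log.append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Trimmed per chunk to keep the sketch short; a parser may deliver
        // text in several chunks, so real code would buffer before trimming.
        log.append(new String(ch, start, length).trim());
    }

    // processingInstruction, ignorableWhitespace, notationDecl, and
    // unparsedEntityDecl are deliberately NOT overridden: the inherited
    // implementations do nothing, so those constructs are silently skipped.

    // Illustrative helper: parse a string and return what the handler saw.
    static String summarize(String xml) throws Exception {
        ElementsOnlyHandler handler = new ElementsOnlyHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.log.toString();
    }

    public static void main(String[] args) throws Exception {
        String doc = "<?xml version=\"1.0\"?><!-- note -->"
            + "<order id=\"7\"><?app hint?><item>tea</item></order>";
        System.out.println(summarize(doc));
    }
}
```

The comment and the processing instruction in the sample document never reach the application code; the parser still has to accept them, but the application author writes nothing to handle them.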

So, obviously, implementation complexity is not a legitimate argument
for creating a subset -- there are more than 20 XML parsing
libraries available in most major programming languages, and most of
those libraries provide event-based interfaces (where you can
ignore stuff by default), many are free, and some are very small.

Would the space savings from SML justify splitting the market with a
second markup language?  In my experience, no -- simply parsing
comments, CDATA sections, processing instructions, and even DTD
declarations adds almost nothing to the size of an XML parser.

When you're getting close to 10K, though (as I believe we could in
Java), the text of all the required XML 1.0 error messages may
actually become larger than the parser itself -- if you simply
replaced the text messages with numeric codes, you might save 5 to 10
times as much space as you would by cutting a few minor features.
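
As a rough sketch of the numeric-code idea (the class name and the two codes below are invented for illustration, not taken from any real parser), the parser would carry only small integers, and any human-readable text could live outside the device:

```java
// Hypothetical exception type that carries a numeric code instead of a
// full English error message.
class CodedXmlError extends Exception {
    static final int UNCLOSED_TAG = 1;       // made-up code
    static final int ILLEGAL_NAME_CHAR = 2;  // made-up code

    final int code;

    CodedXmlError(int code) {
        // The only text stored per error is a short tag such as "E2".
        super("E" + code);
        this.code = code;
    }
}
```

A desktop tool or a printed manual could then map the codes back to the full messages.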

To conform to XML 1.0, you also have to build some very, very large
tables so that you can detect the required errors for Unicode
characters not allowed in names, attribute values, character data,
etc., and the data required to build these tables might also add 25%
or more to the size of a very small XML parser.  I suggest that any
efforts at putting XML on small devices concentrate on this area
rather than tinkering with the admittedly slightly-pudgy (though not
bloated) XML grammar.
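
To give a feel for where that table space goes, here is one common way to store such character classes: a sorted array of ranges searched by binary search.  This is only a sketch under my own assumptions; the class name NameCharTable is made up, and the array holds just the first five Latin ranges of the XML 1.0 BaseChar production, whereas the full production runs to a couple of hundred entries.

```java
// Hypothetical compact table for one XML 1.0 character class.
class NameCharTable {
    // TRUNCATED excerpt of BaseChar: flattened [start, end] pairs,
    // sorted ascending. The real production has many more ranges.
    static final char[] RANGES = {
        0x0041, 0x005A,  // A-Z
        0x0061, 0x007A,  // a-z
        0x00C0, 0x00D6,
        0x00D8, 0x00F6,  // skips U+00D7 (multiplication sign)
        0x00F8, 0x00FF,  // skips U+00F7 (division sign)
    };

    // Binary search over the range pairs; true if c falls inside one.
    static boolean isBaseChar(char c) {
        int lo = 0;
        int hi = RANGES.length / 2 - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (c < RANGES[2 * mid]) {
                hi = mid - 1;
            } else if (c > RANGES[2 * mid + 1]) {
                lo = mid + 1;
            } else {
                return true;
            }
        }
        return false;
    }
}
```

Each range costs four bytes in this layout, so the lookup code stays tiny while the data grows with every range the spec requires you to check.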

All the best,


p.s. If I could go back in time, I'd be happy to argue that notations, 
     unparsed entities, and some other junk be removed from XML 1.0,
     but it's too late now, and we won anyway.

David Megginson                 david at megginson.com

