Feeler for SML (Simple Markup Language)

Thu Nov 11 22:21:53 GMT 1999

David Megginson wrote:

> "Hunter, David" <dhunter at Mobility.com> writes:
>
> > Perhaps from the point of view of a parser writer, this might be a
> > good thing.  If you knew you were never going to need these
> > constructs, you could build a smaller, faster parser.
>
> Not much smaller, I'm afraid -- for an event-based parser, support for
> PIs and attributes adds almost no overhead (I remember experimenting
> with putting them in and leaving them out when I was writing
> AElfred).
>
> AElfred, by the way, was under 15K in a compressed JAR file when I was
> maintaining it, though it wasn't strictly conformant (it didn't report
> all required errors) -- I still believe that someone could write a
> Java-based XML parser in under 10K (compressed) if they had the time
> and inclination and made more use of the standard Java libraries.

In AElfred's case, that worked well for applets, but it would not work as well for cell phones
or PDAs. What really counts in those environments is memory usage, and regardless of whether
you use java.util.Hashtable or your own custom version, a hashtable class and any supporting
utility classes will be loaded into memory one way or another. Writing your own
smaller-footprint hashtable would only make sense so long as none of the rest of your code
called libraries that load java.util.Hashtable into memory. But since all kinds of core Java
libraries use java.util.Hashtable all over the place, you are probably better off just using
java.util.Hashtable anyway.
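
To give a sense of what a hand-rolled table involves, here is a minimal sketch of a
fixed-capacity, string-keyed table with linear probing. TinyTable and its methods are
hypothetical names made up for illustration; this is not code from AElfred or from any
library, and it leaves out removal and resizing.

```java
// Hypothetical minimal string-keyed table (illustration only, not from any library).
final class TinyTable {
    private final String[] keys;
    private final Object[] values;
    private int size;

    TinyTable(int capacity) {
        if (capacity < 2) {
            throw new IllegalArgumentException("capacity must be at least 2");
        }
        keys = new String[capacity];
        values = new Object[capacity];
    }

    // Linear probing: start at the hashed slot and walk forward until we find
    // either the key or an empty slot. One slot is always kept empty (see put),
    // so this loop always terminates.
    private int slot(String key) {
        int i = (key.hashCode() & 0x7fffffff) % keys.length;
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) % keys.length;
        }
        return i;
    }

    void put(String key, Object value) {
        int i = slot(key);
        if (keys[i] == null) {
            // Refuse to fill the last free slot so that lookups can still stop.
            if (size == keys.length - 1) {
                throw new IllegalStateException("table full");
            }
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    // Returns the stored value, or null if the key is absent.
    Object get(String key) {
        return values[slot(key)];
    }
}
```

Even something this small is one more class to load, which is why it only pays off if
java.util.Hashtable would otherwise stay out of memory.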

This is one of the problems with embedded Java, as I understand it: you still need to bundle a
whole bunch of unnecessary libraries with your application instead of being allowed to use
just the bare essentials that you really need.

Most of AElfred's footprint, from what I remember, seemed to be character handling code and
not actual XML parsing code. If you restrict XML to a single character encoding such as UTF-8
and get rid of DTD handling and validation (in the XML parser I have written, more of my code
is for parsing DTDs than for parsing the XML file itself), then I would not be surprised if
you could get things under 5K if you really wanted to.
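
As a rough illustration of how little character handling is left once UTF-8 is the only
encoding, here is a sketch of a standalone UTF-8 decoder. The class name Utf8Only is
hypothetical and this is not AElfred's code. It assumes the whole input is already in a byte
array, and it does not reject overlong sequences or surrogate code points, which a strict
decoder would have to do.

```java
// Hypothetical UTF-8-only decoder (illustration only).
final class Utf8Only {
    static String decode(byte[] in) {
        StringBuilder out = new StringBuilder(in.length);
        int i = 0;
        while (i < in.length) {
            int b = in[i++] & 0xFF;
            int extra; // number of continuation bytes expected
            int cp;    // code point being assembled
            if (b < 0x80) {
                extra = 0; cp = b;            // 0xxxxxxx: ASCII
            } else if ((b & 0xE0) == 0xC0) {
                extra = 1; cp = b & 0x1F;     // 110xxxxx: 2-byte sequence
            } else if ((b & 0xF0) == 0xE0) {
                extra = 2; cp = b & 0x0F;     // 1110xxxx: 3-byte sequence
            } else if ((b & 0xF8) == 0xF0) {
                extra = 3; cp = b & 0x07;     // 11110xxx: 4-byte sequence
            } else {
                throw new IllegalArgumentException("invalid UTF-8 lead byte at " + (i - 1));
            }
            for (int k = 0; k < extra; k++) {
                // Every continuation byte must look like 10xxxxxx.
                if (i >= in.length || (in[i] & 0xC0) != 0x80) {
                    throw new IllegalArgumentException("invalid UTF-8 continuation byte at " + i);
                }
                cp = (cp << 6) | (in[i++] & 0x3F);
            }
            // Throws IllegalArgumentException for values above U+10FFFF.
            out.appendCodePoint(cp);
        }
        return out.toString();
    }
}
```

Supporting more encodings means either more tables and branches like these or a dependency
on the platform's converters, and that is where the footprint goes.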

I think that for environments with severe memory constraints, some of Don's ideas really make
sense. Removing comments, PIs, and attributes does not shrink your parser much, though, as
handling each of those is only a few lines of code. Dealing with character encodings, DTDs,
and validation is where most of the bloat in an XML parser tends to go.
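
To make the "few lines of code" point concrete, here is a sketch of an event-style scanner
for a small XML subset. TinyXmlScanner and its event strings are hypothetical, invented for
this message. It assumes the document is already decoded into a String, and it has no DTD
handling, no CDATA sections, no entity expansion, and no well-formedness checks beyond what
it needs to avoid getting stuck.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scanner for an XML subset (illustration only).
final class TinyXmlScanner {
    private final String s;
    private int pos;
    private final List<String> events = new ArrayList<>();

    private TinyXmlScanner(String s) { this.s = s; }

    static List<String> scan(String doc) {
        TinyXmlScanner p = new TinyXmlScanner(doc);
        p.run();
        return p.events;
    }

    private void run() {
        while (pos < s.length()) {
            if (s.startsWith("<!--", pos)) {           // comment
                int end = find("-->", pos + 4);
                events.add("comment:" + s.substring(pos + 4, end));
                pos = end + 3;
            } else if (s.startsWith("<?", pos)) {      // processing instruction
                int end = find("?>", pos + 2);
                events.add("pi:" + s.substring(pos + 2, end));
                pos = end + 2;
            } else if (s.startsWith("</", pos)) {      // end tag
                int end = find(">", pos + 2);
                events.add("end:" + s.substring(pos + 2, end).trim());
                pos = end + 1;
            } else if (s.charAt(pos) == '<') {         // start or empty-element tag
                startTag();
            } else {                                   // character data
                int end = s.indexOf('<', pos);
                if (end < 0) end = s.length();
                events.add("text:" + s.substring(pos, end));
                pos = end;
            }
        }
    }

    private void startTag() {
        pos++; // skip '<'
        String name = name();
        events.add("start:" + name);
        while (true) {
            skipSpace();
            if (s.startsWith("/>", pos)) { events.add("end:" + name); pos += 2; return; }
            if (s.startsWith(">", pos)) { pos++; return; }
            // Attribute: name = "value" or name = 'value'
            String attr = name();
            skipSpace();
            if (pos >= s.length() || s.charAt(pos) != '=') throw error("expected '='");
            pos++;
            skipSpace();
            if (pos >= s.length()) throw error("expected attribute value");
            char quote = s.charAt(pos++);
            if (quote != '"' && quote != '\'') throw error("expected quote");
            int end = find(String.valueOf(quote), pos);
            events.add("attr:" + attr + "=" + s.substring(pos, end));
            pos = end + 1;
        }
    }

    // Reads a name: everything up to whitespace, '=', '/' or '>'. Must be non-empty.
    private String name() {
        int start = pos;
        while (pos < s.length() && !Character.isWhitespace(s.charAt(pos))
                && "=/>".indexOf(s.charAt(pos)) < 0) {
            pos++;
        }
        if (pos == start) throw error("expected name");
        return s.substring(start, pos);
    }

    private void skipSpace() {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++;
    }

    private int find(String token, int from) {
        int i = s.indexOf(token, from);
        if (i < 0) throw error("expected '" + token + "'");
        return i;
    }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at offset " + pos);
    }
}
```

In this sketch the comment branch and the PI branch are three statements each, and the
attribute handling is about ten lines, so dropping all three would save very little.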

Tyler

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)