parsers for Palm?

David Brownell david-b at
Tue Jun 29 01:25:49 BST 1999

Tim Bray wrote:
> At 01:57 PM 6/28/99 -0700, David Brownell wrote:
> >... what do folk think of using the following XML subset:
> >
> >       Everything in XML, except the <!DOCTYPE ...> support
> >       which takes up something like 2/3 of most parsers.
> Distinguish between <!DOCTYPE > and validation.  I do *not* agree
> that parsing DTD syntax takes up 2/3 of a parser.

I did distinguish between them.  In part that's why I described
this as a (potential) subset; it's more than just use of a
nonvalidating parser, which is an option I assume everyone on
the XML-DEV list understands.

Savings may not be 2/3 ... but I'd be _really_ surprised if they were
less than 1/2.  The best way to know is to implement ... :-)
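To make the proposed subset concrete, here is a minimal Python sketch that treats it as a policy layered over the standard library's expat binding: any <!DOCTYPE ...> is rejected outright. The names `parse_subset` and `DoctypeNotAllowed` are hypothetical. This only enforces the restriction. Expat still carries all of its DTD machinery, so the sketch says nothing about the size savings, which would come only from a parser written without that code.

```python
import xml.parsers.expat


class DoctypeNotAllowed(Exception):
    """Raised when input uses the <!DOCTYPE ...> construct excluded by the subset."""


def parse_subset(text: str) -> list[str]:
    """Parse `text`, rejecting any DOCTYPE; return element names in document order."""
    parser = xml.parsers.expat.ParserCreate()
    names: list[str] = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Hypothetical policy: the subset forbids DOCTYPE entirely, with or
        # without an internal subset. The exception propagates out of Parse().
        raise DoctypeNotAllowed(f"DOCTYPE {name!r} not permitted in this subset")

    def on_start(name, attrs):
        names.append(name)

    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(text, True)
    return names
```

For example, `parse_subset("<a><b/></a>")` returns `["a", "b"]`, while `parse_subset("<!DOCTYPE a><a/>")` raises `DoctypeNotAllowed`.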

Meanwhile, consider that:

	- The most complex syntax (content models, ATTLIST, and
	  other declarations) is in the DTD exclusively.

	- State related to those constructs needs to be managed and
	  used even when not validating (given an internal subset),
	  for example to perform mandatory attribute normalization
	  and to include internal entities (recursively).

	- Entities are declared in the DTD (except for builtins)
	  and there's a fair bit of code involved in handling them
	  even if you don't include external entities.

	- Every feature taken out means it's possible to take
	  out the associated error handling and reporting, and often
	  to straighten out code paths.  Such savings can be surprisingly
	  large; that handling often more than doubles code size.

	- There are a lot of efforts under way that either don't
	  require DTDs or stumble over them.

	- Applications would have a lot less low-level variation to
	  deal with, and higher levels would have a cleaner slate.

The savings are, in short, indirect as well as direct.
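As a rough illustration of how much state goes away, here is a Python sketch of attribute-value normalization when there is no DOCTYPE. With no ATTLIST declarations, every attribute is treated as if declared CDATA. The only named entities are the five predefined ones, so there is no entity table and no recursion through replacement text. The function names and error handling are hypothetical. The sketch assumes line ends were already normalized to #xA, and it skips well-formedness checks other than a bare '&'.

```python
import re

# The five entities predefined by XML 1.0; with no DOCTYPE, these are the
# only named entities that can legally appear.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}

# Character references and (simplified, ASCII-only) entity references.
_REF = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z_:][-A-Za-z0-9._:]*);")


def _literal(segment: str) -> str:
    """Map each literal tab, CR, or LF to a single space."""
    if "&" in segment:
        raise ValueError("bare '&' in attribute value")
    return re.sub(r"[\t\r\n]", " ", segment)


def normalize_cdata_attr(raw: str) -> str:
    """Normalize an attribute value as if it were declared CDATA.

    Assumes line ends were already normalized to #xA by the caller.
    Other well-formedness checks (such as a literal '<') are omitted.
    """
    out = []
    pos = 0
    for m in _REF.finditer(raw):
        out.append(_literal(raw[pos:m.start()]))
        ref = m.group(1)
        if ref.startswith("#x"):
            # Character references are appended as-is, even whitespace.
            out.append(chr(int(ref[2:], 16)))
        elif ref.startswith("#"):
            out.append(chr(int(ref[1:])))
        elif ref in PREDEFINED:
            out.append(PREDEFINED[ref])
        else:
            # No DTD means no entity declarations, so anything else is an error.
            raise ValueError(f"undeclared entity &{ref};")
        pos = m.end()
    out.append(_literal(raw[pos:]))
    return "".join(out)
```

With a DTD in play, the same function would also need an entity table, recursive expansion, and a lookup of each attribute's declared type to decide on the extra whitespace collapsing for non-CDATA values.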

>	On the other hand,
> it's reasonable to expect a validating parser to be twice the size
> of a non-validating one.

Last time I measured, it was more like 15% ... Validation, done right,
is mostly a bunch of carefully placed tests, monitoring a content model
state machine, and tracking IDs.  (Try rebuilding Sun's parser without
the validation support -- there's a "static boolean" constant that
removes the tests, and then there are some classes that can go away.)

Of course, that 15% compares a validating parser against a nonvalidating
one that processes all the external entities ... as most do, since that
is the best way to give applications a portable processing model.

>	  Note that nearly all the existing
> validating parsers parse DTD syntax just fine. -T.

I suspect you meant to say "nonvalidating" there ... :-)

Of course they do -- that's a requirement for parsing a
<!DOCTYPE ...> with an internal subset.  But they can become a LOT
smaller if they don't need to handle even that, and are relieved of
the responsibility for handling the syntax and state in a DOCTYPE.

- Dave

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at
Archived as: and on CD-ROM/ISBN 981-02-3594-1
List coordinator, Henry Rzepa (mailto:rzepa at