parsers for Palm?

Marc.McDonald at Design-Intelligence.com
Tue Jun 29 03:10:53 BST 1999


I agree that validation does not add much to the size of an XML parser. I
wrote a validating parser, and the extra code for DTD parsing and its
attendant indirect effects is probably closer to 15% than to 50%.

The parsing of DTD syntax is not large at all. Attribute defaulting and
normalization are also not much code. Entity expansion dictates the stream
architecture of the parser, but it does not take much code either. Element
pattern validation is about 1000 lines of C++, but that is because I
implemented an n-element backtracking pattern matcher. It can handle
declarations such as:
	<!ELEMENT X ( (A, B, C, D, E) | (Q?, A, B, C, D, F))>
The spec only requires 1-element lookahead, so this was overkill (the relevant
clause said 'may', so I did).
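The following C++ sketch illustrates the backtracking idea; it is not the
matcher described above. The Particle type and the functions elem, seq, alt
and validate are hypothetical, the content model is built by hand rather than
parsed from the DTD, and C++17 is assumed (a std::vector of the enclosing
type). Each particle reports every position where it could end through a
continuation, so when a later particle fails, control returns into an earlier
choice, '?' or '*' and the next possibility is tried.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical content-model AST: a name, a sequence (a, b) or a choice (a | b),
// each with an occurrence indicator: '1' (none), '?', '*' or '+'.
struct Particle {
    enum Kind { Name, Seq, Choice } kind = Name;
    std::string name;             // element type name, used when kind == Name
    std::vector<Particle> items;  // children, used for Seq and Choice (C++17)
    char occur = '1';
};

Particle elem(const std::string& n, char occ = '1') {
    Particle p; p.kind = Particle::Name; p.name = n; p.occur = occ; return p;
}
Particle seq(std::vector<Particle> items, char occ = '1') {
    Particle p; p.kind = Particle::Seq; p.items = std::move(items); p.occur = occ; return p;
}
Particle alt(std::vector<Particle> items, char occ = '1') {
    Particle p; p.kind = Particle::Choice; p.items = std::move(items); p.occur = occ; return p;
}

using Kids = std::vector<std::string>;          // child element names, in order
using Cont = std::function<bool(std::size_t)>;  // "rest of the match" from a position

bool matchOccur(const Particle& p, const Kids& kids, std::size_t pos, const Cont& k);
// Match p exactly once at pos; call k with each possible end position until one succeeds.
bool matchOnce(const Particle& p, const Kids& kids, std::size_t pos, const Cont& k);

// Match items[i..] in order; a failure in a later item backtracks into earlier ones.
bool matchSeq(const std::vector<Particle>& items, std::size_t i, const Kids& kids,
              std::size_t pos, const Cont& k) {
    if (i == items.size()) return k(pos);
    return matchOccur(items[i], kids, pos,
                      [&](std::size_t next) { return matchSeq(items, i + 1, kids, next, k); });
}

bool matchOnce(const Particle& p, const Kids& kids, std::size_t pos, const Cont& k) {
    switch (p.kind) {
    case Particle::Name:
        return pos < kids.size() && kids[pos] == p.name && k(pos + 1);
    case Particle::Seq:
        return matchSeq(p.items, 0, kids, pos, k);
    case Particle::Choice:
        for (const Particle& a : p.items)
            if (matchOccur(a, kids, pos, k)) return true;  // otherwise try the next branch
        return false;
    }
    return false;
}

// Zero or more repetitions, greedy first; next > pos stops loops on empty matches.
bool matchStar(const Particle& p, const Kids& kids, std::size_t pos, const Cont& k) {
    return matchOnce(p, kids, pos,
                     [&](std::size_t next) { return next > pos && matchStar(p, kids, next, k); })
           || k(pos);
}

bool matchOccur(const Particle& p, const Kids& kids, std::size_t pos, const Cont& k) {
    switch (p.occur) {
    case '?': return matchOnce(p, kids, pos, k) || k(pos);
    case '*': return matchStar(p, kids, pos, k);
    case '+':
        return matchOnce(p, kids, pos,
                         [&](std::size_t next) { return matchStar(p, kids, next, k); });
    default:
        assert(p.occur == '1');
        return matchOnce(p, kids, pos, k);
    }
}

// The element is valid if some path through the model consumes every child.
bool validate(const Particle& model, const Kids& kids) {
    return matchOccur(model, kids, 0, [&](std::size_t end) { return end == kids.size(); });
}
```

For the declaration above, the children A B C D F are accepted only after the
first branch fails at F and the matcher backtracks into the second branch.
Backtracking of this kind can take exponential time on contrived models;
restricting models to 1-element lookahead is what allows a single
left-to-right pass without backtracking.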

I also agree that a better description of some XML features, entity
expansion in particular, would make implementation easier (in the sense of
knowing clearly what is required, not in the amount of code).
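A minimal sketch of that stream architecture, assuming internal general
entities only: the parser reads characters from a stack of streams, pushes
the replacement text when it meets a reference, and pops back to the outer
stream when that text runs out. EntityReader and expandAll are hypothetical
names; every '&' is treated as an entity reference (character references
and the predefined entities are not handled), any markup in replacement text
is simply passed on to a tokenizer that is not shown, and there is no check
that elements start and end in the same entity. The recursion test follows
the XML 1.0 "No Recursion" well-formedness constraint by refusing to open an
entity that is already on the stack.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// One open input source: the document itself or an entity's replacement text.
struct Stream {
    std::string entity;   // entity name; empty for the document entity
    std::string text;     // characters of this source
    std::size_t pos = 0;  // read position within text
};

// Hypothetical reader that expands internal general entities by stacking streams.
class EntityReader {
public:
    EntityReader(std::string doc, std::map<std::string, std::string> entities)
        : entities_(std::move(entities)) {
        stack_.push_back(Stream{"", std::move(doc), 0});
    }

    // Next character with references expanded, or -1 at the end of the document.
    int next() {
        while (!stack_.empty()) {
            Stream& s = stack_.back();
            if (s.pos == s.text.size()) {  // this source is exhausted: resume the outer one
                stack_.pop_back();
                continue;
            }
            char c = s.text[s.pos++];
            if (c != '&') return static_cast<unsigned char>(c);
            std::size_t semi = s.text.find(';', s.pos);
            if (semi == std::string::npos) throw std::runtime_error("unterminated entity reference");
            std::string name = s.text.substr(s.pos, semi - s.pos);
            s.pos = semi + 1;  // the outer stream resumes after ';' once the entity is popped
            push(name);        // s may dangle now; the loop re-reads back()
        }
        return -1;
    }

private:
    void push(const std::string& name) {
        auto it = entities_.find(name);
        if (it == entities_.end()) throw std::runtime_error("undeclared entity: " + name);
        for (const Stream& open : stack_)  // WFC: No Recursion
            if (open.entity == name) throw std::runtime_error("recursive entity: " + name);
        stack_.push_back(Stream{name, it->second, 0});
    }

    std::map<std::string, std::string> entities_;
    std::vector<Stream> stack_;
};

// Drain the reader into a string (for demonstration only).
std::string expandAll(EntityReader& r) {
    std::string out;
    for (int c = r.next(); c != -1; c = r.next()) out += static_cast<char>(c);
    return out;
}
```

For example, with x declared as "b&y;" and y as "c", the input "a&x;d" reads
back as "abcd", while an entity z whose replacement text is "&z;" is rejected
as recursive.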


Marc B McDonald
Principal Software Scientist
Design Intelligence, Inc
www.design-intelligence.com


	----------
	From:  Tim Bray [SMTP:tbray at textuality.com]
	Sent:  Monday, June 28, 1999 4:36 PM
	To:  David Brownell
	Cc:  XML-Dev Mailing list
	Subject:  Re: parsers for Palm?

	At 04:27 PM 6/28/99 -0700, David Brownell wrote:
	>> Distinguish between <!DOCTYPE > and validation.  I do *not* agree
	>> that parsing DTD syntax takes up 2/3 of a parser.
	>
	>I did distinguish between them. 
	...
	>Savings may not be 2/3 ... but I'd be _really_ surprised if they were
	>less than 1/2.  The best way to know is to implement ... :-)

	I did.

	>But they can become a LOT
	>smaller if they don't need to handle even that, and are relieved of
	>the responsibilities to handle the syntax and state in a DOCTYPE.

	I disagree.  We went through this quite a bit in the XML Syntax Working
	Group.  It is absolutely *not* the case that DTD parsing is demonstrably
	very expensive.  There was a conventional wisdom floating about that
	a parser for a DTD-free dialect of XML could deliver the same performance
	and functionality in immensely less space.  Empirical analysis fails
	to support this contention.  Analysis of existing parsers shows immense
	amounts of work going into things like reading Unicode efficiently, doing
	well-formedness checks on entity nesting, and tracking locations to
	support good error messages - I repeat that there is a resounding lack
	of evidence to show that parsing DTD syntax is particularly taxing for
	any competent programmer.  Even parameter entities aren't hard to
	implement - they are hard to *describe*, just not hard to implement.
	-Tim

	xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
	Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
	To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
	(un)subscribe xml-dev
	To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
	subscribe xml-dev-digest
	List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)
