XML Torture Test: Parsers Fail

Richard L. Goerwitz richard at goon.stg.brown.edu
Wed Apr 7 18:30:44 BST 1999


David Megginson wrote:

>  > I don't see anything in the spec that says "don't read and validate
>  > external parsed entities if they're not used."  And in fact, the spec
>  > seems to say that, in order to be valid, they must (whether used or not)
>  > match certain productions in the grammar.
> 
> You could check them for well-formedness (I guess), but you could not
> validate them out of context

I sympathize with this view.  But you're making an implicit apology on behalf
of the spec, which actually just says:

    1) The document entity is well-formed if it matches the production
       labeled document.

    2) An external general parsed entity is well-formed if it matches
       the production labeled extParsedEnt.

    3) An external parameter entity is well-formed if it matches the
       production labeled extPE.

There's no mincing words about "using" entities (in the sense of adding
an entity reference to a spot where the reference will expand).

All the spec says is that validity depends on entities matching certain
productions in the grammar.  It's a simple, static definition of how all
the entities must be structured.  It says nothing about operational ques-
tions like whether you have to wait to validate until the entity appears
in a place where it will be expanded.

> You could check them for well-formedness (I guess), but you could not
> validate them out of context                                      ^^^

Sure you could.  But obviously an external entity, in this scenario,
would come out invalid if you declared it at a point where parameter
entities it uses are not yet declared.  So just make sure you do that.
It's what the spec says, right? ;-)

Also, parsers, when they check external entities, will have to make
temporary copies of their parents' entity tables.  Why?  Because any
given external entity may define more entities that it itself uses.
(A typical case would be defining a parameter entity that later gets ex-
panded to "INCLUDE").  So we have to keep a record of what's been de-
fined.  On the other hand, if the parent entity never references the
external entity, we don't want definitions within the external entity
leaking into the parent's tables.  An exception to this is the top-
level external DTD entity, which is always "used" and whose definitions
we always want to leak back into the parent's tables.
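
To make the bookkeeping concrete, here is a minimal sketch in Python.
It is not any real parser's code: the function name, its parameters,
and the use of plain dicts as entity tables are all assumptions made
for illustration.  It shows only the copy-then-maybe-merge step
described above.

```python
def check_external_entity(parent_table, declarations, used,
                          is_external_dtd=False):
    """Process the declarations found inside one external entity.

    parent_table    -- dict of entity name -> replacement text
                       (hypothetical representation)
    declarations    -- list of (name, text) pairs declared in the entity
    used            -- True if the parent actually references the entity
    is_external_dtd -- the top-level external DTD is always "used"
    """
    # Work on a temporary copy, so the entity sees its parent's
    # definitions and can add its own without touching the parent.
    scratch = dict(parent_table)
    for name, text in declarations:
        # In XML the first declaration of an entity is binding,
        # so later redeclarations are ignored.
        scratch.setdefault(name, text)

    # Let the new definitions leak back only if the entity is used.
    if used or is_external_dtd:
        parent_table.update(scratch)
    return scratch
```

With used=False the parent's table comes back untouched, while the
returned scratch table still holds everything the entity needed in
order to be checked on its own.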

If IE's parser interprets the spec the way it's written, it will have
to do all of these things.

I reiterate my belief that the XML standard was written with SGML prac-
tice in mind.  If you know what SGML parsers typically do in such situ-
ations, you know immediately what the XML spec editors really meant to
say.  The question of whether what they actually _did_ say will work
in practice is another matter.

STG's parser, by the way, compromises between these two approaches.
On the one hand, it does not insist that external entities validate at
the point where they are declared.  On the other hand, it still scans
the entities, whether they are used or not, and emits error messages if
it finds any obvious problems.  This seems a reasonable approach.  I'd
guess (not having tested it myself) that it's what IE is doing as well.
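
A rough sketch of that compromise, in Python.  This is not STG's code.
The function name and the dict of already-fetched entity text are
hypothetical, and the well-formedness test is a deliberately crude
stand-in for the extParsedEnt production.

```python
import xml.etree.ElementTree as ET


def scan_external_entities(entities, referenced):
    """Scan every declared external general entity, used or not.

    entities   -- hypothetical dict of entity name -> fetched text
    referenced -- set of entity names the document actually references
    Returns a list of (name, was_used, message) tuples.
    """
    problems = []
    for name, text in entities.items():
        try:
            # Crude check: wrap the text in a dummy element and see
            # whether it parses.  This ignores text declarations and
            # rejects references to entities it has never heard of,
            # so it catches only the "obvious" problems.
            ET.fromstring("<wrapper>" + text + "</wrapper>")
        except ET.ParseError as exc:
            problems.append((name, name in referenced, str(exc)))
    return problems


if __name__ == "__main__":
    found = scan_external_entities(
        {"ok": "<p>fine</p>", "bad": "<p>unclosed"},
        referenced={"ok"},
    )
    for name, was_used, message in found:
        print("entity %s (used: %s): %s" % (name, was_used, message))
```

Nothing here insists that an entity validate at the point where it is
declared.  Every entity just gets scanned once, and anything that
fails is reported along with whether the document ever referenced it.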

-- 

Richard Goerwitz
PGP key fingerprint:    C1 3E F4 23 7C 33 51 8D  3B 88 53 57 56 0D 38 A0
For more info (mail, phone, fax no.):  finger richard at goon.stg.brown.edu

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



