Validating Entities (was Re: XML Torture Test: Parsers Fail)

Richard L. Goerwitz richard at
Wed Apr 7 21:17:32 BST 1999

David Megginson wrote:

>   3. Each of the parsed entities which is referenced directly or
>     indirectly within the document is well-formed

If I've seemed harsh, then forgive me.  I have a great deal of respect
for your views, and I don't think you're wrong here per se.

While I agree with what you've inferred about the standard, I'm not at
all certain that the standard itself forces your interpretation.  In the
above case, for example, the standard talks about well-formed documents
as if every parsed entity used in the document had to be read in.  In
fact, this is not a requirement.  The whole reason parameter entities,
for example, may not be used inside markup declarations in the internal
DTD subset is that this lets a non-validating parser bypass them.
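For reference, the rule in question is the XML 1.0 well-formedness
constraint "PEs in Internal Subset": in the internal subset, parameter-
entity references may appear where markup declarations can appear, but
not inside them.  The sketch below is only an illustration of that
constraint, not of its rationale.  It assumes Python's standard-library
ElementTree, which sits on the non-validating expat parser; the element
`doc`, attribute `a`, and parameter entity `type` are made-up names.

```python
import xml.etree.ElementTree as ET

# Hypothetical names: element "doc", attribute "a", parameter entity "type".

# Allowed: a plain attribute-list declaration in the internal subset.
LITERAL_DECL = """<!DOCTYPE doc [
<!ATTLIST doc a CDATA #IMPLIED>
]>
<doc a="x"/>"""

# Not well-formed: the parameter-entity reference %type; sits *inside*
# the ATTLIST declaration, which the internal subset does not permit.
PE_INSIDE_DECL = """<!DOCTYPE doc [
<!ENTITY % type "CDATA">
<!ATTLIST doc a %type; #IMPLIED>
]>
<doc a="x"/>"""


def is_well_formed(document: str) -> bool:
    """Return True if expat (via ElementTree) parses the document without error."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(LITERAL_DECL))
    print(is_well_formed(PE_INSIDE_DECL))
```

A non-validating parser can enforce this without reading anything
outside the document entity itself.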

(Incidentally, does it bother anyone else that you can have valid
documents that aren't well-formed?  Imagine an external entity used
inside an attribute value.  If it is declared in such a way that a
non-validating parser doesn't realize it's external, then a validating
parser will still reject it as an error (references to external
entities aren't allowed in attribute values).  There are other such
cases, although this is the main one that comes to mind.)
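A minimal sketch of the attribute-value case, assuming Python's
standard-library ElementTree (a non-validating parser built on expat);
the names `doc`, `a`, `ent`, and `ent.xml` are hypothetical.  Because
both entity declarations here sit in the internal subset, which even a
non-validating parser must read, the parser can see which entity is
external and apply the XML 1.0 constraint "No External Entity
References".  If the declaration lived in an external subset that the
parser chose not to read, it would have no such information.

```python
from typing import Optional
import xml.etree.ElementTree as ET

# Hypothetical names: element "doc", attribute "a", entity "ent", file "ent.xml".

# An internal general entity: its replacement text may appear in an attribute.
INTERNAL_ENTITY_DOC = """<!DOCTYPE doc [
<!ENTITY ent "replacement text">
]>
<doc a="&ent;"/>"""

# An external general entity referenced from an attribute value: forbidden by
# the "No External Entity References" constraint.  ent.xml is never fetched.
EXTERNAL_ENTITY_DOC = """<!DOCTYPE doc [
<!ENTITY ent SYSTEM "ent.xml">
]>
<doc a="&ent;"/>"""


def attribute_or_error(document: str) -> Optional[str]:
    """Return attribute 'a' of the root element, or a 'rejected: ...' message."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError as exc:
        return f"rejected: {exc}"
    return root.get("a")


if __name__ == "__main__":
    print(attribute_or_error(INTERNAL_ENTITY_DOC))
    print(attribute_or_error(EXTERNAL_ENTITY_DOC))
```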

My general point is that what you do while validating is not simply a
superset of what you do when parsing only for well-formedness.  You
process documents in somewhat different ways depending on which of these
two alternatives you've chosen.  So the question of the context in which
an external entity should be checked, when validating, is not clearly
answered by the spec without exegesis and, I would argue, background
knowledge.

Anyway, even if I grant that the spec says what you want it to, the
point should still be made that it says so in a way that's hard to
interpret or understand.  So the fact that the writers of IE's parser
apparently got it wrong is not at all unexpected.


Richard Goerwitz
PGP key fingerprint:    C1 3E F4 23 7C 33 51 8D  3B 88 53 57 56 0D 38 A0
For more info (mail, phone, fax no.):  finger richard at
