XML Torture Test: Parsers Fail

Elliotte Rusty Harold elharo at metalab.unc.edu
Wed Apr 7 16:11:08 BST 1999


>I'm not so sure that IE5 is wrong in reporting an error (when unreferenced
>General Entities are DTD chunks).  The XML REC says (in 4.3.2 "Well-Formed
>Parsed Entities")
>"An external general parsed entity is well-formed if it matches the
>production labeled extParsedEnt", which is an optional TextDecl [77]
>followed by 'content' [43].  Non-validating processors are not required to
>read external entities, but they are not forbidden to read them if they are
>not referenced.
>

I agree that IE5 can read the external entity if it feels like it. However,
the document is still well-formed because the entity is never referenced
and is not part of the document. This document meets the criteria for
well-formedness in Section 2.1; i.e.

1. Taken as a whole, it matches the production labeled document.

2. It meets all the well-formedness constraints.

3. Each of the parsed entities which is referenced directly or indirectly
within the document is well-formed.


#3 is the kicker here. The non-well-formed entity that causes the problem
is never referenced.  I'm not sure what "indirectly referenced" means, though.
Perhaps that provides some wiggle room. The only other relevant instance of
"indirect" I see in the spec is in the No Recursion well-formedness
constraint in Section 4.1. This states that "A parsed entity must not
contain a recursive reference to itself, either directly or indirectly."

In this context an indirect reference seems to mean one that did not occur
in the main document but that appears in one of the other external parsed
entities that was included by a different entity reference. The annotated
spec seems to support this interpretation
<http://www.xml.com/axml/notes/Recursion.html>, though the example given
uses purely internal entities.
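
To make that reading of "indirectly" concrete, here is a rough sketch in
Python using the expat bindings from the standard library. It is only an
illustration of how one non-validating parser behaves, not anything taken
from the spec. The entity names "a" and "b" and the helper is_well_formed
are made up for the example, and like the annotated spec it uses purely
internal entities.

```python
from xml.parsers import expat

# Hypothetical document: "a" refers to "b", and "b" refers back to "a",
# so "a" contains a reference to itself only indirectly.
INDIRECT_RECURSION = """<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ENTITY a "x &b; x">
  <!ENTITY b "y &a; y">
]>
<doc>&a;</doc>
"""

# Same shape, but the chain a -> b stops at "b", so nothing is recursive.
PLAIN_CHAIN = """<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ENTITY a "x &b; x">
  <!ENTITY b "y">
]>
<doc>&a;</doc>
"""


def is_well_formed(text):
    """Return True if expat parses text without raising an error."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(text, True)
    except expat.ExpatError:
        return False
    return True


if __name__ == "__main__":
    print("indirect recursion:", is_well_formed(INDIRECT_RECURSION))
    print("plain chain:", is_well_formed(PLAIN_CHAIN))
```

The point is only that the offending reference need not appear in the
document itself; it is reached through another entity.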

The word "indirect" also appears in these well-formedness constraints:

Well-Formedness Constraint: No External Entity References
 Attribute values cannot contain direct or indirect entity references to
external entities.

Well-Formedness Constraint: No < in Attribute Values
 The replacement text of any entity referred to directly or indirectly in
an attribute value (other than "&lt;") must not contain a <.

The annotated spec doesn't really address these two constraints in this way.
It seems remotely possible that what's really meant is an unparsed entity,
but if that's so why didn't the authors just say that?  Furthermore, an
unparsed entity has no reason not to contain these things. Again it seems
that what is meant is simply an entity reference whose value uses another
entity reference that violates the constraint.
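
A small sketch of that interpretation for the "No < in Attribute Values"
constraint, again with Python's standard expat bindings. The entity names
"inner" and "outer", the attribute "note", and the helper is_well_formed
are assumptions made up for the illustration. The attribute refers to
"outer" directly and to "inner" only indirectly, and it is "inner" whose
replacement text contains the <.

```python
from xml.parsers import expat

# Hypothetical DTD: "inner" has a bare < as replacement text and is
# reached from the attribute value only through "outer".
DTD = """<!DOCTYPE doc [
  <!ENTITY inner "<">
  <!ENTITY outer "see &inner; here">
]>
"""

# The attribute value refers to "outer" directly and "inner" indirectly.
INDIRECT_LT = '<?xml version="1.0"?>\n' + DTD + '<doc note="&outer;"/>\n'

# Same declarations, but neither entity is ever referenced.
NEVER_REFERENCED = '<?xml version="1.0"?>\n' + DTD + '<doc note="plain"/>\n'


def is_well_formed(text):
    """Return True if expat parses text without raising an error."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(text, True)
    except expat.ExpatError:
        return False
    return True


if __name__ == "__main__":
    print("indirect < in attribute:", is_well_formed(INDIRECT_LT))
    print("declared, never referenced:", is_well_formed(NEVER_REFERENCED))
```

Note that the second document carries exactly the same declarations; it
is the reference, not the declaration, that triggers the constraint.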

In short, I think IE5 is definitely incorrect in not accepting a
declaration of a malformed entity in the absence of an actual reference to
that entity.
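
For comparison, here is a sketch of how one other non-validating parser
treats the same situation. It uses Python's standard expat bindings, which
is obviously not what IE5 uses, so it says nothing about IE5 itself. The
file name "dtd-chunk.ent" is hypothetical and is never opened; the handler
only records which external entities the parser asks for.

```python
from xml.parsers import expat

# Hypothetical document: declares an external general entity but never
# references it. "dtd-chunk.ent" does not need to exist.
UNREFERENCED = """<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ENTITY chunk SYSTEM "dtd-chunk.ent">
]>
<doc>no reference to chunk here</doc>
"""

# Same declaration, but now the entity is referenced in the content.
REFERENCED = """<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ENTITY chunk SYSTEM "dtd-chunk.ent">
]>
<doc>&chunk;</doc>
"""


def external_entities_requested(text):
    """Parse text and return the system IDs expat asked the caller to load."""
    requested = []

    def on_external_ref(context, base, system_id, public_id):
        # Record the request; returning 1 tells expat to carry on.
        requested.append(system_id)
        return 1

    parser = expat.ParserCreate()
    parser.ExternalEntityRefHandler = on_external_ref
    parser.Parse(text, True)
    return requested


if __name__ == "__main__":
    print("unreferenced:", external_entities_requested(UNREFERENCED))
    print("referenced:", external_entities_requested(REFERENCED))
```

In this sketch the parser only asks for the external entity when a
reference to it actually appears, so a declaration on its own never gives
it the chance to object to the entity's contents.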


+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo at metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|        XML: Extensible Markup Language (IDG Books 1998)            |
|   http://www.amazon.com/exec/obidos/ISBN=0764531999/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://sunsite.unc.edu/javafaq/ |
|  Read Cafe con Leche for XML News: http://sunsite.unc.edu/xml/     |
+----------------------------------+---------------------------------+


