Non-Validating XML Parsers: Requirements

Chris Hubick maillist at chris.hubick.com
Mon Aug 3 23:07:15 BST 1998


On Mon, 3 Aug 1998, John Cowan wrote:

> > This means that if you supply an instance value for a FIXED attribute,
> > where that instance differs from the declared fixed value, that an NVP
> > MAY (if it supports the Fixed Attribute Default VC) or MAY NOT supply the
> > correct declared value for this attribute.
> 
> That's not clear.  It is an error for the document to supply a value
> other than the FIXED one, so the parser may return the FIXED value,
> or the application's value, or make demons fly out of your nose.
> (See previous posting, or comp.std.c++).

	Yes, it should throw an error if it understands the
validity constraint, but if it doesn't, the behaviour is undetermined.
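
As a rough illustration of what one non-validating parser does here, the
sketch below uses Python's standard xml.parsers.expat binding.  The
attribute name "kind", its values, and the helper parsed_attributes are
made up for the example.  It only shows what this particular parser
reports, not behaviour the spec requires of every NVP.

```python
import xml.parsers.expat

# Hypothetical DTD: the attribute "kind" on <Test> is declared #FIXED 'a'.
DTD = b"<!DOCTYPE Test [<!ATTLIST Test kind CDATA #FIXED 'a'>]>"


def parsed_attributes(document: bytes) -> dict:
    """Return the attributes expat reports for the first start tag."""
    captured = []

    def on_start(name, attrs):
        captured.append(dict(attrs))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return captured[0]


# Instance value differs from the declared FIXED value.
conflicting = parsed_attributes(DTD + b"<Test kind='b'/>")
# Attribute left out entirely, so only the declaration can supply it.
omitted = parsed_attributes(DTD + b"<Test/>")

print("conflicting instance value:", conflicting)
print("attribute omitted:         ", omitted)
```

Expat does not check the Fixed Attribute Default VC, so it should raise no
error for the conflicting document.  Comparing the two printed
dictionaries shows whether the application sees the instance value or the
declared one.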

> > <!DOCTYPE Test [
> >   <!ENTITY % xx SYSTEM 'file.ent'>
> >   <!ENTITY yy '2 %xx; 3'>
> >   <!ENTITY zz '1 &yy; 4'>
> > ]>
> > <Test>A &yy; B &zz; C</Test>
> 
> This document is not WF, and every parser should detect it (but
> some do not), to wit:  parameter entity references in the internal
> subset can only come between declarations, not within one.
> See clause 2.8, the WF constraint called "PEs in Internal Subset".

	I thought this document was well formed.  I read "PEs in Internal
Subset" to mean that you can't have stuff like:

<!ELEMENT dictionary-body (%div.mix; | %dict.mix;)*>

but you can have:

<!DOCTYPE test [
<!ENTITY % xx '<!--example-->'>
%xx;
]>

because the grammar doesn't allow PEReferences where they occur in the
first example (within declarations):

 [48]  cp ::=  (Name | choice | seq) ('?' | '*' | '+')? 

but it does in the second (where declarations occur):

[28]  doctypedecl ::=  '<!DOCTYPE' S Name (S ExternalID)? S? ('['
(markupdecl | PEReference | S)* ']' S?)? '>'

but looking at the EntityValue production:

[9]  EntityValue ::=  '"' ([^%&"] | PEReference | Reference)* '"'  
   |  "'" ([^%&'] | PEReference | Reference)* "'" 

It allows a PEReference in an entity value, and thus I thought the
document was well formed.  If it isn't, and PEReferences are only allowed
in an EntityValue in the external subset (as XP would suggest), then I
have no idea how to interpret the occurrence of PEReferences in the
grammar in things like EntityValue, but not in cp.  This would mean that
the grammar has PE references in places where they are not allowed in the
internal subset, suggesting those are only valid in the external subset,
yet the grammar leaves them out in many places where they are allowed in
the external subset!
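
One way to see how an existing parser reads the "PEs in Internal Subset"
constraint is to feed it both documents.  The sketch below uses Python's
standard xml.parsers.expat binding.  The helper well_formedness_error is
hypothetical, and the <test/> root element is added to the second
document only so that it is complete.  This shows what one parser does,
not a ruling on what the spec means.

```python
import xml.parsers.expat

# PE reference inside an entity declaration (the disputed document).
PE_INSIDE_DECL = b"""<!DOCTYPE Test [
  <!ENTITY % xx SYSTEM 'file.ent'>
  <!ENTITY yy '2 %xx; 3'>
  <!ENTITY zz '1 &yy; 4'>
]>
<Test>A &yy; B &zz; C</Test>"""

# PE reference between declarations; <test/> is added for completeness.
PE_BETWEEN_DECLS = b"""<!DOCTYPE test [
<!ENTITY % xx '<!--example-->'>
%xx;
]>
<test/>"""


def well_formedness_error(document: bytes):
    """Return expat's error message for the document, or None if it parses."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError as err:
        return xml.parsers.expat.ErrorString(err.code)
    return None


for label, doc in (("inside declaration", PE_INSIDE_DECL),
                   ("between declarations", PE_BETWEEN_DECLS)):
    print(label, "->", well_formedness_error(doc))
```

A message for the first document and None for the second would match the
reading that PE references in the internal subset may only appear between
declarations, whatever production [9] seems to allow.
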
	Unless I am right, and you and XP are wrong (doubtful), I would
like to officially have a rant/throw a fit over this!  I have spent
zillions of hours writing a grammar based parser for a grammar that sucks
dead bunnies through bent straws.  No sir, I don't like it.  I used to
think my failure to understand the spec and PEReferences was down to my
own lack of understanding; now I just think the spec is bad.  I want
someone who knows to give me a proper BNF grammar for the internal subset
and a proper one for the external subset!  This should have been
included in the spec.  Hell, I even attempted this back in May
(http://www.lists.ic.ac.uk/hypermail/xml-dev/9805/0085.html) for the
external subset.  I quit bothering because I came to the conclusion that
the supplied grammar was for the internal subset, and knew I didn't have
to worry about the external in the first round because I didn't want to
validate.

---
Chris Hubick
mailto:chris at hubick.com
http://www.hubick.com/





