parsing entity values

Richard Tobin richard at cogsci.ed.ac.uk
Mon Jan 25 16:15:41 GMT 1999


> ><!ENTITY % ap "&#38;#39;" > ( 38 = "&" , 39 = "'" )
> ><!ENTITY msg "he said %ap;hi!%ap;" >

> Right. The replacement text for ap is
> 
>     &#39;

Yes.

> With msg, the parameter entity is included as part of the replacement text
> and so the replacement text of msg is
> 
>     he said &#39;hi!&#39;

No.

See the table in section 4.4.  We have a parameter entity reference in an
entity value, so it is "included in literal".  4.4.5 says "[the parameter
entity's] replacement text is processed in place of the reference itself
as though it were part of the document at the location the reference was
recognised, except that a single or double quote character [...] will not
terminate the literal".  So the &#39; is processed as if it had occurred
directly in the definition of msg.

You can't see the difference in this case, since a reference to &msg;
produces the same text either way, but if we had:

<!ENTITY % less "&#38;#60;">
<!ENTITY % more "&#38;#62;">
<!ENTITY elt "%less;=%more;">

the replacement text of elt would be 

  <=>

not

  &#60;=&#62;

and a reference to &elt; in the document body should be detected as a
syntax error, since a bare "<=>" is not well-formed content.
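
The "included in literal" processing described above can be sketched
roughly as follows in Python. This is only an illustration, not code from
any real parser: replacement_text and the params dictionary are made-up
names, and the name pattern is a simplification of the XML Name
production. General entity references (&name;) are left untouched, since
they are bypassed in entity values.

```python
import re

# Hypothetical helper for illustration only. Matches decimal and hex
# character references and parameter-entity references; the name pattern
# is a simplified stand-in for the XML Name production.
_REF = re.compile(r"&#(\d+);|&#x([0-9A-Fa-f]+);|%([A-Za-z_:][\w.:-]*);")

def replacement_text(literal, param_entities):
    """Compute the replacement text of an entity literal."""
    def expand(match):
        dec, hexa, pe = match.groups()
        if dec is not None:
            return chr(int(dec))
        if hexa is not None:
            return chr(int(hexa, 16))
        # Included in literal: the parameter entity's replacement text is
        # processed as if it had been written at this point, so character
        # references inside it are expanded as well.
        return replacement_text(param_entities[pe], param_entities)
    # re.sub does not rescan the text it has just substituted, so the
    # "&" produced by &#38; is not combined with the following "#60;".
    return _REF.sub(expand, literal)

params = {}
params["less"] = replacement_text("&#38;#60;", params)  # "&#60;"
params["more"] = replacement_text("&#38;#62;", params)  # "&#62;"
elt = replacement_text("%less;=%more;", params)
print(elt)  # prints "<=>"
```

Running the same function over the ap/msg declarations gives
"he said 'hi!'" as the replacement text of msg.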

Phil suggests that having to keep track of where the quotes are
special makes the parsing quite difficult; I don't think this is true,
though perhaps it depends on how your parser works.  Mine just checks
to see whether it read the quote character from the same entity that it
read the opening quote from.
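
For what it's worth, the quote check can be sketched like this in Python.
It is a simplified illustration under my own assumptions, not the actual
code of any parser: read_entity_literal and the stack-of-inputs layout are
hypothetical, parameter entities are given as a plain dictionary of
replacement texts, and character references are not handled at all. The
only point is that a quote ends the literal only when it is read from the
same input that supplied the opening quote.

```python
def read_entity_literal(text, param_entities):
    """Return the characters of the entity literal that starts at text[0],
    with parameter-entity references included in place (illustrative only)."""
    quote = text[0]
    # Each stack entry is [string, position]. The bottom entry is the
    # input the opening quote was read from.
    stack = [[text, 1]]
    out = []
    while True:
        src = stack[-1]
        if src[1] >= len(src[0]):
            if len(stack) == 1:
                raise ValueError("unterminated literal")
            stack.pop()  # end of an included parameter entity
            continue
        ch = src[0][src[1]]
        src[1] += 1
        if ch == quote and len(stack) == 1:
            # Same input as the opening quote: this terminates the literal.
            return "".join(out)
        if ch == "%":
            end = src[0].index(";", src[1])
            name = src[0][src[1]:end]
            src[1] = end + 1
            stack.append([param_entities[name], 0])
            continue
        # Any other character, including a quote read from an included
        # entity, is ordinary data.
        out.append(ch)

print(read_entity_literal('"a%q;b" >', {"q": '"'}))  # prints a"b
```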

-- Richard

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/