Attribute normalisation and character entities

Arjun Ray aray at q2.net
Mon Jan 24 16:47:48 GMT 2000



On 24 Jan 2000, Richard Tobin wrote:

> Section 3.3.3 seems to me to say that character references are not
> subject to the translation to #x20 [...] 
> The errata (http://www.w3.org/XML/xml-19980210-errata) re-writes this
> section but does not appear to change it in this respect.
> 
> However the Oasis test suite, in tests sa02 and not-sa02, requires
> that they are replaced with spaces.
> 
> Which is correct?

If the intent is to do it the SGML way, then 3.3.3 is correct.  In fact, I
think 3.3.3 (as clarified in the errata) is the best explanation I've seen
of this!:-)

The SGML gotcha here has to do with the 'SEPCHAR' category.  A numeric
character reference is always character data at the point it occurs, so
it never gets *parsed* as a SEPCHAR, and therefore never gets normalized
as a separator for non-CDATA declared values.
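
For comparison on the XML side, here is a minimal sketch of how a
conforming XML parser treats the two spellings of a line feed in an
attribute value.  It assumes Python's standard xml.etree.ElementTree
module (which sits on top of expat); the attribute names are just the
ones from this thread, and no DTD is involved, so both attributes are
treated as CDATA.

```python
import xml.etree.ElementTree as ET

# 'bar' contains a literal line feed; 'baz' spells the same character
# as a numeric character reference.
doc = '<foo bar="blah1\nblah2" baz="blah1&#10;blah2">...</foo>'
root = ET.fromstring(doc)

literal = root.get("bar")      # literal LF: normalized to a space (#x20)
reference = root.get("baz")    # &#10;: the referenced character is kept

print(repr(literal))
print(repr(reference))
```

The literal line feed comes back as a space, while the character
reference survives as a real line feed, which is what 3.3.3 describes.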

Try this file with nsgmls:

===
<!DOCTYPE foo [
  <!ELEMENT foo - - (#PCDATA) >
  <!ATTLIST foo
            bar   CDATA #IMPLIED
            baz   NAMES #IMPLIED
            >
]>
<foo bar="blah1&#10;blah2" baz="grape&#10;banana">...</foo>
===

This won't validate: the '&#10;' in baz is data, not a separator, so the
value is not a valid NAMES list.  So

a) Replace '&#10;'  with '&#RE;'.  Now, it will validate. (because RE is a
   SEPCHAR when parsed.)
 
b) Replace with '&lf;' and add a declaration in the DTD

    <!ENTITY lf  "&#10;" >

   This, too, will validate (because the character reference substitution
   occurs when the entity declaration is *parsed*, and so is a regular
   literal whitespace character by the time the entity reference is used).

c) Change the entity declaration to

    <!ENTITY lf  CDATA "&#10;" >

   and now, it won't validate any more (because the recursive parsing rule
   has been short-circuited).

d) Repeat (b) and (c) with 'RE' for '10' in the entity declaration.  Same
   difference in results.
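
The difference between a direct character reference and an entity whose
declaration contains one can be sketched in a few lines.  This is only
a rough model of the XML 1.0 rule in 3.3.3, not of nsgmls or of SGML
proper (XML has no CDATA entities, so case (c) has no counterpart here).
The function names and the 'entities' dictionary are made up for the
illustration, and line ends are assumed to be normalized already.

```python
import re

# A hex character reference, a decimal one, or a general entity reference.
_REF = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|&([A-Za-z_:][\w.:-]*);")

def _spaces(text):
    """Translate literal tab, LF and CR to a single space each."""
    return text.translate({9: " ", 10: " ", 13: " "})

def replacement_text(literal):
    """Expand character references, as done when a declaration is parsed."""
    def expand(m):
        if m.group(1):
            return chr(int(m.group(1), 16))
        if m.group(2):
            return chr(int(m.group(2)))
        return m.group(0)  # general entity references are left alone here
    return _REF.sub(expand, literal)

def normalize(value, entities):
    """Normalize an attribute value roughly as XML 1.0 section 3.3.3 does."""
    out, pos = [], 0
    for m in _REF.finditer(value):
        out.append(_spaces(value[pos:m.start()]))
        if m.group(1):
            out.append(chr(int(m.group(1), 16)))  # referenced char, untouched
        elif m.group(2):
            out.append(chr(int(m.group(2))))      # referenced char, untouched
        else:
            # Entity reference: recursively process its replacement text.
            out.append(normalize(entities[m.group(3)], entities))
        pos = m.end()
    out.append(_spaces(value[pos:]))
    return "".join(out)

# <!ENTITY lf "&#10;"> : the stored replacement text is a literal LF.
entities = {"lf": replacement_text("&#10;")}

direct = normalize("grape&#10;banana", entities)
via_entity = normalize("grape&lf;banana", entities)

print(repr(direct))
print(repr(via_entity))
```

Under these assumptions the direct reference keeps its line feed, while
the reference through 'lf' ends up as a plain space, matching the
behaviour in (b).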

Ain't this fun?;)


Arjun



xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1




