Attribute normalisation and character entities

David Brownell david-b at pacbell.net
Thu Jan 27 23:01:04 GMT 2000


Richard Tobin wrote:
> 
> How is an attribute containing a character reference to a whitespace
> character (other than space) supposed to be normalised?
> 
> Section 3.3.3 seems to me to say that character references are not
> subject to the translation to #x20 - the four bulleted points are
> an exhaustive disjunction.
>
> However the Oasis test suite, in tests sa02 and not-sa02, requires
> that they are replaced with spaces.
> 
> Which is correct?

As a data point, those output tests were originally generated using
the then-current version of XP.  I suspect Tom Passin's observation
is close:  except for CDATA, _whitespace_ should be replaced with just
one space.

As I've commented elsewhere, much of the entity processing in the XML
spec seems to be specified as a collection of special cases (updated
via errata as inconsistencies turn up) rather than being based on
simple and consistent rules.  This looks like another instance.

There are two curious points in 3.3.3:  first, that character and
entity refs may appear in the value being normalized, and second, that
CRLF sequences may appear (line endings already having been normalized).
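
For reference, here is a rough Python sketch of the "literal" reading
of the four bulleted rules in 3.3.3, the one under which a character
reference is appended as-is and only literal whitespace becomes #x20.
The helper name normalize_attr, its parameters, and the entities table
are made up for illustration; the sketch skips the "#xD#xA" exception
and all error handling, and it is not a claim about which reading of
the spec is right.

```python
import re

# Matches "&#9;", "&#x9;" or "&name;"; assumes the input is well-formed.
_REF = re.compile(r"&(?:#([0-9]+)|#x([0-9A-Fa-f]+)|([^;&\s]+));")
_PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "apos": "'", "quot": '"'}


def normalize_attr(raw, entities=None, cdata=True):
    """Literal reading of the 3.3.3 rules (hypothetical helper)."""
    entities = entities or {}
    out = []
    pos = 0
    while pos < len(raw):
        m = _REF.match(raw, pos)
        if m:
            dec, hexa, name = m.groups()
            if dec or hexa:
                # Rule 1: append the referenced character unchanged.
                out.append(chr(int(dec) if dec else int(hexa, 16)))
            elif name in _PREDEFINED:
                out.append(_PREDEFINED[name])
            else:
                # Rule 2: recursively process the replacement text.
                out.append(normalize_attr(entities[name], entities))
            pos = m.end()
        elif raw[pos] in " \t\r\n":
            # Rule 3: a literal whitespace character becomes #x20.
            out.append(" ")
            pos += 1
        else:
            # Rule 4: any other character is appended as-is.
            out.append(raw[pos])
            pos += 1
    value = "".join(out)
    if not cdata:
        # Non-CDATA types: collapse runs of #x20 and trim both ends.
        value = re.sub(" +", " ", value).strip(" ")
    return value


print(repr(normalize_attr("foo&#9;bar")))   # tab survives
print(repr(normalize_attr("foo\tbar")))     # tab becomes a space
```

Under this reading the two inputs give different results, which is
exactly the distinction the question is about.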

How would these appear?  If we assume that section 4.4 applies first,
then those OASIS cases are correct, and such references would have to
appear "doubly escaped", as in:

   <element
	char-ref-attr = "foo &#38;#9; bar"
	ent-ref-attr1 = "AT&#38;amp;T"
	ent-ref-attr2 = "AT&amp;amp;T"
	crlf-attr     = "a&#xD;&#xA;b"
	/>
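
As a further data point, the element above can be fed to a real
parser.  The sketch below uses Python's xml.etree.ElementTree (which
sits on expat) and prints what it reports for each attribute.  The last
two attributes, single-ref and literal-tab, are my additions for
comparison and are not part of the example above.  This shows one
parser's behaviour only; it does not settle what the spec intends.

```python
import xml.etree.ElementTree as ET

doc = (
    '<element\n'
    '\tchar-ref-attr = "foo &#38;#9; bar"\n'
    '\tent-ref-attr1 = "AT&#38;amp;T"\n'
    '\tent-ref-attr2 = "AT&amp;amp;T"\n'
    '\tcrlf-attr     = "a&#xD;&#xA;b"\n'
    # Added for comparison (not in the original example):
    '\tsingle-ref    = "foo&#9;bar"\n'
    '\tliteral-tab   = "foo\tbar"\n'
    '\t/>'
)

# No DTD is present, so every attribute is treated as CDATA.
attrs = ET.fromstring(doc).attrib
for name, value in attrs.items():
    print(f"{name:14} {value!r}")
```

With this parser the doubly escaped values come back with the inner
reference still spelled out as text (for instance "AT&amp;T"), the
singly referenced tab and CR/LF survive as those characters, and only
the literal tab is turned into a space.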

If we assume that 3.3.3 has needless duplication of 4.4 then I
can't see how the literal CRLF can ever show up as input to the
normalization, since line-ends have already been normalized.
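
A quick way to see the point about line ends, again using Python's
expat-based xml.etree.ElementTree as a stand-in for "a parser" (an
assumption on my part, other parsers may differ):  a literal CR LF
pair inside an attribute value never reaches the application as two
characters, while the same pair written as character references does.

```python
import xml.etree.ElementTree as ET

# Literal CR LF in the source text of the attribute value.
literal = ET.fromstring('<e a="a\r\nb"/>').get("a")

# The same two characters written as character references.
by_ref = ET.fromstring('<e a="a&#xD;&#xA;b"/>').get("a")

print(repr(literal))
print(repr(by_ref))
```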

On the other hand, I don't think anyone actually writes what
ent-ref-attr2 has; "AT&amp;T" is what people write.  Perhaps 4.4
applies first, _and_ there is needless duplication (for entity refs).
Or 3.3.3 has both duplication and several errors.

- Dave

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1
