Attribute normalisation and character entities

Thu Jan 27 14:19:55 GMT 2000

Richard Tobin wrote:

<snip/>
> The algorithm in 3.3.3 applied to a non-CDATA attribute
>
>    "foo &#x9; bar"
>
> will first replace the character reference with a tab character, so we
> will have
>
>   space tab space
>
> Then the "further processing" will do nothing, because there are no
> sequences of more than one space character.
>
> The only way that the tab could become a space is if both the first
> and third points of section 3.3.3 were applied to it, but the natural
> reading is that those points are alternatives.
>
OK, I think I see what you are seeing now.  3.3.3 specifies behavior for
BOTH character references and entity references, and elsewhere the Rec
draws a clear distinction between the two.  The character reference
behavior is just to append the referenced character, with no recursive
normalizing.  And the paragraph on replacing sequences of spaces with a
single space applies only to #x20 spaces, not to any other kind of
whitespace.  So the tab wouldn't get turned into a space, as you say.
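
To make the literal reading concrete, here is a rough Python sketch of the 3.3.3 steps as I read them. It is only an illustration, not a conforming processor: the function name, the `entities` dictionary and the `is_cdata` flag are my own inventions, the special case for a "#xD#xA" pair inside an entity is left out, and an undeclared entity simply raises KeyError.

```python
import re

# Matches a hex character reference, a decimal character reference,
# or a general entity reference, in that order.
_REF = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|&([A-Za-z_:][\w.:-]*);")


def normalize_attribute(raw, entities=None, is_cdata=True):
    """Simplified attribute-value normalization (hypothetical helper).

    `entities` maps entity names to replacement text (an assumption of
    this sketch; a real processor takes these from the DTD).
    """
    entities = entities or {}
    out = []
    pos = 0
    while pos < len(raw):
        m = _REF.match(raw, pos)
        if m and m.group(1) is not None:
            # Character reference (hex): append the referenced character as-is.
            out.append(chr(int(m.group(1), 16)))
            pos = m.end()
        elif m and m.group(2) is not None:
            # Character reference (decimal): same treatment.
            out.append(chr(int(m.group(2))))
            pos = m.end()
        elif m:
            # Entity reference: recursively process the replacement text.
            out.append(normalize_attribute(entities[m.group(3)], entities))
            pos = m.end()
        elif raw[pos] in " \t\r\n":
            # Literal whitespace character: append a single #x20.
            out.append(" ")
            pos += 1
        else:
            out.append(raw[pos])
            pos += 1
    value = "".join(out)
    if not is_cdata:
        # "Further processing": only #x20 is trimmed and collapsed.
        value = re.sub(" +", " ", value.strip(" "))
    return value


print(repr(normalize_attribute("foo &#x9; bar", is_cdata=False)))  # 'foo \t bar'
print(repr(normalize_attribute("foo \t bar", is_cdata=False)))     # 'foo bar'
```

Under this reading a literal tab in the attribute collapses away, but a tab written as &#x9; survives into the normalized value.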

I wonder whether this was an error or intended.  The text in the Rec is:

"If the declared value is not CDATA, then the XML processor must further
process the normalized attribute value by discarding any leading and
trailing space (#x20) characters, and by replacing sequences of space (#x20)
characters by a single space (#x20) character."

Maybe they intended "...sequences of whitespace..." instead.
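
For comparison, a small Python sketch of just the "further processing" step under the two wordings. Both function names are made up for illustration, and the second one implements a purely hypothetical wording that is not in the Rec. The input is the value as it would stand after the first normalization pass, with the tab from &#x9; already in place.

```python
import re


def further_process_literal(value):
    """Wording as published: only #x20 characters are trimmed and collapsed."""
    return re.sub(" +", " ", value.strip(" "))


def further_process_whitespace(value):
    """Hypothetical wording: any run of XML whitespace becomes one space."""
    return re.sub("[ \t\r\n]+", " ", value.strip(" \t\r\n"))


after_first_pass = "foo \t bar"  # space, tab, space between the two tokens

print(repr(further_process_literal(after_first_pass)))     # 'foo \t bar'
print(repr(further_process_whitespace(after_first_pass)))  # 'foo bar'
```

Only the second version would give the two-token value one might expect for a tokenized attribute type.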

Anyone know for sure which way this was supposed to read?

Tom Passin

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1