Attribute normalisation and character entities

David Brownell david-b at pacbell.net
Thu Jan 27 23:56:54 GMT 2000


Richard Tobin wrote:
> 
> In article <3890CE2A.633285 at pacbell.net>,
> David Brownell <david-b at pacbell.net> wrote:
> 
> >There are two curious points in 3.3.3 ... first, that character and
> >entity refs may appear, and second that CRLF sequences may appear (line
> >endings already having been normalized).
> 
> What makes you sure line ends have already been normalised?

http://www.w3.org/XML/xml-19980210-errata#E24 ... the first sentence
that replaces 3.3.3 in the REC says so.


> In 2.11
> it refers to converting them to #xA before passing them to the
> application, and suggests that it can be implemented by normalising
> before parsing (but doesn't have to be).
> 
> I take the line-end conversion in 3.3.3 as duplicating the requirement
> in 2.11.  If you implement it by normalising before parsing, you won't
> have to do anything about it in attribute normalisation.

The errata preclude that interpretation.  Line end normalization is done
first, and yet afterwards you can still find a CRLF (or a plain CR) in the
pre-normalization attribute text.
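
A minimal sketch of that point, assuming line-end normalization is done as a textual pre-pass in the style 2.11 suggests. The helper names and the sample documents are hypothetical, and the parser is Python's expat-based minidom, used only as one example of a processor. The pre-pass removes a literal CRLF, but a CR and LF written as character references pass through it untouched and only become characters during attribute processing.

```python
import re
from xml.dom import minidom

def normalize_line_ends(text):
    """Pre-parse pass in the style of 2.11: CRLF and lone CR both become LF."""
    return re.sub(r"\r\n?", "\n", text)

literal_doc = '<e x="a\r\nb"/>'          # CRLF typed directly in the value
reference_doc = '<e x="a&#13;&#10;b"/>'  # CRLF spelled as character references

# The pre-pass rewrites the literal CRLF but cannot see the references.
literal_pre = normalize_line_ends(literal_doc)
reference_pre = normalize_line_ends(reference_doc)

def attribute_value(doc):
    """Parse with the expat-based minidom and return the value of attribute x."""
    return minidom.parseString(doc).documentElement.getAttribute("x")

literal_value = attribute_value(literal_pre)
reference_value = attribute_value(reference_pre)
```

With this parser, literal_value should come out as "a b" and reference_value as "a", CR, LF, "b": the CRLF that arrives by reference is still present after line ends have been normalized.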


> Similarly, I think the entity expansion in 3.3.3 is duplication of
> 4.4.

That was one of the options I presented, along with some of the
spec inconsistencies that interpretation introduces.


> And finally, I suspect that the authors just forgot the possibility
> of non-#x20 whitespace (arising from character entity references) in
> the paragraph about trimming and compressing spaces.

Didn't I identify a few more problems with 3.3.3 than that??  ;-)
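
For the white space case quoted above, here is a rough sketch of one reading of 3.3.3. It is not the REC's wording: the function name is hypothetical, entity references are not handled, and line ends are assumed to be normalized already. It shows how a tab that arrives through a character reference escapes the trimming and compressing step, which only looks at #x20.

```python
import re

WHITESPACE = "\x20\x0d\x0a\x09"
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")

def normalize_attribute(raw, is_cdata=True):
    """Sketch of 3.3.3 normalization (assumption: no entity references)."""
    out = []
    pos = 0
    while pos < len(raw):
        match = CHAR_REF.match(raw, pos)
        if match:
            # A character reference appends the referenced character as-is.
            hex_digits, dec_digits = match.groups()
            code = int(hex_digits, 16) if hex_digits else int(dec_digits)
            out.append(chr(code))
            pos = match.end()
        elif raw[pos] in WHITESPACE:
            # A literal white space character appends #x20.
            out.append("\x20")
            pos += 1
        else:
            out.append(raw[pos])
            pos += 1
    value = "".join(out)
    if not is_cdata:
        # Tokenised types: trim and compress #x20 only, not other white space.
        value = re.sub(r"\x20+", "\x20", value.strip("\x20"))
    return value

tokens = normalize_attribute("  foo&#9;bar   baz ", is_cdata=False)
```

Here tokens ends up as "foo", tab, "bar baz": the literal spaces are trimmed and compressed, and the referenced tab is left in place.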


> The simplest solution seems to me to leave normalisation as it is, and
> change the Names and Nmtokens productions (which are only used for
> tokenised attributes) to require #x20 rather than S.  This would make
> "foo&#9;bar" illegal as a tokenised attribute, and a good thing too.

That would only affect a couple of validity constraints; it wouldn't
address the other problems the spec has with attribute normalization.
(I'll refrain from proposing a fix though!)
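
To make the quoted proposal concrete, a small sketch comparing the two forms of the Nmtokens production as regular expressions. The NMTOKEN pattern is a simplified, ASCII-only stand-in for the real NameChar production, and all names here are hypothetical.

```python
import re

# Assumption: ASCII-only approximation of NameChar, for illustration only.
NMTOKEN = r"[A-Za-z0-9._:\-]+"
S = r"[\x20\x09\x0D\x0A]+"

# Nmtokens ::= Nmtoken (S Nmtoken)*       (separator is any white space)
NMTOKENS_WITH_S = re.compile(rf"{NMTOKEN}(?:{S}{NMTOKEN})*")
# Proposed: Nmtoken (#x20 Nmtoken)*       (separator is a single space)
NMTOKENS_WITH_SPACE = re.compile(rf"{NMTOKEN}(?:\x20{NMTOKEN})*")

value = "foo\tbar"  # what "foo&#9;bar" looks like after normalization
allowed_with_s = bool(NMTOKENS_WITH_S.fullmatch(value))
allowed_with_space = bool(NMTOKENS_WITH_SPACE.fullmatch(value))
```

The tab-separated value matches the S form but not the #x20 form, while "foo bar" matches both. As noted above, the change touches only the validity check and leaves the normalization rules themselves as they are.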

- Dave
