Attribute normalisation and character entities
David Brownell
david-b at pacbell.net
Thu Jan 27 23:56:54 GMT 2000
Richard Tobin wrote:
>
> In article <3890CE2A.633285 at pacbell.net>,
> David Brownell <david-b at pacbell.net> wrote:
>
> >There are two curious points in 3.3.3 ... first, that character and
> >entity refs may appear, and second that CRLF sequences may appear (line
> >endings already having been normalized).
>
> What makes you sure line ends have already been normalised?
http://www.w3.org/XML/xml-19980210-errata#E24 ... the first sentence
that replaces 3.3.3 in the REC says so.
> In 2.11
> it refers to converting them to #xA before passing them to the
> application, and suggests that it can be implemented by normalising
> before parsing (but doesn't have to be).
>
> I take the line-end conversion in 3.3.3 as duplicating the requirement
> in 2.11. If you implement it by normalising before parsing, you won't
> have to do anything about it in attribute normalisation.
The errata preclude that interpretation. Line end normalization is done
first, and yet afterwards you can still find a CRLF (or a plain CR) in the
pre-normalization attribute text.
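
One way a CR can survive line-end normalization is through a character reference, which is only expanded later. The following is a minimal sketch under that assumption, not a description of any particular parser; the function names are hypothetical and general entity references are ignored.

```python
import re

def normalize_line_ends(text: str) -> str:
    """Section 2.11 style handling: CRLF and any lone CR become a single LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")

# Matches hexadecimal (&#xD;) and decimal (&#13;) character references.
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")

def expand_char_refs(text: str) -> str:
    """Replace each character reference with the character it names."""
    def repl(m: re.Match) -> str:
        if m.group(1) is not None:
            return chr(int(m.group(1), 16))
        return chr(int(m.group(2)))
    return _CHAR_REF.sub(repl, text)

# A literal CRLF followed by a CRLF written as character references.
raw = "a\r\nb&#xD;&#xA;c"
after_eol = normalize_line_ends(raw)    # literal CRLF is now a single LF
expanded = expand_char_refs(after_eol)  # the referenced CR and LF reappear
```

Here the literal line break is normalized, but the referenced CR and LF are untouched until the references are expanded.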
> Similarly, I think the entity expansion in 3.3.3 is duplication of
> 4.4.
That was one of the options I presented, along with some of the
spec inconsistencies that interpretation introduces.
> And finally, I suspect that the authors just forgot the possibility
> of non-#x20 whitespace (arising from character entity references) in
> the paragraph about trimming and compressing spaces.
Didn't I identify a few more problems with 3.3.3 than that?? ;-)
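
The non-#x20 whitespace case can be illustrated with a rough sketch of the 3.3.3 steps as they are usually read: character references are appended as-is, literal whitespace becomes #x20, and for non-CDATA attributes only #x20 is trimmed and compressed. The function name and the `cdata` flag are hypothetical, entity references are deliberately left out, and line ends are assumed to be normalized already.

```python
import re

# One token per match: hex char ref, decimal char ref, literal whitespace, or any other char.
_TOKEN = re.compile(
    r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|([\x20\x0D\x0A\x09])|(.)", re.DOTALL
)

def normalize_attribute(literal: str, cdata: bool = True) -> str:
    """Sketch of attribute-value normalization (entity references not handled)."""
    out = []
    for m in _TOKEN.finditer(literal):
        hex_ref, dec_ref, space, other = m.groups()
        if hex_ref is not None:
            out.append(chr(int(hex_ref, 16)))  # referenced character, appended unchanged
        elif dec_ref is not None:
            out.append(chr(int(dec_ref)))      # referenced character, appended unchanged
        elif space is not None:
            out.append(" ")                    # literal whitespace becomes #x20
        else:
            out.append(other)
    value = "".join(out)
    if not cdata:
        # Only #x20 is trimmed and compressed; a referenced tab is left alone.
        value = re.sub(" +", " ", value.strip(" "))
    return value

literal_ws = normalize_attribute("  foo\t\n bar ", cdata=False)
referenced_tab = normalize_attribute("foo&#9;bar", cdata=False)
```

With this reading, `literal_ws` collapses to a single space between the tokens, while `referenced_tab` still contains a tab.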
> The simplest solution seems to me to leave normalisation as it is, and
> change the Names and Nmtokens productions (which are only used for
> tokenised attributes) to require #x20 rather than S. This would make
> "foo	bar" illegal as a tokenised attribute, and a good thing too.
That would only affect a couple of validity constraints, and wouldn't
address the fact that the spec is problematic in several aspects of
attribute normalization. (I'll refrain from proposing a fix though!)
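
For reference, the quoted proposal (separating tokens with a single #x20 rather than S) can be sketched as two patterns. This is only an illustration: the Nmtoken character class is an ASCII-only simplification of the real NameChar production, and the names `is_nmtokens` and `strict` are hypothetical.

```python
import re

# ASCII-only stand-in for the Nmtoken production (an assumption for brevity).
_NMTOKEN = r"[A-Za-z0-9._:\-]+"

# Nmtokens as written in the spec: tokens separated by S (any run of whitespace).
NMTOKENS_S = re.compile(rf"{_NMTOKEN}(?:[\x20\x09\x0D\x0A]+{_NMTOKEN})*")

# Nmtokens under the proposal: tokens separated by exactly one #x20.
NMTOKENS_SP = re.compile(rf"{_NMTOKEN}(?:\x20{_NMTOKEN})*")

def is_nmtokens(value: str, strict: bool = False) -> bool:
    """Check a normalized attribute value against either form of the production."""
    pattern = NMTOKENS_SP if strict else NMTOKENS_S
    return pattern.fullmatch(value) is not None

tab_current = is_nmtokens("foo\tbar")                # S-separated form
tab_proposed = is_nmtokens("foo\tbar", strict=True)  # #x20-separated form
```

Under the S-based form a tab-separated value is accepted, while the #x20-only form rejects it.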
- Dave