Attribute normalisation and character entities

Richard Tobin richard at cogsci.ed.ac.uk
Thu Jan 27 23:27:04 GMT 2000


In article <3890CE2A.633285 at pacbell.net>,
David Brownell <david-b at pacbell.net> wrote:

>There are two curious points in 3.3.3 ... first, that character and
>entity refs may appear, and second that CRLF sequences may appear (line
>endings already having been normalized).

What makes you sure line ends have already been normalised?  Section
2.11 talks about converting them to #xA before passing them to the
application, and suggests that this can be implemented by normalising
before parsing (though it doesn't have to be).

I take the line-end conversion in 3.3.3 as duplicating the requirement
in 2.11.  If you implement it by normalising before parsing, you won't
have to do anything about it in attribute normalisation.
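
As a minimal sketch of the normalise-before-parsing approach, assuming
the whole entity is already available as a decoded string, something
like the following would do.  The function name is hypothetical, not
taken from any parser.

```python
def normalise_line_ends(text):
    """Translate CRLF pairs and lone CRs to a single LF (#xA)."""
    # Replace the two-character sequence first so that a CRLF pair
    # yields one LF rather than two.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

With input handled this way, attribute normalisation never sees a #xD
that came from a literal line end.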

Similarly, I think the entity expansion in 3.3.3 simply duplicates
4.4.

And finally, I suspect that the authors just forgot the possibility
of non-#x20 whitespace (arising from character references) in the
paragraph about trimming and compressing spaces.
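
To illustrate the gap, here is a rough Python sketch of the 3.3.3
steps.  It is a simplification under stated assumptions: only
character references are handled (no general entity references), line
ends are assumed to be normalised already, and the function and
parameter names are made up for the example.

```python
import re

# Decimal or hexadecimal character reference, e.g. &#9; or &#x9;
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")

def normalise_attribute(raw, cdata=True):
    """Simplified attribute-value normalisation (character refs only)."""
    out = []
    pos = 0
    while pos < len(raw):
        m = _CHAR_REF.match(raw, pos)
        if m:
            # A character reference: append the referenced character as is.
            code = int(m.group(1), 16) if m.group(1) else int(m.group(2))
            out.append(chr(code))
            pos = m.end()
        elif raw[pos] in "\t\n\r ":
            # Literal whitespace: append #x20.
            out.append(" ")
            pos += 1
        else:
            out.append(raw[pos])
            pos += 1
    value = "".join(out)
    if not cdata:
        # Tokenised types: trim and compress #x20 only, as the text says.
        value = re.sub(" +", " ", value.strip(" "))
    return value
```

Under these assumptions, normalise_attribute("  foo&#9;bar  ",
cdata=False) gives "foo", a tab, then "bar": the tab introduced by the
character reference survives the trimming step, whereas a literal tab
in the same position would have become a single space.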

The simplest solution, it seems to me, is to leave normalisation as it
is and change the Names and Nmtokens productions (which are only used
for tokenised attributes) to require #x20 rather than S.  This would
make "foo&#9;bar" illegal as a tokenised attribute, and a good thing too.

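The effect of that change can be sketched with two regular expressions
in Python.  This is an illustration only: the token pattern is an
ASCII-only approximation of NameChar, and the variable names are
hypothetical.

```python
import re

# ASCII-only approximation of one Nmtoken (one or more NameChars).
_NMTOKEN = r"[A-Za-z0-9._:\-]+"

# Existing production: Nmtokens ::= Nmtoken (S Nmtoken)*
NMTOKENS_S = re.compile(rf"{_NMTOKEN}(?:[ \t\r\n]+{_NMTOKEN})*")

# Suggested production: the separator is a single #x20 instead of S.
NMTOKENS_SPACE = re.compile(rf"{_NMTOKEN}(?: {_NMTOKEN})*")

# Normalised value of "foo&#9;bar": the tab came from a character reference.
value = "foo\tbar"

accepted_with_s = NMTOKENS_S.fullmatch(value) is not None
accepted_with_space = NMTOKENS_SPACE.fullmatch(value) is not None
```

Here accepted_with_s is True and accepted_with_space is False, while a
value such as "foo bar" matches both patterns.
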
-- Richard
-- 
Spam filter: to mail me from a .com/.net site, put my surname in the headers.

"The Internet is really just a series of bottlenecks joined by high
speed networks." - Sam Wilson

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1