Encoding detection again ...

Miles Sabin msabin at cromwellmedia.co.uk
Wed Mar 3 12:45:45 GMT 1999


Sorry to follow up my own posting, but one thing needs a 
bit of clarification, and one typo needs correction.

I wrote,
> David Brownell wrote,
> > Put it this way:  if you assume UTF-16, you're
> > safe either way because UTF-16 is a superset.
> 
> Err ... is that true?
> 
> Maybe I'm being a bit obsessive about my 
> interpretation of the various standards docs, but 
> as far as I can see UCS-2 isn't a subset of
> UTF-16.

The question of whether or not UCS-2 is a subset of 
UTF-16 is a bit of a red herring. It is undoubtedly true 
that the set of octet pairs which are legal UCS-2 
characters is a subset of the set of octet pairs which 
are legal UTF-16 characters.
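
The subset relationship can be illustrated with a small Python
sketch. The function names are made up for the illustration, and
big-endian octet order is assumed throughout. For octet pairs
outside the surrogate range D800-DFFF the two readings agree; they
differ only when a surrogate pair turns up, which UTF-16 combines
into one character and UCS-2 reads as two separate 16-bit values.

```python
def ucs2_values(octets):
    """Read big-endian octet pairs as UCS-2: one 16-bit value per pair."""
    if len(octets) % 2:
        raise ValueError("odd number of octets")
    return [int.from_bytes(octets[i:i + 2], "big")
            for i in range(0, len(octets), 2)]


def utf16_values(octets):
    """Read the same octets as big-endian UTF-16, combining surrogate pairs."""
    return [ord(ch) for ch in octets.decode("utf-16-be")]


# Octet pairs outside the surrogate range: both readings agree.
bmp_only = "A\u00e9\u4e2d".encode("utf-16-be")
same = ucs2_values(bmp_only) == utf16_values(bmp_only)

# A surrogate pair: one character in UTF-16, two 16-bit values in UCS-2.
pair = bytes([0xD8, 0x00, 0xDC, 0x00])
as_ucs2 = ucs2_values(pair)    # [0xD800, 0xDC00]
as_utf16 = utf16_values(pair)  # [0x10000]
```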

Appendix F suggests that octet sequences which could
equally well be interpreted as UTF-16 or UCS-2 may be 
assumed to be UTF-16, and *doesn't* include a clause
stating that this assumption should be revised in
the light of an explicit XML encoding declaration. I
think that clause should be added, in much the same
way as it is for UTF-8 vs. 8859-X.
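
A minimal Python sketch of the behaviour that clause would ask for,
under some loud assumptions: the table below covers only a few of
the leading-octet patterns from Appendix F, the function names and
the regular expression are invented for the illustration, and no
check is made that the declared encoding is compatible with the
guessed one. The point is only the order of events: guess from the
octets, use the guess to read the declaration, then let an explicit
declaration revise the guess.

```python
import re

# A partial table of leading octets and the encoding first assumed for them.
_GUESSES = [
    (b"\xfe\xff", "UTF-16BE"),    # byte order mark, big-endian
    (b"\xff\xfe", "UTF-16LE"),    # byte order mark, little-endian
    (b"\x00<\x00?", "UTF-16BE"),  # "<?" as 16-bit units, no mark
    (b"<\x00?\x00", "UTF-16LE"),
    (b"<?xm", "UTF-8"),           # some ASCII-compatible encoding
]

# Codec used only to read the declaration itself. Latin-1 is used for the
# ASCII-compatible case because it never fails on arbitrary octets.
_READ_AS = {"UTF-16BE": "utf-16-be", "UTF-16LE": "utf-16-le", "UTF-8": "latin-1"}

_DECL = re.compile(
    r"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def guess_from_octets(data):
    """First guess, based purely on the leading octets."""
    for prefix, name in _GUESSES:
        if data.startswith(prefix):
            return name
    return "UTF-8"


def effective_encoding(data):
    """Revise the first guess in the light of an explicit declaration."""
    guess = guess_from_octets(data)
    head = data[:200].decode(_READ_AS[guess], errors="replace")
    match = _DECL.search(head)
    if match is None:
        return guess        # nothing declared: keep the assumption
    return match.group(1)   # declared: the declaration wins


ucs2_doc = '<?xml version="1.0" encoding="ISO-10646-UCS-2"?><a/>'.encode("utf-16-be")
plain_doc = '<?xml version="1.0"?><a/>'.encode("utf-16-le")
```

With this sketch, ucs2_doc starts out assumed to be UTF-16 but ends
up reported as ISO-10646-UCS-2, while plain_doc keeps the UTF-16
assumption because it declares nothing.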

Now the typo ...

> This very complicated and isn't a zillion miles away 
> from the current handling of UTF-8 vs. ISO 8859-x 
> vs. US-ASCII.

Please insert the word 'isn't' in the obvious
place ;-)

Cheers,


Miles

-- 
Miles Sabin                          Cromwell Media
Internet Systems Architect           5/6 Glenthorne Mews
+44 (0)181 410 2230                  London, W6 0LJ
msabin at cromwellmedia.co.uk           England





