Encoding detection again ...

David Brownell db at Eng.Sun.COM
Wed Mar 3 19:55:48 GMT 1999


> > > Put it this way:  if you assume UTF-16, you're
> > > safe either way because UTF-16 is a superset.
> >
> > Err ... is that true?
> >
> > Maybe I'm being a bit obsessive about my
> > interpretation of the various standards docs,

Given how many folk talk about UCS-2 lately (not many!)
that could well be true ... ;-)

> > but
> > as far as I can see UCS-2 isn't a subset of
> > UTF-16.
> 
> The question of UCS-2 being, or not being a subset of
> UTF-16 is a bit of a red herring. It is undoubtedly true
> that the set of octet pairs which are legal UCS-2
> characters is a subset of the set of octet pairs which
> are legal UTF-16 characters.

And more to the point, XML processors aren't required
to report such low-level character encoding errors ...
this would be one.
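
To make the subset point concrete, here's a rough sketch in Python.
It assumes the only difference that matters is how the surrogate
range (0xD800-0xDFFF) is treated: not a legal character in UCS-2,
legal in UTF-16 only as a high/low pair. The function names are mine,
purely illustrative, and not taken from any spec or library.

```python
def code_units(data, big_endian=True):
    """Split an even-length byte string into 16-bit code units."""
    if len(data) % 2:
        raise ValueError("odd number of octets")
    order = "big" if big_endian else "little"
    return [int.from_bytes(data[i:i + 2], order)
            for i in range(0, len(data), 2)]


def looks_like_ucs2(units):
    """Assumption: any unit in the surrogate range is not a UCS-2 character."""
    return not any(0xD800 <= u <= 0xDFFF for u in units)


def looks_like_utf16(units):
    """True if every surrogate is part of a well-formed high/low pair."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                return False
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no preceding high surrogate.
            return False
        else:
            i += 1
    return True
```

Under that assumption, anything that passes looks_like_ucs2() also
passes looks_like_utf16(), which is the subset relation in question;
the reverse fails as soon as a surrogate pair shows up.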

 
> Appendix F suggests that octet sequences which could
> equally well be interpreted as UTF-16 or UCS-2 may be
> assumed to be UTF-16, and *doesn't* include a clause
> stating that this assumption should be revised in
> the light of an explicit XML encoding declaration. I
> think that clause should be added, in much the same
> way as it is for UTF-8 vs. 8859-X.

All of Appendix F is non-normative; you're free to revise
or not, as you see fit, and it won't affect conformance.
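
For what it's worth, a processor that does choose to revise could do
something like the following. This is only a sketch in Python, not
what any particular parser does: it covers just the 16-bit and
ASCII-compatible rows of the Appendix F table, the function names are
made up, and it returns the declared name as-is without checking that
it's consistent with the bytes.

```python
import re


def guess_family(head):
    """First guess from the leading octets (partial Appendix F table)."""
    if head.startswith(b"\xfe\xff") or head.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if head.startswith(b"\xff\xfe") or head.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    # Everything else is treated here as an ASCII-compatible encoding.
    return "utf-8"


# Loose pattern for the encoding pseudo-attribute of the XML declaration.
_DECL = re.compile(
    r'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def effective_encoding(data):
    """Return the declared encoding if present, else the first guess."""
    family = guess_family(data[:4])
    head = data[:200]
    if family != "utf-8":
        # Keep whole 16-bit units only.
        head = head[:len(head) - len(head) % 2]
    text = head.decode(family, errors="replace")
    match = _DECL.search(text)
    return match.group(1) if match else family
```

So a document that starts 00 3C 00 3F and declares
encoding="ISO-10646-UCS-2" is first read as UTF-16 and then reported
as ISO-10646-UCS-2; with no declaration the UTF-16 guess stands.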

- Dave




> Now the typo ...
> 
> > This very complicated and isn't a zillion miles away
> > from the current handling of UTF-8 vs. ISO 8859-x
> > vs. US-ASCII.
> 
> Please insert the word 'isn't' in the obvious
> place ;-)
> 
> Cheers,
> 
> Miles
> 
> --
> Miles Sabin                          Cromwell Media
> Internet Systems Architect           5/6 Glenthorne Mews
> +44 (0)181 410 2230                  London, W6 0LJ
> msabin at cromwellmedia.co.uk           England
> 
> xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
> Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
> To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
> (un)subscribe xml-dev
> To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
> subscribe xml-dev-digest
> List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)