UTF-8 or ? for SML (was: Re: Feeler for SML (Simple Markup Language))
Tony Graham
tgraham at mulberrytech.com
Sat Nov 13 19:13:22 GMT 1999
At 13 Nov 1999 15:46 -0000, Richard Anderson wrote:
> But UTF-8 can support "foreign" characters so I don't see the argument for
> having UTF-16 too. Also, generally speaking UTF-8 encoding results in
> smaller output for most cases.
Different people have different ideas of what constitutes "foreign".
For the majority of the characters in the Unicode Standard, UTF-8 uses
three bytes per character. However, for the US-ASCII characters, it
uses only one byte per character.
For nearly all characters in the Unicode Standard, UTF-16 uses two bytes
per character; characters encoded as surrogate pairs take four.
Whether a given file takes fewer bytes as UTF-8 or UTF-16 is largely a
function of the proportion of unaccented Latin characters in the file.
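To make the size comparison concrete, here is a minimal Python sketch that counts the bytes each encoding needs for a string. The function name encoded_sizes and the sample strings are illustrative assumptions, not anything from the SML discussion, and byte order marks are deliberately left out of the counts.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, not counting a byte order mark."""
    # "utf-16-be" is used because plain "utf-16" would prepend a 2-byte BOM.
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))


# Illustrative sample strings (assumptions chosen for this sketch).
samples = {
    "US-ASCII only": "Simple Markup Language",
    "Accented Latin": "déjà vu",
    "Japanese": "日本語",
}

for name, text in samples.items():
    utf8, utf16 = encoded_sizes(text)
    print(f"{name}: {len(text)} characters, "
          f"UTF-8 {utf8} bytes, UTF-16 {utf16} bytes")
```

Text that is mostly US-ASCII comes out at about half the size in UTF-8, while text made up mostly of three-byte characters comes out smaller in UTF-16.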
Moreover, most legacy encodings for a single script use one byte per
character, although Chinese, Japanese, and Korean encodings use two or
more bytes per character. UTF-8, therefore, isn't as efficient as the
legacy encodings of most scripts. (Its advantage is that it can
represent more scripts than any legacy encoding.)
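The comparison with legacy encodings can be checked the same way. The sketch below assumes three legacy encodings as examples (ISO 8859-7 for Greek, KOI8-R for Russian, Shift_JIS for Japanese), using the codec names Python's standard library gives them; the helper name size_in and the sample words are hypothetical.

```python
def size_in(text, encoding):
    """Return the number of bytes text occupies in the given encoding."""
    return len(text.encode(encoding))


# (script, sample word, legacy codec); all chosen for illustration only.
samples = [
    ("Greek", "Αθήνα", "iso8859_7"),
    ("Russian", "Москва", "koi8_r"),
    ("Japanese", "東京", "shift_jis"),
]

for script, word, legacy in samples:
    print(f"{script}: {legacy} {size_in(word, legacy)} bytes, "
          f"UTF-8 {size_in(word, 'utf-8')} bytes")
```

Each legacy codec handles only its own script (plus US-ASCII), so encoding the Russian word as ISO 8859-7, for example, raises UnicodeEncodeError, whereas UTF-8 accepts all three.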
Regards,
Tony Graham
======================================================================
Tony Graham mailto:tgraham at mulberrytech.com
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9632
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)