Proposal for embedding octet-streams in XML (was: XML DTD...binary data)

Chris Olds colds at nwlink.com
Sat Jun 20 01:43:47 BST 1998


I couldn't let this go...

John Cowan wrote:
> 
> Rick Jelliffe wrote:
> 
> > The most common notation to use is Base64. You can find base 64 specified
> > in an RFC.

http://www.faqs.org/rfcs/rfc1341.html

> > You can make a more efficient encoding by using all the available
> > characters. There are several thousand, so you might want to invent your
> > own Base4K encoding, for example, if it was really a big problem.

Base64 is designed to use characters that don't change depending on which page
of ISO 646 is used, and are represented consistently in all versions of EBCDIC
as well.  If you use more than 6 bits, you lose some of these properties.

> I propose a compromise: what might be called Base-256 encoding.

[details of the encoding snipped]

> Using this convention causes the data to be expanded by 2:1 in a UCS-2
> representation, by 3:1 in a UTF-8 representation, and by 7:1 in a
> numeric-character-reference representation.  Therefore, it is suitable
> only for relatively small amounts of octet data embedded in a basically
> textual matrix.

Umm....  This only makes sense in a UCS-2 document.
Since Base64 expands 3 bytes into 4 (ASCII) characters, encoding such a
document in UCS-2 would effectively expand 3 bytes into 8 bytes.  In that case,
the 2:1 penalty for shifting the byte into the UCS-2 private use zone is less
than the 8:3 cost of encoding Base64 in UCS-2.
However, since all of the Base64 characters are 7-bit ASCII characters, the
Base64 overhead of 4:3 is much less than the 3:1 UTF-8 representation of U+F0xx,
and even the UCS-2 representation of Base64 encoding (at 8:3) is smaller than the
7:1 that a (UTF-8 or ISO Latin) numeric character reference requires.
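
For anyone who wants to check the arithmetic, here is a rough Python sketch
that measures the encoded sizes directly.  The details of the Base-256
proposal were snipped above, so the mapping used here (octet b becomes the
character U+F000 + b) is only an assumption based on the "U+F0xx" range
mentioned in this message; PUA_BASE and encoded_sizes are names made up for
the illustration.  The numeric-character-reference case is left out because
its size depends on the exact reference syntax chosen.

```python
import base64

# Assumed from the "U+F0xx" range mentioned above; the snipped proposal may differ.
PUA_BASE = 0xF000


def encoded_sizes(data: bytes) -> dict:
    """Return the size in bytes of `data` under each encoding/charset pairing."""
    # Base64 text consists only of 7-bit ASCII characters.
    b64_text = base64.b64encode(data).decode("ascii")
    # Hypothetical Base-256: one private-use character per octet.
    pua_text = "".join(chr(PUA_BASE + b) for b in data)
    return {
        "Base64 in ASCII/UTF-8": len(b64_text.encode("utf-8")),
        # "utf-16-be" emits two bytes per BMP character and no byte-order mark,
        # which matches UCS-2 for the characters used here.
        "Base64 in UCS-2": len(b64_text.encode("utf-16-be")),
        "Base-256 in UCS-2": len(pua_text.encode("utf-16-be")),
        "Base-256 in UTF-8": len(pua_text.encode("utf-8")),
    }


if __name__ == "__main__":
    # 768 octets: a multiple of 3, so Base64 adds no padding characters.
    data = bytes(range(256)) * 3
    for name, size in encoded_sizes(data).items():
        print(f"{name}: {size} bytes ({size / len(data):.2f}:1)")
```

Under that assumed mapping, only the UCS-2 case favours Base-256 (2:1 against
8:3); in UTF-8 the private-use characters take three bytes each, so Base64
at 4:3 comes out well ahead.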

	/cco

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/