Binary Data in XML

Tyler Baker tyler at infinet.com
Wed Sep 30 18:46:36 BST 1998


Tim Bray wrote:

> Suppose I wrote up a NOTE, should occupy less than one page, proposing
> a reserved attribute xml:packed with, for the moment, only two
> allowed values, "none" and "base64".  The default value is "none".
> If an element has xml:packed="base64" this means that
>
> (a) the content of the element to which this is attached must be
>     pure #PCDATA, no child elements and no references, and
> (b) the content is encoded in base64, leading and trailing spaces allowed

Why would the content have to be free of child elements and references?  I can see
how this constraint would make things simpler for the parser and avoid some of
the obvious confusion with mixed content models.
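
To make the proposal concrete, here is a minimal sketch of how an application
might honor xml:packed on top of an existing parser.  It uses Python's standard
xml.etree.ElementTree; the helper name unpack is hypothetical, and xml:packed
itself is only the attribute proposed above, not part of any recommendation.
Because the parser expands references before the application sees the content,
this sketch can only check the "no child elements" half of constraint (a).

```python
import base64
import xml.etree.ElementTree as ET

# The "xml" prefix is always bound to this namespace, so ElementTree
# reports xml:packed under the expanded name below.
XML_NS = "http://www.w3.org/XML/1998/namespace"
PACKED = "{%s}packed" % XML_NS  # proposed attribute, not standardized


def unpack(elem):
    """Return the element content, decoded according to xml:packed.

    Hypothetical helper: returns str for "none" (the default) and
    bytes for "base64".
    """
    mode = elem.get(PACKED, "none")
    if mode == "none":
        return elem.text or ""
    if mode != "base64":
        raise ValueError("unsupported xml:packed value: %r" % mode)
    if len(elem) != 0:
        # Constraint (a): packed content must not contain child elements.
        raise ValueError("packed element must not have child elements")
    # Constraint (b): leading and trailing whitespace is allowed.
    return base64.b64decode((elem.text or "").strip())
```

For example, unpack(ET.fromstring('<blob xml:packed="base64"> aGVsbG8= </blob>'))
yields the five bytes of "hello".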

> This obviously couldn't retroactively become part of XML 1.0, but
> if it went through a process and became a W3C recommendation, I bet
> every parser author in the world would support it in about 15 minutes.
>
> Base64 (a 4-for-3 encoding) wastes 33%, so I thought about perhaps
> inventing Base128 (8-for-7) or maybe even a higher level to cut down
> wastage, but Base64 has the advantage that it avoids UTF-8/ISO-8859
> confusion and I bet Mr. LZW will eat that 33% anyhow...

Something I still wonder about is whether Unisys is still playing patent
games with LZW.
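
The overhead figures quoted above can be checked with a little arithmetic: a
4-for-3 encoding carries 6 payload bits per 8-bit character, and an 8-for-7
encoding carries 7.  The sketch below computes both and then compresses a
Base64 string to see how much of the overhead comes back.  Note the assumption:
it uses zlib (DEFLATE), not LZW, simply because that is what Python's standard
library ships.  The function names and the sample size are illustrative only.

```python
import base64
import random
import zlib


def encoded_overhead(bits_per_char):
    """Fractional size increase when each 8-bit character of output
    carries only bits_per_char bits of payload."""
    return 8 / bits_per_char - 1


def measure(n=3000, seed=1998):
    """Return (raw, base64, compressed-base64) sizes in bytes for n
    pseudo-random bytes, which are a worst case for the compressor."""
    rng = random.Random(seed)
    raw = bytes(rng.getrandbits(8) for _ in range(n))
    encoded = base64.b64encode(raw)
    # DEFLATE stands in for LZW here; this is an assumption of the sketch.
    compressed = zlib.compress(encoded, 9)
    return len(raw), len(encoded), len(compressed)


if __name__ == "__main__":
    print("Base64  overhead: %.1f%%" % (100 * encoded_overhead(6)))
    print("Base128 overhead: %.1f%%" % (100 * encoded_overhead(7)))
    print("raw/base64/compressed sizes:", measure())
```

Base64 comes out at about 33% and the hypothetical Base128 at about 14%.  Even
on incompressible input the compressed Base64 text is smaller than the Base64
text itself, since only 64 of the 256 possible byte values ever occur.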

> I also thought about xml:encoding=, but that conflicts with
> encoding= in the XML declaration in a confusing way.

You could have something like xml:type="base64", which opens the door to more
efficient processing of data primitives by the XML parser before it presents them
to the application.  So you could also have something like xml:type="int", and it
would be up to the parser to marshal the data in big-endian, little-endian, or
whatever format the host platform uses.  Right now, most parsers and parser
interfaces provide attribute values and character content only in String form.
If Mathematica were to do anything useful with XML through a Java interface (with
a lower-level native kernel, of course), it would first have to receive the data
as a String and parse it itself.  This is inefficient, as Mathematica only cares
about having the data as a number in the first place.  If the character content
cannot be parsed into a number, the parser would throw an exception or return an
error code.
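
As a rough illustration of the idea, the sketch below layers an xml:type
conversion over Python's xml.etree.ElementTree, which, like the parsers
described above, hands back character content only as strings.  The names
typed_content, to_native_bytes, and CONVERTERS are hypothetical, as is the set
of supported type names; a parser with real support would do the conversion
internally instead of going through a string first.

```python
import base64
import struct
import xml.etree.ElementTree as ET

# xml:type is the attribute suggested above; it is not standardized.
XML_TYPE = "{http://www.w3.org/XML/1998/namespace}type"

# Hypothetical mapping from type name to converter.
CONVERTERS = {
    "int": int,
    "float": float,
    "base64": base64.b64decode,
}


def typed_content(elem):
    """Return the element content converted according to xml:type.

    Content with no xml:type is returned as a plain string.  Content
    that cannot be converted raises ValueError.
    """
    kind = elem.get(XML_TYPE)
    text = (elem.text or "").strip()
    if kind is None:
        return text
    if kind not in CONVERTERS:
        raise ValueError("unsupported xml:type value: %r" % kind)
    return CONVERTERS[kind](text)  # int() and float() raise ValueError


def to_native_bytes(value):
    """Pack an int as a 4-byte signed integer in the host's byte order."""
    return struct.pack("=i", value)
```

With this in place, typed_content(ET.fromstring('<n xml:type="int"> 42 </n>'))
returns the integer 42, and content such as "forty-two" raises ValueError
instead of reaching the application as an unparsed string.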

Tyler





