Binary Data in XML : Turning back the clock

Tyler Baker tyler at infinet.com
Wed Sep 30 16:36:13 BST 1998


Paul Prescod wrote:

> "Samuel R. Blackburn" wrote:
> >
> > A couple of weeks ago on this list, there was a thread that was
> > lamenting the slow adoption of XML in the web community.
> >
> > It seems to me that one of the first problems programmers
> > encounter is XML's inability to handle "binary" data. Once they
> > hit that wall, they drop XML and move on to something else
> > (usually a custom format).
>
> First, binary data is not a wall. It's at most a gate. There are several
> ways to handle it, none of them particularly onerous. My favourite is
> "tar".
>
> Second, recall that binary junk is what we are running away from.
> Consider:
>
> <ms:word xml:length="10000 bytes"></ms:word>
>
> Yuck! I will rue the day I crash "vi" or "more" by looking at an XML
> document.
>
> I think that it is a much better practice to have the XML document contain
> only human-readable, human-editable text and LINKS to necessarily
> non-readable stuff. I suppose I would make an exception for streaming
> processes that want to interleave tags and data: base64 handles this fine.

Base64 increases the size of the transmitted data by only about 33%.  You could
in fact use an encoding other than Base64, one with a conversion ratio of 8/7
(about 14% overhead) instead of 8/6.  Either way this should not be a major
penalty, considering that networks these days are generally fast and costs are
low.  Furthermore, if efficiency is ever a concern, you can either compress your
binary data before applying base64 encoding to it, or compress the entire XML
stream (which is what I would recommend).
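
To put rough numbers on the overhead and on the "compress the whole stream"
option, here is a minimal Python sketch.  The <blob> element name, the sample
payload and the choice of zlib are all assumptions made for illustration; they
are not taken from any particular XML tool.

```python
import base64
import zlib


def base64_ratio(raw: bytes) -> float:
    """Return encoded size divided by raw size for standard base64."""
    return len(base64.b64encode(raw)) / len(raw)


def wrap_and_compress(raw: bytes) -> tuple[bytes, bytes]:
    """Build a tiny XML document around base64 text, then deflate all of it.

    The <blob> element name is made up for this example.
    """
    xml_doc = b"<blob>" + base64.b64encode(raw) + b"</blob>"
    return xml_doc, zlib.compress(xml_doc)


# Sample payload: 3072 bytes, a multiple of 3, so base64 needs no padding.
payload = bytes(range(256)) * 12
xml_doc, compressed = wrap_and_compress(payload)

if __name__ == "__main__":
    print(f"base64 ratio: {base64_ratio(payload):.3f}")
    print(f"XML size: {len(xml_doc)} bytes, compressed: {len(compressed)} bytes")
```

The sample payload is highly repetitive, so the compressed size printed here
says nothing about real data; input that is already compressed (most image
formats, for instance) will shrink far less.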

One major pain in the you-know-what with EDI is that for BIN segments you have to
deal with the stream in 8-bit binary format, while the rest of the stream could be
transmitted in 7-bit ASCII.  Also, for languages that have a notion of multi-byte
character sets, I/O can be a pain (as well as inefficient) if you cannot assume
that you are working with just a character stream, since in essence you have to
process everything one byte at a time (are we in a character stream or a binary
stream?).

For the XML Framework that I hope to release in the next week or so, depending on
how long it takes to finish the documentation, base64 is handled natively in both
the parser and the formatter.  Someone please correct me if I am wrong, but I
think this is a service SAXON provides as well.
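
For anyone wondering what base64 handling on both the writing and the reading
side amounts to, here is a rough Python sketch using only the standard library.
It is not the API of the framework mentioned above or of SAXON; the function
names and the encoding="base64" attribute are hypothetical conventions chosen
for this example.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_binary(tag: str, data: bytes) -> bytes:
    """Serialize binary data as base64 text inside a single XML element."""
    elem = ET.Element(tag, {"encoding": "base64"})  # attribute name is an assumption
    elem.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(elem)


def unwrap_binary(xml_bytes: bytes) -> bytes:
    """Parse the element back and decode its base64 text into raw bytes."""
    elem = ET.fromstring(xml_bytes)
    if elem.get("encoding") != "base64":
        raise ValueError("element is not marked as base64")
    return base64.b64decode(elem.text or "")


if __name__ == "__main__":
    original = b"\x00\xff\x10binary\x80"
    document = wrap_binary("attachment", original)
    print(document.decode("ascii"))
    print(unwrap_binary(document) == original)
```

The serialized document contains only ASCII characters, so it can travel over
a 7-bit channel and be opened in a text editor without surprises.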

Tyler


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



