Is there anyone working on a binary version of XML?

Stephen D. Williams sdw at lig.net
Fri Mar 26 18:37:04 GMT 1999



Jonathan Borden wrote:

> Tim Bray wrote:
> >
> > At 08:54 PM 3/25/99 +0000, Dan Brickley wrote:
> > >Quite so. But there are still initiatives such as
> > >
> > >     http://www.wapforum.org/docs/technical.htm
> > >     http://www.wapforum.org/docs/technical1.1/WBXML-03-Feb-1999.pdf
> >
> > I read some of it, and if you buy the idea that a binary form of XML
> > is useful, it seems quite sensible.  I'm agnostic; if they think they
> > need it who are we to tell them they don't?  Obviously it has to
> > round-trip with plain ole XML. -T.
> >
>
>         I think what this really is, when you strip out the concept of binary XML,
> is a suggestion for a compression format tuned for markup streams.
>
>         There are two distinct issues: 1) efficiency of parsing, 2) compactness. A
> standard compression format for XML (a la zip, gzip, etc.) would be for
> bandwidth-limited applications.

I agree, though I feel both can be addressed with a similar solution in at least some
circumstances.  Some straightforward ways to achieve compression actually make parsing less
efficient, while some solutions for parsing efficiency also make compression easier.

In fact, there are a number of levels at which you could apply compression:

optional gzip/bzip2 possibly preceded by:

Dictionary compression (various forms of building a list of commonly used terms or all terms
in the current document/stream or some combination)
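A dictionary pass followed by gzip can be sketched in Python roughly as below. This is a minimal illustration under stated assumptions, not WBXML or bXML: the token format (\x01, index, \x02), the in-band dictionary header, and the helper names build_dictionary, dict_encode, compress and decompress are all hypothetical, and only element names are tokenized.

```python
import gzip
import re

# Hypothetical pattern: an element name right after "<" or "</".
# It ignores namespaces and may also rewrite "<name" inside comments;
# both cases still round-trip because decoding is the exact inverse.
TAG_NAME = re.compile(r"(</?)([A-Za-z_][\w.\-]*)")


def build_dictionary(xml_text):
    """Collect every distinct element name, most frequent first."""
    counts = {}
    for _, name in TAG_NAME.findall(xml_text):
        counts[name] = counts.get(name, 0) + 1
    return sorted(counts, key=lambda n: (-counts[n], n))


def dict_encode(xml_text, names):
    """Replace each element name with a short token: U+0001, index, U+0002.

    XML 1.0 forbids the control characters U+0001 and U+0002, so the
    tokens cannot collide with real document content.
    """
    index = {name: i for i, name in enumerate(names)}
    return TAG_NAME.sub(
        lambda m: "%s\x01%d\x02" % (m.group(1), index[m.group(2)]), xml_text)


def dict_decode(encoded, names):
    """Inverse of dict_encode: expand every token back to its name."""
    return re.sub(r"\x01(\d+)\x02", lambda m: names[int(m.group(1))], encoded)


def compress(xml_text):
    """Dictionary pass first, then gzip; the dictionary travels in-band."""
    names = build_dictionary(xml_text)
    # U+0000 (also illegal in XML 1.0) separates the dictionary from the body.
    payload = "\n".join(names) + "\x00" + dict_encode(xml_text, names)
    return gzip.compress(payload.encode("utf-8"))


def decompress(blob):
    """Undo gzip, split off the dictionary, then expand the tokens."""
    payload = gzip.decompress(blob).decode("utf-8")
    header, body = payload.split("\x00", 1)
    names = header.split("\n") if header else []
    return dict_decode(body, names)
```

Since gzip already removes much of the repetition on its own, a separate dictionary pass probably pays off most on short documents, or when the dictionary is shared ahead of time instead of being sent with every document.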

'Priming', for certain circumstances.  For instance, I've long thought that an ideal design for
very high-bandwidth circuits (a TCP connection, message queue, or special-purpose link) is to
start from a raw state in which, once per connection or conversation, you send all of the XML
or other fully self-describing data (a DTD is one expression of this), and possibly even a
dictionary built from past experience, and then compress the rest of the stream heavily
against that defined base.  In some circumstances you could even store a base 'dictionary'
on each receiver to improve compression of short messages.
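One readily available way to try priming is zlib's preset-dictionary support, exposed in Python as the zdict argument of zlib.compressobj and zlib.decompressobj. The sketch below assumes a hypothetical PRIMER that both ends already hold (sent once per connection, or stored on the receiver); the message layout and the helper names compress_primed and decompress_primed are illustrative only.

```python
import zlib

# Hypothetical shared base that both ends hold before any message is sent.
# zlib's documentation recommends putting the most commonly used strings
# towards the end of a preset dictionary.
PRIMER = (b'<?xml version="1.0" encoding="UTF-8"?>'
          b'<quote><symbol></symbol><bid></bid><ask></ask><time></time></quote>')


def compress_primed(message, primer=PRIMER):
    """Compress one message against the shared primer (zlib preset dictionary)."""
    c = zlib.compressobj(level=9, zdict=primer)
    return c.compress(message) + c.flush()


def decompress_primed(blob, primer=PRIMER):
    """The receiver must supply the identical primer, or zlib raises an error."""
    d = zlib.decompressobj(zdict=primer)
    return d.decompress(blob) + d.flush()


msg = (b'<?xml version="1.0" encoding="UTF-8"?><quote><symbol>IBM</symbol>'
       b'<bid>101.25</bid><ask>101.50</ask><time>18:37:04</time></quote>')
# Compare plain zlib against the primed version for one short message.
print(len(zlib.compress(msg, 9)), len(compress_primed(msg)))
```

For short messages like this one the primed output is typically much smaller, because nearly all of the markup can be encoded as back-references into the primer rather than sent as literals.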

Each further transaction could use all of the known information to compress in a layered way.
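A simple approximation of this layering is to keep a single deflate stream open for the life of the connection, so that each message can reference text from the (optional) primer and from earlier messages. In the Python sketch below, the class and method names are hypothetical. Deflate only looks back 32 KB, so "all of the known information" is limited to recent history in this version.

```python
import zlib


class StreamCompressor:
    """Sender side: one deflate stream shared by every message on a connection.

    Later messages can reference earlier ones (and an optional primer),
    so they tend to shrink as history builds up.
    """

    def __init__(self, primer=b""):
        self._c = (zlib.compressobj(level=9, zdict=primer) if primer
                   else zlib.compressobj(level=9))

    def send(self, message):
        # Z_SYNC_FLUSH emits all pending output on a byte boundary without
        # ending the stream, so history stays available for the next message.
        return self._c.compress(message) + self._c.flush(zlib.Z_SYNC_FLUSH)


class StreamDecompressor:
    """Receiver side: must see every chunk, in order, from the same stream."""

    def __init__(self, primer=b""):
        self._d = (zlib.decompressobj(zdict=primer) if primer
                   else zlib.decompressobj())

    def receive(self, chunk):
        return self._d.decompress(chunk)


tx, rx = StreamCompressor(), StreamDecompressor()
for i in range(3):
    msg = b'<quote><symbol>IBM</symbol><bid>%d</bid></quote>' % (100 + i)
    chunk = tx.send(msg)
    assert rx.receive(chunk) == msg
    print(i, len(msg), len(chunk))  # compressed size per message
```

The trade-off is that the receiver must process every chunk in order. A lost or reordered message breaks the rest of the stream, which suits a TCP connection or an ordered message queue better than independent datagrams.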

There are plenty of circumstances where a connection is made and many messages are sent,
sometimes millions per connection.  I've had servers that normally handled 30-50 million
messages/day.

Both careful structuring of the data (a la bXML) and things like parallel inheritance deltas
play into this kind of optimization.

sdw

> Jonathan Borden
> http://jabr.ne.mediaone.net
>
> xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
> Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
> To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
> (un)subscribe xml-dev
> To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
> subscribe xml-dev-digest
> List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)

--
OptimaLogic - Finding Optimal Solutions     Web/Crypto/OO/Unix/Comm/Video/DBMS
sdw at lig.net   Stephen D. Williams  Senior Consultant/Architect   http://sdw.st
43392 Wayside Cir,Ashburn,VA 20147-4622 703-724-0118W 703-995-0407Fax 5Jan1999
