Is there anyone working on a binary version of XML?
Stephen D. Williams
sdw at lig.net
Fri Mar 26 02:16:37 GMT 1999
Imagine that you have all the features of XML: structure, flexibility, common format for
interchange, but that you perform zero processing steps to import or export the 'document'
from a program. (Actually, I'm thinking this would be done in chunks, but essentially very
few reads and writes.)
Also imagine that when taking an XML 'document' into a program you could search, modify, or
copy the object without generating thousands of object creations and deletions or garbage-collection
hits. I call this last problem a 'malloc storm', and it appears to be one of the worst
problems with a lot of large Java systems. (I've experienced this problem in C++ programs for
years and Java for the last year.) Among other things, I'm directly addressing this issue.
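To make the 'malloc storm' point concrete, here is a minimal sketch in Java (not taken from the post; `FlatTree` and its methods are hypothetical names) of an element tree kept in a few parallel `int` arrays instead of one object per node. Searching and walking it reads array slots only, so it creates no garbage; only growing the arrays allocates, and doubling keeps that to a handful of copies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: element nodes live in parallel int arrays rather than
// one Java object per node. Searching or walking the tree reads these arrays
// (and the small name table) only, so it allocates nothing.
class FlatTree {
    static final int NONE = -1;

    private int[] parent = new int[8];
    private int[] firstChild = new int[8];
    private int[] lastChild = new int[8];
    private int[] nextSibling = new int[8];
    private int[] nameId = new int[8];
    private int size = 0;

    // Each distinct element name is stored once; nodes refer to it by index.
    private final List<String> names = new ArrayList<>();

    /** Id of an already-seen name, or NONE (indexOf returns -1 when absent). */
    int lookup(String name) {
        return names.indexOf(name);
    }

    /** Appends a node as the last child of parentNode (NONE for a root). */
    int addNode(int parentNode, String name) {
        if (size == parent.length) grow();
        int id = lookup(name);
        if (id == NONE) {
            names.add(name);
            id = names.size() - 1;
        }
        int n = size++;
        parent[n] = parentNode;
        firstChild[n] = NONE;
        lastChild[n] = NONE;
        nextSibling[n] = NONE;
        nameId[n] = id;
        if (parentNode != NONE) {
            if (firstChild[parentNode] == NONE) {
                firstChild[parentNode] = n;
            } else {
                nextSibling[lastChild[parentNode]] = n;
            }
            lastChild[parentNode] = n;
        }
        return n;
    }

    /** Index of the first node with the given name id, or NONE. */
    int findFirst(int id) {
        for (int n = 0; n < size; n++) {
            if (nameId[n] == id) return n;
        }
        return NONE;
    }

    /** Number of direct children of node whose name id matches. */
    int countChildren(int node, int id) {
        int count = 0;
        for (int c = firstChild[node]; c != NONE; c = nextSibling[c]) {
            if (nameId[c] == id) count++;
        }
        return count;
    }

    int parentOf(int node) { return parent[node]; }

    int size() { return size; }

    // Doubling growth: a few array copies in total instead of one object per node.
    private void grow() {
        int cap = parent.length * 2;
        parent = Arrays.copyOf(parent, cap);
        firstChild = Arrays.copyOf(firstChild, cap);
        lastChild = Arrays.copyOf(lastChild, cap);
        nextSibling = Arrays.copyOf(nextSibling, cap);
        nameId = Arrays.copyOf(nameId, cap);
    }
}
```

For example, after `int root = t.addNode(FlatTree.NONE, "order")` and two `t.addNode(root, "item")` calls, `t.countChildren(root, t.lookup("item"))` returns 2 without creating any objects during the query.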
Then imagine you can write or communicate the object to other systems simply with IO
operations, with no processing involved. Then imagine that the IO is async and very cheap and
that you are processing thousands of transactions per second, most of which involve very
little actual processing.
I am and will be, necessarily, revealing some very hard-won lessons in optimizing very large
systems as part of my design for this. I just feel strongly that this step is inevitable at
some point and I want to have the most useful form of it become standard. As I mentioned, it
should work particularly well with Java's available capabilities, but should be easily usable
by C/C++, etc.
There are several things that could be optimized here. CPU in application processing, CPU in
overhead, CPU in preparation for IO, size of data in memory, size of data in storage, size of
data in transit, etc.
This method would primarily allow a drastic decrease in CPU use in most situations and a slight
decrease in storage, with an easy path to more comprehensive levels of compression.
I'm going to be studying the existing binary effort and then releasing a few notes on details
of what I'm thinking. I'll try to get a Java prototype working soon. It appears that the
best path is to use SAX to generate bXML that will have either a SAX or DOM interface.
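The SAX-to-bXML step could look roughly like the following sketch. It uses the standard JAXP `SAXParserFactory` and `DefaultHandler` APIs; the `BxmlBuilder` class, its fields, and the choice to record each element's text as one range in a shared payload buffer are assumptions for illustration, not a defined bXML design. Because SAX reports elements in document order, all text under an element is contiguous in the payload, so a start and end offset per element are enough.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: a SAX handler that records each element as a row of
// ints (parent index, text start, text end) plus its name, and appends all
// character data, unchanged, to one shared payload buffer.
class BxmlBuilder extends DefaultHandler {
    int[] parent = new int[16];
    int[] textStart = new int[16];
    int[] textEnd = new int[16];
    String[] name = new String[16];
    int count = 0;
    final StringBuilder payload = new StringBuilder();
    private final Deque<Integer> open = new ArrayDeque<>(); // currently open elements

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (count == parent.length) {
            int cap = count * 2;
            parent = Arrays.copyOf(parent, cap);
            textStart = Arrays.copyOf(textStart, cap);
            textEnd = Arrays.copyOf(textEnd, cap);
            name = Arrays.copyOf(name, cap);
        }
        parent[count] = open.isEmpty() ? -1 : open.peek();
        textStart[count] = payload.length(); // descendant text starts here
        name[count] = qName;
        open.push(count);
        count++;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        payload.append(ch, start, length); // text is kept as plain character data
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        textEnd[open.pop()] = payload.length(); // descendant text ends here
    }

    /** Text under element n: a slice of the shared payload. */
    String text(int n) {
        return payload.substring(textStart[n], textEnd[n]);
    }

    /** Parses a document with the default JAXP SAX parser. */
    static BxmlBuilder build(String xml) {
        BxmlBuilder b = new BxmlBuilder();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), b);
        } catch (Exception e) { // ParserConfigurationException, SAXException, IOException
            throw new IllegalArgumentException("could not parse document", e);
        }
        return b;
    }
}
```

With this layout, `<a>x<b>y</b>z</a>` gives `a` the text `xyz` and `b` the text `y`, the same strings DOM's `getTextContent()` would return, without building any DOM nodes.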
Note that the payload data in bXML would still be the same character data that would be in
character areas of a normal XML document (possibly without canonicalizing translations). When
mentioning 'binary', I simply meant that the structure would be represented by 'binary' data
structures recording where to find elements, etc. In fact, it's possible to do this all in
ASCII/Unicode if one desired. The point is that bXML is not designed to be editable by a text
editor, since it has more of a 'structured' layout, sort of like a filesystem.
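One way to picture 'binary structure, character payload' is a single byte image laid out like a tiny filesystem: a header, a table of fixed-size node records holding offsets, and then the element names and text as plain UTF-8. The layout below is invented for this sketch and is not a defined bXML format; `BxmlImage` is a hypothetical name. Lookups read ints at computed offsets and compare bytes, so nothing is decoded until `text` is called.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical layout, invented for illustration (all offsets are byte
// offsets from the start of the image, ints are big-endian):
//   header : nodeCount:int
//   record : parent:int nameOff:int nameLen:int textOff:int textLen:int
//   payload: UTF-8 bytes referenced by the records
final class BxmlImage {
    static final int HEADER = 4;   // nodeCount
    static final int RECORD = 20;  // five ints per node

    private BxmlImage() {}

    /** Packs parallel arrays describing the nodes into one byte[] image. */
    static byte[] pack(int[] parent, String[] name, String[] text) {
        int n = parent.length;
        byte[][] names = new byte[n][];
        byte[][] texts = new byte[n][];
        int payloadSize = 0;
        for (int i = 0; i < n; i++) {
            names[i] = name[i].getBytes(StandardCharsets.UTF_8);
            texts[i] = text[i].getBytes(StandardCharsets.UTF_8);
            payloadSize += names[i].length + texts[i].length;
        }
        ByteBuffer buf = ByteBuffer.allocate(HEADER + n * RECORD + payloadSize);
        buf.putInt(n);
        int off = HEADER + n * RECORD; // payload starts right after the record table
        for (int i = 0; i < n; i++) {
            buf.putInt(parent[i]).putInt(off).putInt(names[i].length);
            off += names[i].length;
            buf.putInt(off).putInt(texts[i].length);
            off += texts[i].length;
        }
        for (int i = 0; i < n; i++) {
            buf.put(names[i]).put(texts[i]); // same order the offsets assumed
        }
        return buf.array();
    }

    /** Reads a big-endian int (ByteBuffer's default order) without allocating. */
    static int intAt(byte[] b, int p) {
        return (b[p] & 0xff) << 24 | (b[p + 1] & 0xff) << 16
             | (b[p + 2] & 0xff) << 8 | (b[p + 3] & 0xff);
    }

    private static int field(byte[] image, int node, int k) {
        return intAt(image, HEADER + node * RECORD + 4 * k);
    }

    static int nodeCount(byte[] image) { return intAt(image, 0); }

    static int parentOf(byte[] image, int node) { return field(image, node, 0); }

    /** First node whose name equals the given UTF-8 bytes, or -1. Nothing is decoded. */
    static int findFirst(byte[] image, byte[] nameUtf8) {
        for (int i = 0; i < nodeCount(image); i++) {
            int off = field(image, i, 1);
            if (field(image, i, 2) == nameUtf8.length && regionEquals(image, off, nameUtf8)) {
                return i;
            }
        }
        return -1;
    }

    /** Decodes a node's text only when the caller actually asks for it. */
    static String text(byte[] image, int node) {
        return new String(image, field(image, node, 3), field(image, node, 4), StandardCharsets.UTF_8);
    }

    private static boolean regionEquals(byte[] image, int off, byte[] key) {
        for (int j = 0; j < key.length; j++) {
            if (image[off + j] != key[j]) return false;
        }
        return true;
    }
}
```

Sending such an image is one write of the byte array, and receiving it is one read into a byte array; `findFirst` and `text` then work directly on what arrived, with no parse step in between.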
One other subject that I haven't mentioned, but which I need for another architecture I designed
a while ago, is a mechanism for 'parallel inheritance' overlay tree processing. Has anyone
else worked on this? The idea is to have one or more base trees and to work with a delta tree
that represents changes from the underlying trees. This delta tree is a basic data structure
for a rule engine and metadata application environment I designed last year.
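A minimal sketch of the delta-over-base idea, under two simplifying assumptions that are not from the post: each tree is flattened to a map from a path string to a value, and deleting a path hides only that exact path (not its descendants). Reads look in the delta first and then in each base tree in priority order; writes and deletes touch only the delta, so the base trees are shared and never copied. `OverlayTree` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of an overlay ("delta") tree. Each tree is flattened to
// a map from a node path such as "/cfg/color" to that node's value.
final class OverlayTree {
    private static final Object DELETED = new Object(); // tombstone marker

    private final List<Map<String, String>> bases; // highest priority first
    private final Map<String, Object> delta = new HashMap<>();

    OverlayTree(List<Map<String, String>> bases) {
        this.bases = bases;
    }

    /** Delta first (a tombstone hides the path), then each base in order. */
    Optional<String> get(String path) {
        Object d = delta.get(path);
        if (d == DELETED) return Optional.empty();
        if (d != null) return Optional.of((String) d);
        for (Map<String, String> base : bases) {
            String v = base.get(path);
            if (v != null) return Optional.of(v);
        }
        return Optional.empty();
    }

    /** Records a change in the delta only; the bases are never modified. */
    void put(String path, String value) { delta.put(path, value); }

    /** Hides exactly this path (not its descendants) via a tombstone. */
    void remove(String path) { delta.put(path, DELETED); }

    /** Number of changes relative to the bases. */
    int deltaSize() { return delta.size(); }

    /** Flattens everything into one map: lowest-priority base first, delta last. */
    Map<String, String> materialize() {
        Map<String, String> out = new LinkedHashMap<>();
        for (int i = bases.size() - 1; i >= 0; i--) out.putAll(bases.get(i));
        for (Map.Entry<String, Object> e : delta.entrySet()) {
            if (e.getValue() == DELETED) out.remove(e.getKey());
            else out.put(e.getKey(), (String) e.getValue());
        }
        return out;
    }
}
```

For example, with a site-specific base `{"/cfg/color": "red"}` layered over defaults `{"/cfg/color": "blue", "/cfg/size": "10"}`, `get("/cfg/color")` yields `red`; a later `put("/cfg/size", "12")` changes only the delta, and the defaults map still holds `10`.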
I don't mean to distract from external XML issues and standards; however, XML is close to
perfect for use in protocols, APIs, message systems, RPC, etc., as opposed to DCOM, CORBA
(hopefully this can be resolved), etc. Web-XML was a good example of this. It turns out that
for message-passing systems in a cluster, you really need to externalize the kinds of
optimizations I'm talking about, rather than keep them internal to a particular SAX/DOM
parser.
sdw
Shekhar Kshirsagar wrote:
> There are two places where the use of XML can be optimized: one when it is
> transferred on the wire, and the second when the program tries to interpret the
> XML data using SAX, DOM, etc.
> My interpretation is that Stephen is talking about optimizing the process of
> interpreting the XML document at the client.
> But I'm still not sure what the in-memory representation of bXML will buy us
> over DOM.
>
> Thanks,
> Shekhar Kshirsagar
>
> ----- Original Message -----
> From: Paul Prescod <paul at prescod.net>
> To: <xml-dev at ic.ac.uk>
> Sent: Thursday, March 25, 1999 6:30 PM
> Subject: Re: Is there anyone working on a binary version of XML?
>
> > "Stephen D. Williams" wrote:
> > > I'm not trying to create a new way to recognize XML, but a more efficient way to
> > > do all kinds of computer processing and communication with it. Inordinate
> > > amounts of time, money, effort, CPU, and bandwidth are spent at the interfaces
> > > between programs and other programs, databases, file systems, networks, servers,
> > > etc. XML is a good general solution, but some situations require optimization,
> > > which is what I'm working on.
> >
> > I can see many ways that a typical XML document could be optimized for
> > size if XML compatibility was not a concern. Call it "compressed ML." I am
> > not clear, however, why CompressedML would need to be binary. There are
> > many languages where working with binary data is more expensive than
> > working with text.
> >
> > --
> > Paul Prescod - ISOGEN Consulting Engineer speaking for only himself
> > http://itrc.uwaterloo.ca/~papresco
> >
> > "Perpetually obsolescing and thus losing all data and programs every 10
> > years (the current pattern) is no way to run an information economy or
> > a civilization." - Stewart Brand, founder of the Whole Earth Catalog
> > http://www.wired.com/news/news/culture/story/10124.html
> >
> > xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
> > Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
> > To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
> > (un)subscribe xml-dev
> > To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
> > subscribe xml-dev-digest
> > List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)
> >
> >
>