A Simple Thought

Stephen D. Williams sdw at lig.net
Fri Mar 26 15:32:55 GMT 1999


Leigh Dodds wrote:

> Hmm, I guess with Java you'd use an in-memory buffer and a class
> to wrap that buffer so that your accesses to the data would appear
> to be ordinary method calls accessing member variables, but actually
> just altered/read data at byte offsets in the buffer?

Exactly!  The class interface would be SAX/DOM/JGL-like but operate on a very
efficient representation.  The realization that I had is that I typically build
very meta-data driven applications and systems and that I seldom have business
data models represented by actual classes (in C++, where I learned my lesson,
and Java).  Since the data is accessed via collection interfaces anyway, the
storage can be completely opaque and optimized.
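
A minimal sketch of the buffer-wrapping idea discussed above, not the actual design: the class name RecordView, the field layout, and the 32-byte name capacity are all hypothetical, and java.nio.ByteBuffer is used for brevity even though it postdates this message. Callers see ordinary getters and setters, while each call only reads or writes bytes at a fixed offset in a shared buffer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical fixed-layout record stored in a flat buffer:
//   offset  0: int    id          (4 bytes)
//   offset  4: double price       (8 bytes)
//   offset 12: short  nameLength  (2 bytes)
//   offset 14: name bytes, UTF-8, at most 32 bytes
final class RecordView {
    private static final int ID_OFFSET = 0;
    private static final int PRICE_OFFSET = 4;
    private static final int NAME_LEN_OFFSET = 12;
    private static final int NAME_OFFSET = 14;
    private static final int NAME_CAPACITY = 32;
    static final int SIZE = NAME_OFFSET + NAME_CAPACITY;

    private final ByteBuffer buf;
    private final int base; // byte offset of this record within the buffer

    RecordView(ByteBuffer buf, int base) {
        this.buf = buf;
        this.base = base;
    }

    // Each accessor is an absolute read or write; no object graph is built.
    int getId() { return buf.getInt(base + ID_OFFSET); }
    void setId(int id) { buf.putInt(base + ID_OFFSET, id); }

    double getPrice() { return buf.getDouble(base + PRICE_OFFSET); }
    void setPrice(double price) { buf.putDouble(base + PRICE_OFFSET, price); }

    String getName() {
        int len = buf.getShort(base + NAME_LEN_OFFSET);
        byte[] bytes = new byte[len];
        for (int i = 0; i < len; i++) {
            bytes[i] = buf.get(base + NAME_OFFSET + i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    void setName(String name) {
        byte[] bytes = name.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > NAME_CAPACITY) {
            throw new IllegalArgumentException("name too long");
        }
        buf.putShort(base + NAME_LEN_OFFSET, (short) bytes.length);
        for (int i = 0; i < bytes.length; i++) {
            buf.put(base + NAME_OFFSET + i, bytes[i]);
        }
    }
}
```

Several RecordView instances can share one buffer at different base offsets, so the buffer itself is the only thing that ever needs to be stored or sent.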

> Originally though I thought you were talking about a 'standard'
> representation.
> Shouldn't you then be avoiding 'other optimizations...to make processing
> in-place in Java fast'? Otherwise you're targeting a particular
> implementation language?

Ahh, there's the trick.  I believe I have most of a design for a data structure
that is fast in memory yet is 'flat' and can have its chunks just written out or
read in at any point.  It builds on some very old ideas I came up with for a
language I designed.  When viewed as an interchange format, it may not be the
most space-efficient (although it should be smaller than XML text), but it
trades a small amount of space for nearly zero processing overhead.  There will
probably also be a procedure for 'compacting' an object for storage in a
database or for sending over a slow link, as opposed to the 'fast' format
usable between servers in a cluster.
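
To illustrate what 'flat' could mean here, the following is a sketch under assumptions rather than the design itself: the class FlatTree and its node layout are hypothetical. Nodes refer to each other by byte offset instead of by reference, so the used part of the buffer can be written out or read back as-is, with no parse or rebuild step.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical flat tree: every node lives in one buffer and links to other
// nodes by byte offset, never by pointer.
// Node layout: int firstChild | int nextSibling | int nameLength | name (UTF-8)
final class FlatTree {
    static final int NONE = -1;
    private final ByteBuffer buf;

    FlatTree(ByteBuffer buf) { this.buf = buf; }

    // Appends a node at the buffer's current position and returns its offset.
    int addNode(String name) {
        int offset = buf.position();
        byte[] bytes = name.getBytes(StandardCharsets.UTF_8);
        buf.putInt(NONE).putInt(NONE).putInt(bytes.length).put(bytes);
        return offset;
    }

    // Makes child the last child of parent by patching offsets in place.
    void appendChild(int parent, int child) {
        int cur = buf.getInt(parent);
        if (cur == NONE) {
            buf.putInt(parent, child);
            return;
        }
        while (buf.getInt(cur + 4) != NONE) {
            cur = buf.getInt(cur + 4);
        }
        buf.putInt(cur + 4, child);
    }

    int firstChild(int node) { return buf.getInt(node); }
    int nextSibling(int node) { return buf.getInt(node + 4); }

    String name(int node) {
        int len = buf.getInt(node + 8);
        byte[] bytes = new byte[len];
        for (int i = 0; i < len; i++) {
            bytes[i] = buf.get(node + 12 + i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // The used portion of the buffer is already the interchange form.
    byte[] toBytes() {
        byte[] out = new byte[buf.position()];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.get(i);
        }
        return out;
    }
}
```

A receiver can wrap the bytes from toBytes() in a new FlatTree and navigate from offset 0 immediately. In this sketch a wrapped copy is meant for reading only, since addNode always writes at the buffer's current position.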

I'll be implementing the rest of this shortly and we can have another round of
discussion.  I'd really like a reference to the one Java project doing
something similar.

> I'm interested in this (at least in part) as I've been toying with an
> application idea which could potentially have a lot of (small) XML documents
> built into a complex in-memory object graph. I'm concerned about the size
> of the object graph (and managing interconnections amongst nodes) and its
> later storage (don't want to have to reparse every time the application
> starts).
> Serialisation was originally what I was considering.

This is exactly the kind of problem I'm thinking of.  Since most people use
class interfaces to get at the data anyway, there's no need to chew up all the
processing time manipulating it behind the scenes in expensive ways.
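
For the startup concern raised above (not wanting to reparse every time), one option is the memory-mapping approach Samuel R. Blackburn describes in the quoted thread below. The sketch assumes a flat image has already been produced as a byte array; the class MappedStore and its method names are hypothetical, and the java.nio file-mapping API shown postdates this message.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: persist a flat image once, then map it on later runs.
final class MappedStore {
    // Writes the flat image, e.g. right after the XML has first been parsed.
    static void save(Path file, byte[] image) throws IOException {
        Files.write(file, image);
    }

    // Maps the file read-only. The mapping stays valid after the channel is
    // closed, and the OS pages in only the regions that are actually touched.
    static MappedByteBuffer open(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }
}
```

Because the stored structure uses offsets rather than memory addresses, the mapped buffer can be handed directly to an offset-based accessor class without any fix-up pass.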

Unfortunately the simple, obvious, traditional ways of building things
(especially in C++ and Java) cause massive storms of activity in large
programs.  (Object creation, initialization, building links, indexing, etc.
etc.)

sdw

> L.
>
> > -----Original Message-----
> > From: owner-xml-dev at ic.ac.uk [mailto:owner-xml-dev at ic.ac.uk]On Behalf Of
> > Stephen D. Williams
> > Sent: 26 March 1999 13:43
> > To: Samuel R. Blackburn
> > Subject: Re: A Simple Thought
> >
> >
> > This is in fact exactly the kind of thing that I am thinking, with at
> > least a couple other optimizations thrown in to make processing in-place
> > in Java fast.
> >
> > sdw
> >
> > "Samuel R. Blackburn" wrote:
> >
> > > You know, if you parse the XML into a carefully designed data structure,
> > > you could write that structure to a file. To re-read the data, you would
> > > simply memory map the file (or put the structure into a shared memory
> > > segment). If the structure is designed so offsets are used instead of
> > > pointers, you could navigate it quickly and not have to worry about
> > > the memory addresses involved. The OS will only page in those portions
> > > of the file that are really used.
> > >
> > > Just a thought,
> > >
> > > Sam
> > >
> > > -----Original Message-----
> > > From: Stephen D. Williams <sdw at lig.net>
> > > To: xml-dev at ic.ac.uk <xml-dev at ic.ac.uk>
> > > Date: Thursday, March 25, 1999 10:08 PM
> > > Subject: Re: Is there anyone working on a binary version of XML?
> > >
> > > >"Simon St.Laurent" wrote:
> > > >
> > > >> At 03:36 PM 3/25/99 -0500, DuCharme, Robert wrote:
> > > >> >>I know, I know, this is anathema to what many of you feel is the
> > > >> >>essence of XML, and I agree to a point.
> > > >> >
> > > > >> >It's not so much about feelings, as about contradicting the XML spec.
> > > >> >
> > > >> >[...]
> > > >> >
> > > >> >Applying XML concepts to a binary data format sounds interesting and
> > > >> >potentially useful, but it wouldn't be XML.
> > > >>
> > > > >> One of these days I'd really love to stop talking about what is and
> > > > >> isn't XML, though I know it's fun, and start talking about what we
> > > > >> can do with XML and XML-like structures, whether they are SAX event
> > > > >> flows, DOM trees, or binary formats that build on an XML foundation.
> > > >>
> > > >> We might even get some real work done - and it might even be fun.
> > > >
> > > > >I agree with the sentiment, Simon.
> > > > >
> > > > >I'm required (or am requiring myself) to get a lot of real work done
> > > > >very quickly in the next 6 months, hence my focus...
> > > > >
> > > > >Semantically, I am talking about using XML.  After parsing and creating
> > > > >a DOM tree or SAX events, you no longer have XML but a data structure
> > > > >semantically equivalent to an XML document.  Another way to think about
> > > > >what I'm proposing is that it is a cache of the data structures
> > > > >produced from processing an XML document, cast in an openly documented
> > > > >data structure that is already flattened and ready for IO.
> > > > >
> > > > >In fact, this is how I arrived at this design after following a few
> > > > >other design constraints and observations.  Of course from there it is
> > > > >a short step to say that you can throw away the 'external' XML
> > > > >representation if you can recreate it from XMLb.
> > > > >
> > > > >My scheme makes parsing of XML a non-issue.  If I only have that
> > > > >advantage within my closed system, so be it; converting to and from XML
> > > > >for external purposes is in fact what I intend to do.
> > > > >
> > > > >In my case, I'm architecting a high speed clustering system, primarily
> > > > >targeted at Linux/Unix and Java.  In this kind of system of course you
> > > > >are splitting applications into many servers.  Of course the
> > > > >communication between those nodes is really internal application
> > > > >communication, the equivalent of that DOM tree, so it makes sense to
> > > > >optimize it.  Think of it this way: you'd seldom design a large app
> > > > >where every method needs to parse the XML text block passed to it to
> > > > >get a DOM tree (or SAX events) if the calling method has a DOM tree
> > > > >that it could just pass.
> > > >
> > > >sdw
> > > >
> > > >> Simon St.Laurent
> > > >> XML: A Primer
> > > >> Sharing Bandwidth / Cookies
> > > >> http://www.simonstl.com
> > > >
> > > >
> > > >--
> > > >OptimaLogic - Finding Optimal Solutions
> > > >Web/Crypto/OO/Unix/Comm/Video/DBMS
> > > >sdw at lig.net   Stephen D. Williams  Senior Consultant/Architect
> > > >http://sdw.st
> > > >43392 Wayside Cir,Ashburn,VA 20147-4622 703-724-0118W 703-995-0407Fax
> > > >5Jan1999
> > > >


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



