Storing Lots of Fiddly Bits (was Re: What is XML for?)

W. Eliot Kimber eliot at
Mon Feb 8 04:49:34 GMT 1999

At 10:13 PM 2/7/99 -0500, Borden, Jonathan wrote:
>	Ah, yes but realize that business object modelling is done at a higher
>level than the relational table (which is a data structure). XML is the
>serialization of the DOM tree based data structure. When business object
>need to employ tree based data structures, they may choose to store these in
>XML serializations. Another option would be for these business objects to
>interact with a database through the same DOM interfaces. This way the
>business object layer is isolated from the details of the storage layer.

If I understand you correctly, you're saying:

1. XML documents are serializations of things (such as business objects)
2. The DOM is the abstraction of that serialization 
3. Processing systems may need to turn the serialization back into their
   original objects
4. Therefore we should pretend that relational databases are really DOM

This doesn't make any sense to me. Why use the DOM (or any other
abstraction of XML documents--I'm not picking on the DOM in particular
here) for direct access to business object data?  Why not access those
objects directly?  Or maybe we're talking about what's happening at
different layers in the system.

Here's the way I view the scenario:

I start with a business object: "airplane". I model it abstractly: 

   airplane => [fuselage, wing, tail, cockpit]

I then want to create instances of airplanes: I write IDL (or EXPRESS or
...) definitions of my business objects that directly reflect their
abstract model:

// NOTE: Phoney IDL 
interface Airplane {
   Part Fuselage;
   Part Wing;
   Part Tail;
   Part Cockpit;
};

I then have somebody implement some objects to this interface. How do these
objects store their data? Don't care. How do they serialize their data?
Don't care. Can I use the DOM to access these objects? Of course not--these
are airplanes, not documents--the DOM isn't relevant.
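
To make the separation concrete, here is a minimal sketch in Python rather
than IDL. Everything in it is hypothetical and invented for illustration: the
class names InMemoryAirplane and RowBackedAirplane, the describe() helper, and
the part numbers. It assumes only that callers see the Airplane interface and
nothing about storage.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    name: str


class Airplane(ABC):
    """Business interface only: nothing here says how data is stored."""

    @abstractmethod
    def fuselage(self) -> Part: ...

    @abstractmethod
    def wing(self) -> Part: ...

    @abstractmethod
    def tail(self) -> Part: ...

    @abstractmethod
    def cockpit(self) -> Part: ...


class InMemoryAirplane(Airplane):
    """Hypothetical implementation that keeps its parts as Python objects."""

    def __init__(self, fuselage, wing, tail, cockpit):
        self._parts = (Part(fuselage), Part(wing), Part(tail), Part(cockpit))

    def fuselage(self): return self._parts[0]
    def wing(self): return self._parts[1]
    def tail(self): return self._parts[2]
    def cockpit(self): return self._parts[3]


class RowBackedAirplane(Airplane):
    """Hypothetical implementation standing in for a relational row."""

    def __init__(self, row):
        self._row = row  # a tuple of column values

    def fuselage(self): return Part(self._row[0])
    def wing(self): return Part(self._row[1])
    def tail(self): return Part(self._row[2])
    def cockpit(self): return Part(self._row[3])


def describe(plane: Airplane) -> str:
    """Client code: uses the interface, never the storage."""
    parts = (plane.fuselage(), plane.wing(), plane.tail(), plane.cockpit())
    return ", ".join(part.name for part in parts)
```

describe() behaves identically for both implementations, and at no point
does the client touch a document node or a table row.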

Now, I put my object implementor hat on:

I have to implement this Airplane interface.  I think: what technology do I
have to store lots of fiddly bits?  Do I think "XML"? Maybe. Do I think
"relational databases"? Almost certainly. Do I think "object databases"?
Quite probably.  

If I think "XML", why would I think it and what would I get?

One reason might be: "hey, I can serialize this stuff to disk using a
standard syntax and abstraction--that could make it really easy to use free
tools and protect my data through a standard I don't have to pay for the
right to use." But then I think "oh, but XML's model might not be a good
match for my data structures--might incur a lot of
serialization/deserialization overhead." I ponder for a bit. "Let's look at
relational databases again--they're fast. I still have to serialize and
deserialize, but that technology is mature and I can hire SQL geeks in a
heartbeat." I have a Coke. "But wait--object technology looks pretty good
too--I could just implement directly to my interfaces and cut out the
middle layer. I could still serialize for interchange--I might even get
that for free from the OODB vendor."

Ok, object database it is.  I program away, happy as a clam. The system
works and it is a joy [this is a story, remember]. 

Now I say, "hey, let's try this XML serialization jabby the vendor
provides, wonder what I'll get?" I push the "dump to XML" button. What does
it look like?  It's ugly--I've got no idea what they were thinking. Angle
brackets are swimming before my eyes. But, I know I can suck it back in,
supposedly without loss. I try it--hey presto, my data's back. Cool.

Why is this last bit the case? Because there are infinitely many ways to
serialize a given set of abstract objects, so only the serializer knows how
to do the deserialization. In any case, there's a strong chance that the gap
between the XML structure and the business object structure will be at
least two levels of abstraction (depending on whether the serialization is
late or early bound), if not more.
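
A small sketch of that claim, in Python with the standard-library
xml.etree.ElementTree module. The two XML layouts and all four function names
are made up for illustration; neither is what any real OODB vendor emits. The
airplane is reduced to a plain dict to keep the example short.

```python
import xml.etree.ElementTree as ET

PARTS = ("fuselage", "wing", "tail", "cockpit")


def to_xml_elements(plane):
    """Layout A (hypothetical): one child element per part."""
    root = ET.Element("airplane")
    for role in PARTS:
        ET.SubElement(root, role).text = plane[role]
    return ET.tostring(root, encoding="unicode")


def from_xml_elements(text):
    """Reads layout A only."""
    root = ET.fromstring(text)
    return {role: root.findtext(role) for role in PARTS}


def to_xml_slots(plane):
    """Layout B (hypothetical): generic object/slot dump."""
    root = ET.Element("object", {"class": "Airplane"})
    for role in PARTS:
        ET.SubElement(root, "slot", {"name": role, "value": plane[role]})
    return ET.tostring(root, encoding="unicode")


def from_xml_slots(text):
    """Reads layout B only."""
    root = ET.fromstring(text)
    return {slot.get("name"): slot.get("value") for slot in root.iter("slot")}


plane = {"fuselage": "F-100", "wing": "W-7", "tail": "T-3", "cockpit": "C-1"}

# Each serializer round-trips through its own deserializer...
round_trip_a = from_xml_elements(to_xml_elements(plane))
round_trip_b = from_xml_slots(to_xml_slots(plane))

# ...but feeding layout B to the layout A reader finds none of the parts.
mismatched = from_xml_elements(to_xml_slots(plane))
```

Both layouts are well-formed XML carrying the same airplane, yet each can
only be turned back into an airplane by the code that knows its conventions.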

Thus, (and here's the point I've been trying to make from the beginning, so
listen closely)...

...wait for it...

...The XML you get out of such a system isn't your business objects--it's
an arbitrary serialization of the internal representation of your business
objects. Using the DOM (that is, an in-memory abstraction of *documents*)
as the basis for direct business object access is simply nuts.

This is not to say that fundamentally-hierarchical graph-based data models
aren't useful for representing business objects--certainly they are (or we
wouldn't be bothering to build generalized grove-management systems nor
would we have used groves to represent HyTime's own business objects). But
the DOM, in particular, is not a generalized data structure--it's a way of
representing *XML documents* in memory AND NOTHING ELSE.

So unless by "DOM" you mean "any fundamentally hierarchical graph
representation of data", it's nonsense to talk about using the DOM as the
API for data objects--the most that can mean is that *for your flavor of
serialization you've defined functions that do the deserialization as an
application of the DOM*. Which is fine, but it's not the same as *USING THE
DOM FOR DATA ACCESS*, because it's no different from implementing your
business objects on top of some other data storage technology.
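
As a rough illustration of "deserialization as an application of the DOM",
here is a sketch using Python's standard-library xml.dom.minidom. The XML
layout, the part numbers, and the airplane_from_dom() function are assumptions
made up for this example; it also assumes every part element holds a single
non-empty text node.

```python
from xml.dom.minidom import parseString

# Hypothetical serialization of one airplane.
XML = (
    "<airplane>"
    "<fuselage>F-100</fuselage><wing>W-7</wing>"
    "<tail>T-3</tail><cockpit>C-1</cockpit>"
    "</airplane>"
)


def airplane_from_dom(document):
    """Translation layer: walks document nodes and builds airplane data.

    The DOM only knows about elements and text; the knowledge that a
    child element named "wing" means the airplane's wing lives here.
    """
    plane = {}
    for node in document.documentElement.childNodes:
        if node.nodeType == node.ELEMENT_NODE:
            plane[node.tagName] = node.firstChild.data
    return plane


doc = parseString(XML)
plane = airplane_from_dom(doc)
wing = plane["wing"]
```

The DOM calls give you nodes named "wing"; it takes the extra function to
give you a wing. Swap minidom for SQL rows and the shape of that function
stays the same, which is the sense in which the DOM is just another storage
layer under the business objects.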

If you do mean graphs, then the DOM, in particular, isn't what you want
because it's not generalized--it's a highly optimized, use-specific object
model for XML documents. Good for its purpose but not generalized. It also
lacks a more general model of which it is an application.  

Or said another way: there's no magic in the DOM (or groves or XML) that
will make storing and managing business objects easier.   What will help
are standardized serialization definitions, such as XMI or the new STEP XML
Representation work item.  But these only limit the number of instances of
translation layers that have to be written--they don't eliminate the need
for translation between the business object models and their serializations.


<Address HyTime=bibloc>
W. Eliot Kimber, Senior Consulting SGML Engineer
ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 75202.  214.953.0004

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at
Archived as: and on CD-ROM/ISBN 981-02-3594-1