Storing Lots of Fiddly Bits (was Re: What is XML for?)

Borden, Jonathan jborden at
Mon Feb 8 06:38:33 GMT 1999

W. Eliot Kimber wrote:
> At 10:13 PM 2/7/99 -0500, Borden, Jonathan wrote:
> >
> >	Ah, yes but realize that business object modelling is done at a higher
> >level than the relational table (which is a data structure). XML is the
> >serialization of the DOM tree based data structure. When business object
> >need to employ tree based data structures, they may choose to store these in
> >XML serializations. Another option would be for these business objects to
> >interact with a database through the same DOM interfaces. This way the
> >business object layer is isolated from the details of the storage layer.
> If I understand you correctly, you're saying:
> 1. XML documents are serializations of things (such as business objects)


> 2. The DOM is the abstraction of that serialization

	Not my statement. The DOM is an interface onto a tree structure. You are
hung up on the term serialization, which is specific to files and other flat
persistence formats. What I am talking about is persistence in general. An
object implementing a DOM interface may be persisted from a file or HTTP
stream, in which case it is built from a serialization; it may also be
persisted from an ODB, in which case it is *not* persisted from a
serialization. Both the file and the ODB can be used by the DOM.
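
	A minimal sketch of that point, in Python with the standard library's
xml.dom.minidom. The sample data, the `store` tuple layout, and the function
names are assumptions made up for illustration; the tuple is only a stand-in
for an object database, not a real ODB API. The consumer sees the same DOM
interface whether or not a serialization was ever involved.

```python
from xml.dom.minidom import getDOMImplementation, parseString


def from_serialization(text):
    """Build the tree by parsing a serialized XML string."""
    return parseString(text)


def from_object_store(store):
    """Build the tree directly from nested data; no XML text is involved.

    `store` is a hypothetical (tag, attributes, children) tuple standing in
    for whatever an object database would hand back.
    """
    doc = getDOMImplementation().createDocument(None, store[0], None)

    def fill(element, attrs, children):
        for name, value in attrs.items():
            element.setAttribute(name, value)
        for tag, child_attrs, grandchildren in children:
            child = doc.createElement(tag)
            element.appendChild(child)
            fill(child, child_attrs, grandchildren)

    fill(doc.documentElement, store[1], store[2])
    return doc


def wing_sides(doc):
    """Consumer code: uses only DOM calls, unaware of where the tree came from."""
    return [w.getAttribute("side") for w in doc.getElementsByTagName("wing")]


xml_text = '<airplane><wing side="left"/><wing side="right"/></airplane>'
store = ("airplane", {}, [("wing", {"side": "left"}, []),
                          ("wing", {"side": "right"}, [])])

print(wing_sides(from_serialization(xml_text)))
print(wing_sides(from_object_store(store)))
```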

> 3. Processing systems may need to turn the serialization back into their
>    original objects

	Processing systems may need to obtain hierarchical data from storage
systems. These systems need not store information in serial format. Again:

> 3. Therefore we should pretend that relational databases are really DOM
>    trees.

	No. If the data is tabular, then use a recordset. In the specific cases
when 1) we are storing data which is naturally hierarchical, 2) the data
needs to interface with systems which for other reasons employ DOM
interfaces (e.g. my XSL processor is built on a DOM interface and I wish to
query the database using XQL, which happens to be built into my XSL
processor in this example), it is more convenient to interface to the data
using DOM interfaces than it is using recordsets (i.e. tabular data).

	I am saying over and over: if the data is relational, use recordsets, but
when the data is hierarchical, DOM interfaces provide less of an impedance
mismatch onto the data.

> This doesn't make any sense to me. Why use the DOM (or any other
> abstraction of XML documents--I'm not picking on the DOM in particular
> here) for direct access to business object data?

	We are NOT talking about direct access to business objects, but rather the
mechanism by which business objects talk to the database. The business object
tier is above the data object tier.

> Why not access those
> objects directly?  Or maybe we're talking about what's happening at
> different layers in the system.

	yes! yes! yes!
> Here's the way I view the scenario:
> I start with a business object: "airplane". I model it abstractly:
>    airplane => [fuselage, wing, tail, cockpit]
> I then want to create instances of airplanes: I write IDL (or EXPRESS or
> ...) definitions of my business objects that directly reflect their
> properties:
> // NOTE: Phoney IDL
> interface Airplane {
>    Part Fuselage;
>    Part Wing;
>    Part Tail;
>    Part Cockpit;
> };

	Part fuselage is really a structure:

	interface Airplane {
		Fuselage f;
		Wing wleft;
		Wing wright;
		Tail t;
		Cockpit c;
	};

	interface Fuselage {
		Strut strut;
		X-assembly x;
		Y-assembly y;
	};

	interface Wing {
		...
	};

	and so on,

	Now suppose each airplane has different Fuselage, Wings, Tails, Cockpits;
and suppose each of these is built from 10 sub-parts, and so on 50 levels
deep until we get to sheet metal, screws and wires. An airplane is a complex
piece of equipment.

> I then have somebody implement some objects to this interface. How do these
> objects store their data? Don't care. How do they serialize their data?
> Don't care.

	Since you appear to be the CEO of the aircraft company, who cares? Why not
just have someone design the plane, implement it, test it and build it. Who
cares about databases or even computers?

	If you don't care, you don't have an airplane (or the plans for one).
Someone has to care about the details. Objects typically don't just 'store
data' into databases. Even with an ODBMS there is an interface/API onto the DB
(this can be base classes in C++ etc., different for each DB).

> Can I use the DOM to access these objects? Of course not--these
> are airplanes, not documents--the DOM isn't relevant.

	OK, suppose I have a set of airplanes. Let's try this two ways:

	First with the DOM (stylized):

	NodeList airplanes_data = container.getElementsByTagName("airplane");

	ok now build your business object (this is where you can spend your time).
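
	For what it's worth, here is a rough runnable version of the DOM route in
Python (xml.dom.minidom). The `Airplane` dataclass, its two fields, and the
sample fleet are assumptions invented for the example; a real business object
would carry far more.

```python
from dataclasses import dataclass
from xml.dom.minidom import parseString


@dataclass
class Airplane:
    """Hypothetical business object built from DOM nodes."""
    color: str
    wing_count: int


# Stand-in for whatever hands back the tree (a file, a stream, a database).
container = parseString(
    "<fleet>"
    '<airplane color="red"><wing/><wing/></airplane>'
    '<airplane color="blue"><wing/><wing/></airplane>'
    "</fleet>"
)

# One call fetches every airplane node, however deep its parts go.
airplanes_data = container.getElementsByTagName("airplane")

# The business object layer only talks to DOM interfaces from here on.
airplanes = [
    Airplane(
        color=node.getAttribute("color"),
        wing_count=len(node.getElementsByTagName("wing")),
    )
    for node in airplanes_data
]

print(airplanes)
```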

	Now with SQL:

	Recordset rs = conn.Execute("select * from
airplanes,fuselages,wings,tails,cockpits,x-assembly,y-assemblies, .... about
3^10 total tables here (assuming 10 levels deep) .... screws,sheets,wires
where .....");

	Alternatively you can write out 3^10 individual select statements.
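
	To make the relational route concrete, here is a cut-down sketch using
Python's sqlite3 with only three levels (airplane, wing, screw). The table
and column names are assumptions for illustration. Each additional level of
the part hierarchy would need its own table and one more join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table per level of the hierarchy, linked by foreign keys.
conn.executescript("""
CREATE TABLE airplanes (id INTEGER PRIMARY KEY, color TEXT);
CREATE TABLE wings (id INTEGER PRIMARY KEY, airplane_id INTEGER, side TEXT);
CREATE TABLE screws (id INTEGER PRIMARY KEY, wing_id INTEGER, pitch INTEGER);
INSERT INTO airplanes VALUES (1, 'red');
INSERT INTO wings VALUES (1, 1, 'left');
INSERT INTO wings VALUES (2, 1, 'right');
INSERT INTO screws VALUES (1, 1, 64);
INSERT INTO screws VALUES (2, 2, 32);
""")

# Walking from airplane down to screw takes one join per level.
rows = conn.execute("""
SELECT a.color, w.side, s.pitch
FROM airplanes a
JOIN wings w ON w.airplane_id = a.id
JOIN screws s ON s.wing_id = w.id
ORDER BY w.id
""").fetchall()

print(rows)
```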

	After a few weeks/months of work you can start working on your business
object.

	Arguably, when using an ODBMS this example would be more straightforward
(but you picked RDBMS). The problem is that there is no standard,
language-independent interface onto ODBMSs. The DOM, while not the perfect
interface, *is* standard, and this is the big utility.

> Why is this last bit the case? Because there are infinitely many ways to
> serialize a given set of abstract objects, so only the serializer knows how
> to do the deserialization. In any case, it's a strong chance that the gap
> between the XML structure and the business object structure will be at
> least two levels of abstraction (depending on whether the serialization is
> late or early bound), if not more.
> Thus, (and here's the point I've been trying to make from the beginning, so
> listen closely)...
> ...wait for it...
> ...The XML you get out of such a system isn't your business objects--it's
> an arbitrary serialization of the internal representation of your business
> objects. Using the DOM (that is, an in-memory abstraction of *documents*)
> as the basis for direct business object access is simply nuts.

	Actually not even this. First, I'm never actually dealing with XML; I've
only shown DOM interfaces. My business objects internally use DOM interfaces
to interact with a bit-bucket. Where does an XML document come into play?

> This is not to say that fundamentally-hierarchical graph-based data models
> aren't useful for representing business objects--certainly they are (or we
> wouldn't be bothering to build generalized grove-management systems nor
> would we have used groves to represent HyTime's own business objects). But
> the DOM, in particular, is not a generalized data structure--it's a way of
> representing *XML documents* in memory AND NOTHING ELSE.

	Err, no. I am saying that I can use the DOM to represent hierarchical data.
This data *can* be expressed, serialized, as an XML document, but between my
database and my business object, there need never exist an XML document.

	Say whatever you please but if I have a piece of code from James Clark
(e.g. Jade/SP/groveoa) or Microsoft or IBM, I'm quite free to use it as I
see fit.

	For example, I get to say (using 'extended DOM'):

	NodeList anotherSet = airplanes.selectNodes(
		"airplane[@color='red' and .//screw/thread/@pitch = 64]");

to select all red airplanes with screws having a pitch=64...
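
	A rough Python equivalent of that query, for anyone who wants to run it.
The standard library's xml.etree.ElementTree supports only a limited XPath
subset and cannot express the nested predicate in one expression, so this
sketch splits it into two steps. The sample fleet, the `id` attributes, and
the `red_with_pitch` helper are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

fleet = ET.fromstring(
    "<fleet>"
    '<airplane id="a1" color="red">'
    '<wing><screw><thread pitch="64"/></screw></wing></airplane>'
    '<airplane id="a2" color="red">'
    '<wing><screw><thread pitch="32"/></screw></wing></airplane>'
    '<airplane id="a3" color="blue">'
    '<wing><screw><thread pitch="64"/></screw></wing></airplane>'
    "</fleet>"
)


def red_with_pitch(root, pitch):
    """Return red airplanes that contain a screw thread with the given pitch."""
    matches = []
    # Step 1: the attribute predicate on the airplane itself.
    for plane in root.findall("airplane[@color='red']"):
        # Step 2: the descendant test, at any depth below the airplane.
        if plane.find(f".//screw/thread[@pitch='{pitch}']") is not None:
            matches.append(plane)
    return matches


another_set = red_with_pitch(fleet, 64)
print([plane.get("id") for plane in another_set])
```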

	Have you written a lot of programs which directly access databases? Do you
ever have to code the objects which access the databases? If you stay up in
la la object modelling land, you may not appreciate what I am saying. In
working with this stuff, I am finding that I am more efficient, and I can
get work done more quickly using these interfaces.

> Or said another way: there's no magic in the DOM (or groves or XML) that
> will make storing and managing business objects easier.   What will help
> are standardized serialization definitions, such as XMI or the new STEP XML
> Representation work item.  But these only limit the number of instances of
> translation layers that have to be written--they don't eliminate the need
> for translation between the business object models and their
> serializations.
	XMOP, for example, is a way
to serialize arbitrary COM objects using their typeinfo metadata. XMOP is a
layer that can persist objects into either a) a stream (serialization) or b)
direct-to-DOM. When I attempted to design a direct-to-Recordset persistence
interface on XMOP I found that I had to essentially develop a
DOM<->Relational mapping. This is because arbitrary objects can be modelled
in a hierarchical fashion (e.g. serialized to XML).
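
	To give a feel for what a tree<->relational mapping involves, here is one
common approach, an adjacency-list table, sketched in Python with sqlite3.
This is not XMOP's mapping; the `nodes` table, its columns, and the helper
names are assumptions, and attributes and text content are left out to keep
it short.

```python
import sqlite3
import xml.etree.ElementTree as ET


def store_tree(conn, element, parent_id=None):
    """Insert an element and its descendants; return the element's row id."""
    cur = conn.execute(
        "INSERT INTO nodes (parent_id, tag) VALUES (?, ?)",
        (parent_id, element.tag),
    )
    node_id = cur.lastrowid
    for child in element:
        store_tree(conn, child, node_id)
    return node_id


def load_tree(conn, node_id):
    """Rebuild the element stored under node_id, one query per node."""
    (tag,) = conn.execute(
        "SELECT tag FROM nodes WHERE id = ?", (node_id,)
    ).fetchone()
    element = ET.Element(tag)
    child_rows = conn.execute(
        "SELECT id FROM nodes WHERE parent_id = ? ORDER BY id", (node_id,)
    ).fetchall()
    for (child_id,) in child_rows:
        element.append(load_tree(conn, child_id))
    return element


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent_id INTEGER, tag TEXT)"
)

original = ET.fromstring(
    "<airplane><fuselage><strut/></fuselage><wing/><wing/></airplane>"
)
root_id = store_tree(conn, original)
restored = load_tree(conn, root_id)

print(ET.tostring(restored, encoding="unicode"))
```

	Sibling order survives here only because rows are inserted in document
order and read back ordered by id; a fuller mapping would likely store an
explicit position column.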

	In another example, using the medical imaging DICOM protocol (a complex
property based protocol), I have developed a mapping to the Microsoft
PropertySet format (used with Index Server). This mapping is not clean at
all, given the inability to represent certain DICOM structures as
PROPVARIANTs. Similar problems arise in mapping the protocol to a
relational database (the workaround is to use binary data). Using XML and
the DOM made solving this difficult problem a piece of cake.

	So, I'm not saying that this is the cure for all the world's problems or
that this is a hammer and all the world is a nail, but on the other hand,
when you have a hammer in your hand, and you see a nail, take the shot.

	The simple fact is that the uses of the DOM interfaces are determined not
by their designers but rather by the creativity of those individuals who use
them. The original CPU was designed to be a calculator. Use your

Jonathan Borden

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at
Archived as: and on CD-ROM/ISBN 981-02-3594-1