Storing Lots of Fiddly Bits (was Re: What is XML for?)

Mark Birbeck Mark.Birbeck at
Wed Feb 3 18:30:46 GMT 1999

Paul Prescod wrote:
> > Sure. But I still have two issues. First, why would you query the
> > serialisation anyway? Wouldn't you want to query your original database
> > and generate XML pages that reflect the results?
> You certainly would if you use XML as *only* a serialization. The thrust
> of this thread was that some people want to encode everything in XML so
> that they can "query it." But XML is a lousy query representation for
> anything other than human-authored documents. (and debatably the best
> thing for queries against those!)

That is my suggestion. The DOM is nowhere near fast enough - or efficient
enough - to be a query interface to loads of data. It is a handy way of
navigating trees, though, provided you break your processing of the data
into a number of smaller trees. For example, say I query *my database*
for all articles in any issue of a magazine that contain the word
'Turkey'. The database server would now have these cached, but to begin
with I might get away with just putting references to each issue into
the DOM. If a user selects one of those issues from their search
results, I could then add to the DOM all the articles for that issue
that contain the word. I could even create a new DOM instance and
populate it, so that if the user moves away from that issue I could
delete that DOM and create a new one. (I could even not use the DOM at
all - hey, there's no law.)
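A minimal sketch of the two-stage approach described above, using Python's standard `xml.dom.minidom`. The in-memory `ARTICLES` table and the functions `search_issues`, `build_results_dom` and `expand_issue` are hypothetical stand-ins for the magazine database and its query layer. The first pass puts only a reference to each matching issue into the DOM; the second adds that issue's matching articles when the user selects it. For simplicity it grows a single DOM rather than creating a fresh instance per issue.

```python
from xml.dom.minidom import Document

# Hypothetical stand-in for the magazine database: issue id -> [(title, text)].
ARTICLES = {
    "1999-01": [("Ankara diary", "Notes from Turkey"), ("Cheese", "No birds here")],
    "1999-02": [("Recipes", "Roast turkey with stuffing")],
    "1999-03": [("Travel", "Spain in spring")],
}

def search_issues(word):
    """Return ids of issues with at least one article containing `word` (case-insensitive)."""
    w = word.lower()
    return [issue for issue, arts in ARTICLES.items()
            if any(w in text.lower() for _, text in arts)]

def build_results_dom(word):
    """First pass: put only a reference to each matching issue into the DOM."""
    doc = Document()
    root = doc.createElement("results")
    root.setAttribute("word", word)
    doc.appendChild(root)
    for issue in search_issues(word):
        ref = doc.createElement("issue")
        ref.setAttribute("id", issue)
        root.appendChild(ref)
    return doc

def expand_issue(doc, issue_id):
    """Second pass: when the user selects an issue, add its matching articles
    (only once; an issue that already has children is left alone)."""
    word = doc.documentElement.getAttribute("word").lower()
    for ref in doc.getElementsByTagName("issue"):
        if ref.getAttribute("id") == issue_id and not ref.hasChildNodes():
            for title, text in ARTICLES[issue_id]:
                if word in text.lower():
                    art = doc.createElement("article")
                    art.setAttribute("title", title)
                    ref.appendChild(art)
    return doc

doc = build_results_dom("Turkey")   # references to issues only
expand_issue(doc, "1999-01")        # user picks an issue: load its articles
print(doc.documentElement.toxml())
```

Discarding `doc` and calling `build_results_dom` again when the user moves away corresponds to the "delete the DOM and create a new one" variant.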

Anyway, I think we're sort of agreeing that the DOM on its own is not
suitable, but the DOM plus some helper code is.

> > And modelling the data rather than the person means you can no longer
> > interchange your XML with other systems because you have two completely
> > different sets of data, using different DTDs.
> I don't follow that.

I simply mean that an XML document that contains data about people has a
different DTD to a document that has data about data. A server expecting
a 'person' document that meets the requirements of some DTD is not going
to accept a document that matches the DTD for 'global data interchange'.
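To make the point concrete, here is a small Python sketch (standard `xml.etree.ElementTree`) that encodes the same hypothetical person record two ways: in a 'person' vocabulary, and in a generic record/field vocabulary of the kind a 'global data interchange' DTD might define. The element names `record` and `field`, and the receiver check in `person_server_accepts` (a crude stand-in for DTD validation, which ElementTree does not perform), are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

person = {"name": "J. Smith", "born": "1960"}  # hypothetical sample record

def to_person_xml(p):
    """Domain-specific encoding: element names describe the person."""
    root = ET.Element("person")
    for key, value in p.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")

def to_generic_xml(p, kind):
    """Hypothetical 'global data interchange' encoding: element names describe data."""
    root = ET.Element("record", type=kind)
    for key, value in p.items():
        ET.SubElement(root, "field", name=key).text = value
    return ET.tostring(root, encoding="unicode")

def person_server_accepts(xml_text):
    """A receiver written against the 'person' vocabulary: it checks the root
    element and a required child (standing in for DTD validation)."""
    root = ET.fromstring(xml_text)
    return root.tag == "person" and root.find("name") is not None

print(person_server_accepts(to_person_xml(person)))             # True
print(person_server_accepts(to_generic_xml(person, "person")))  # False
```

Both documents carry the same facts, but only the first is something the 'person' server was written to accept.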

> > (And you can't say that your serialisation schema *will* allow this
> > interchange, because although your serialised data may be well-formed,
> > the underlying data it represents may not be, so you need the proper DTD
> > for the object.)
> Well-formedness has very little to do with DTDs so I don't follow this
> either.

I mean that you could have an XML document that fails against its DTD -
say, by using an attribute in the wrong place. Now, if you devise
something that serialises your data so that the output contains entries
describing your elements and attributes, that serialised data will
*pass* against *its* DTD - because its DTD is different (it's the one for
data serialisation). When you pass this serialised data to another
server, it ought to be checked against the original DTD, otherwise you
won't know that it's invalid; but with a 'universal serialiser' you
could actually import it into a new database, despite its failings.
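The scenario above can be sketched in Python as follows. The standard library's `xml.etree.ElementTree` does not validate against DTDs, so the hypothetical tables `PERSON_RULES` and `SERIALISATION_RULES` stand in for the attribute declarations of the two DTDs. `serialise` and `deserialise` are an assumed 'universal serialiser', not any particular product. A document that puts `id` on the wrong element fails the person rules, its serialised form passes the generic rules, and the fault only reappears once the data is rebuilt and checked against the original rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for DTD attribute declarations: element -> allowed attributes.
PERSON_RULES = {"person": {"id"}, "name": set()}
SERIALISATION_RULES = {"element": {"tag"}, "attribute": {"name", "value"}, "text": set()}

def valid(elem, rules):
    """True if every element (including elem itself) is declared and uses
    only its allowed attributes."""
    return all(e.tag in rules and set(e.attrib) <= rules[e.tag] for e in elem.iter())

def serialise(elem):
    """Hypothetical 'universal serialiser': describe any element in a fixed
    vocabulary. Tail text (mixed content) is ignored in this sketch."""
    out = ET.Element("element", tag=elem.tag)
    for k, v in elem.attrib.items():
        ET.SubElement(out, "attribute", name=k, value=v)
    if elem.text and elem.text.strip():
        ET.SubElement(out, "text").text = elem.text
    for child in elem:
        out.append(serialise(child))
    return out

def deserialise(gen):
    """Rebuild the original element tree from the generic form."""
    elem = ET.Element(gen.get("tag"))
    for part in gen:
        if part.tag == "attribute":
            elem.set(part.get("name"), part.get("value"))
        elif part.tag == "text":
            elem.text = part.text
        else:
            elem.append(deserialise(part))
    return elem

bad = ET.fromstring('<person><name id="p1">J. Smith</name></person>')
print(valid(bad, PERSON_RULES))                      # False: id is on <name>
print(valid(serialise(bad), SERIALISATION_RULES))    # True: the generic form passes
print(valid(deserialise(serialise(bad)), PERSON_RULES))  # False again once rebuilt
```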

> > All I am saying is that the document *itself* could be the abstraction
> > of the data.
> This is something else I don't follow. XML documents are always encodings
> of abstractions. They are concrete, tangible, interchangeable, printable
> and can be given global names. Concrete, not abstract.

I suppose all I'm getting at is that XML is already data' (a
representation of the data), so why do we need to go to data'' (a
representation of that representation) in order to serialise? Why not
just serialise from the database to XML? This is not the same problem as
transferring schemas around.
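Serialising straight from the database to a domain-specific vocabulary can be as simple as the following sketch, which uses Python's built-in `sqlite3` and `xml.etree.ElementTree`. The `people` table, the `rows_to_xml` helper and its one-element-per-column mapping are illustrative assumptions; the sketch also assumes table and column names are trusted and are valid XML names.

```python
import sqlite3
import xml.etree.ElementTree as ET

def rows_to_xml(conn, table, row_tag):
    """Serialise every row of `table` straight to domain-specific XML:
    one <row_tag> element per row, one child element per non-NULL column."""
    cur = conn.execute(f"SELECT * FROM {table}")  # table name assumed trusted
    columns = [d[0] for d in cur.description]     # column names become element names
    root = ET.Element(table)
    for row in cur:
        item = ET.SubElement(root, row_tag)
        for col, value in zip(columns, row):
            if value is not None:
                ET.SubElement(item, col).text = str(value)
    return ET.tostring(root, encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, born INTEGER)")
conn.execute("INSERT INTO people VALUES ('J. Smith', 1960)")
print(rows_to_xml(conn, "people", "person"))
# prints <people><person><name>J. Smith</name><born>1960</born></person></people>
```

The element names come from the data's own vocabulary, so a receiver expecting a 'person' document can consume the output directly, with no generic layer in between.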

> The objects they represent are logical, usually inaccessible outside of an
> "address space" (i.e. your brain, your relational database) and are thus
> termed abstract. The reason we need XSL is because the abstractions cannot
> "stand alone". I can't transmit a book from my head to your head. I need
> to serialize it on paper or online. I also can't transmit a "book object"
> without serializing it somehow (i.e. XML). Before serialization it is an
> abstraction.

Perhaps we are using the terms differently. If I have a picture of a
house and I show it to you and say "point to the window", you would do
so. But what you are pointing to is an 'abstraction' of the house - a
picture - and there is no window there! ("ceci n'est pas une pipe", and
all that.) Now, my point is that transmitting an XML mapping of some
database entries is like actually transmitting that picture itself - of
course it is not the house, but it *is* a representation of it. But it
seems to me that to serialise everything to a universal form, always
using the same DTD, is to end up transmitting a representation of the
*picture*, not the house. And then you have lost a lot of information.
And worse, you can now only send your data to systems that process
abstractions of pictures, not ones that process abstractions of houses.

Mark Birbeck
Managing Director
Intra Extra Digital Ltd.
39 Whitfield Street
t: 0171 681 4135
e: Mark.Birbeck at

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at
Archived as: and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at
