Storing Lots of Fiddly Bits (was Re: What is XML for?)

Mon Feb 1 16:57:46 GMT 1999

At 31 January 1999 20:33, W. Eliot Kimber wrote:
> [In response to Mark Birbeck]
> But you've not solved my problem, because the in-memory abstraction of
> the *document* is still:
> 
> (xml-document
>   (data-instances
>     (element
>       (gi "person")
>       (content
>         (element
>           (gi "name")
>           (content
>             (literal "Eliot")))))))
> 
> So, while the result is closer to the abstraction of the data, it is
> still not the original abstraction.

True. But I could have said:

<person oid="1" name="Eliot" sex="male">
  <employer oid-ref="2" />
</person>

<enterprise oid="2" name="ISOGEN International Corp" address="Dallas, TX">
  <derived obj="employs" oid-ref="1" />
</enterprise>

if that was a better model of the internal data. Maybe I've missed the
subtlety of what you are saying the problem is, but in our system the
attributes of an object are exported as described above, and the
children of an object are exported as elements within other elements.
Seems to me to mirror exactly our object structure - and so far we have
been able to re-interpret DTDs back as data definitions. In other words,
we *can* generalise the solution.
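
The export rule described above can be sketched roughly as follows. This is
only an illustration, not our actual system: it assumes Python's standard
xml.etree.ElementTree module, and the helper names export_object and
resolve are hypothetical. Scalar properties become XML attributes, related
objects become nested elements, and an oid-ref is resolved through a
lookup table keyed on oid.

```python
import xml.etree.ElementTree as ET


def export_object(tag, oid, attrs, children=()):
    """Hypothetical helper: properties become XML attributes,
    related objects become nested child elements."""
    elem = ET.Element(tag, {"oid": str(oid), **attrs})
    for child_tag, child_attrs in children:
        ET.SubElement(elem, child_tag, child_attrs)
    return elem


def resolve(ref_elem, index):
    """Hypothetical helper: follow an oid-ref to the element it names."""
    return index[ref_elem.get("oid-ref")]


# The two objects from the example above.
person = export_object(
    "person", 1, {"name": "Eliot", "sex": "male"},
    children=[("employer", {"oid-ref": "2"})],
)
enterprise = export_object(
    "enterprise", 2,
    {"name": "ISOGEN International Corp", "address": "Dallas, TX"},
    children=[("derived", {"obj": "employs", "oid-ref": "1"})],
)

# Index every exported object by its oid so references can be followed.
index = {e.get("oid"): e for e in (person, enterprise)}
employer = resolve(person.find("employer"), index)

print(ET.tostring(person, encoding="unicode"))
print(employer.get("name"))
```

Reading the XML back in is the same walk in reverse: attributes turn back
into properties, and each oid-ref is looked up in the index.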

> And note that even for an early-bound form, there are still infinitely
> many ways to construct it

I still don't follow your logic: the fact that there are many ways to
construct it doesn't mean you can't construct it.

> So no matter how you slice it, there will always be a disjoint between
> the abstraction of the serialization form and the abstraction of the
> data objects being serialized, which means that a query onto the
> abstraction of the serialization will not be the same as a query onto
> the abstraction of the data that has been serialized. The gap might be
> bigger or smaller, but there will always be a gap.

Sure. But I still have two issues. First, why would you query the
serialisation anyway? Wouldn't you want to query your original database
and generate XML pages that reflect the results? Even if you have
serialised the data to XML files to speed up the movement of data, you
would still want to do searches against the original data. (The nice
thing about that - as a little aside - is that you create XML pages that
are 'results' pages, ready for the user to drill down through, using
whatever super-duper, 3D-helmet, speech-activated interface they have
access to.)
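
As a rough sketch of that idea, assuming Python's standard sqlite3 and
xml.etree.ElementTree modules: the query runs against the original store,
and only the matching rows are turned into an XML 'results' page. The
table layout, the results_page function and the second sample row are all
hypothetical, invented purely for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# A stand-in for the original database (hypothetical schema and rows).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE person (oid INTEGER PRIMARY KEY, name TEXT, sex TEXT)")
db.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "Eliot", "male"), (3, "Example Person", "female")],
)


def results_page(db, name):
    """Query the original data, then generate XML for the matches only."""
    rows = db.execute(
        "SELECT oid, name, sex FROM person WHERE name = ? ORDER BY oid",
        (name,),
    )
    page = ET.Element("results", {"query": name})
    for oid, pname, sex in rows:
        ET.SubElement(page, "person", {"oid": str(oid), "name": pname, "sex": sex})
    return page


page = results_page(db, "Eliot")
print(ET.tostring(page, encoding="unicode"))
```

The search itself never touches the serialised form; the XML exists only
as the result handed to whatever interface the user has.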

But second, and I think the main point, I don't understand why you are
distinguishing between the XML representation of an object and its
serialised form in the way you do. Why not just serialise and
de-serialise between XML and the database? I know you ARE doing that,
but the XML you are creating is some sort of 'normalised' representation
of the original data. You keep talking of the 'abstract' representation
of your data, but actually you are *losing* the abstraction, moving
from:

	a person who has the name Eliot

to

	an object which contains another object which has two
	properties, one set to name and the other set to Eliot

Of course both are abstractions, but they model completely different
things (data and people). And modelling the data rather than the person
means you can no longer interchange your XML with other systems because
you have two completely different sets of data, using different DTDs.
(And you can't say that your serialisation schema *will* allow this
interchange, because although your serialised data may be well-formed,
the underlying data it represents may not be, so you need the proper DTD
for the object.)
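
To make the difference concrete, here is a small sketch, again assuming
Python's xml.etree.ElementTree. The element/gi/content/literal markup is a
hypothetical XML rendering of the 'normalised' structure quoted earlier,
not anything taken from Eliot's system. The same question, "what is this
person's name?", needs a different query against each form.

```python
import xml.etree.ElementTree as ET

# Form 1: the document models the person directly.
domain_xml = '<person name="Eliot"/>'

# Form 2: the document models the data structure (hypothetical markup).
generic_xml = """
<element gi="person">
  <content>
    <element gi="name">
      <content><literal>Eliot</literal></content>
    </element>
  </content>
</element>
""".strip()

domain = ET.fromstring(domain_xml)
generic = ET.fromstring(generic_xml)

# Query on the domain form: ask the person for its name.
name_from_domain = domain.get("name")

# Query on the generic form: walk objects, contents and literals.
name_from_generic = generic.find(
    "content/element[@gi='name']/content/literal"
).text

print(name_from_domain, name_from_generic)
```

Both queries return the same value, but only the first is written in
terms of people, and a system expecting one form cannot read the other
without knowing the mapping between them.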

> Which begs the question: if the abstraction of the document is not the
> abstraction of the data, why bother to create and store the abstraction
> of the document when you can just as easily create and store the
> abstraction of the data?

All I am saying is that the document *itself* could be the abstraction
of the data.

Anyway ... if I've missed the plot then I look forward to your
clarification, since we are dealing with similar issues here.

Regards,

Mark Birbeck
Managing Director
Intra Extra Digital Ltd.
39 Whitfield Street
London
W1P 5RE
w: http://www.iedigital.net/
t: 0171 681 4135
e: Mark.Birbeck at iedigital.net
