Serializations and data structures (was Re: Topic Maps on SQL)

Lars Marius Garshol larsga at ifi.uio.no
Wed Nov 25 12:25:03 GMT 1998


* Paul Prescod
|
| On the issue of what works and is interoperable in the real world, you are
| quite likely right, but on this "chicken and egg" issue of serialization
| vs. data model you are not. The serialization only exists to provide for
| the longevity of the data. Thus the data model is fundamental and the
| serialization ephemeral.
|
| Your argument is that a sentence is more fundamental than an idea, because
| the sentence is easier to transmit, record, replay and otherwise
| manipulate. But *by definition* ideas are more fundamental than sentences,
| because there can only be a sentence after there is an idea,

Ideas are definitely more fundamental than sentences (seen as waves of
sound or ink embedded in paper) could ever be. However, a sentence stored
as a sequence of characters can remain constant for centuries, while only
ideas that are very simple can do the same.

When you read the sentence 'All is suffering', would you claim that your
head then contains the exact same idea that Siddhartha Gautama (Indian
prince who lived in the 6th century BC, also known as Buddha) had? In
fact, Zen Buddhism, a religion that exists specifically to transmit the
experience that this realization led to, looks on words with the utmost
distrust and does not rely on them to transmit the idea.

(The koans that the Zen-Buddhists use in their teaching may seem to
contradict this, but in fact a large part of their function is to break
down the student's reliance on, and faith in, words.)

| Document A must be published three times. It is encoded now in SGML with
| full minimizations. It is sent to the publisher and prints beautifully.
| Years pass. RCS SGML is superseded in the organization by XML. The 
| document's syntax is changed radically by running it through "sx." But 
| the person in charge of the conversion is careful to make sure that the 
| grove does not change. They run the print job again: it will print 
| beautifully -- and identically -- as long as the formatter is grove 
| driven. 10 more years pass. XML fades into oblivion and is replaced by 
| the more compact Lisp S-Expression notation (yes, Lisp has finally caught 
| on). But the S-Expression notation is designed to be losslessly
| compatible with the XML grove, so the software runs off of the grove
| instead of the serialization syntax. The document will print identically
| *again*.

This is all based on a hidden assumption: that all tools interpret and
implement the grove model in exactly the same way. Can you really guarantee
that for something as complex as groves for decades? There'll be no
disagreement on the actual sequence of bytes in the files, but their
interpretation in terms of the abstract grove is another thing entirely.
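
To make the point concrete, here is a rough sketch in Python. It does not
use groves at all: the two standard-library XML APIs (xml.dom.minidom and
xml.etree.ElementTree) merely stand in for "two tools", and the sample
document and function names are my own assumptions for illustration. Both
tools read the very same bytes, yet they disagree on how many children
the root element has, because one models whitespace as child nodes and
the other does not.

```python
import hashlib
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Hypothetical sample document; the bytes themselves are not in dispute.
DATA = b"<doc>\n  <p>a</p>\n  <p>b</p>\n</doc>"


def byte_fingerprint(data: bytes) -> str:
    """Hash of the serialization: identical for anyone holding the file."""
    return hashlib.sha256(data).hexdigest()


def children_via_dom(data: bytes) -> int:
    """Tool 1: DOM keeps whitespace-only text as child nodes of <doc>."""
    dom = xml.dom.minidom.parseString(data)
    return len(dom.documentElement.childNodes)


def children_via_etree(data: bytes) -> int:
    """Tool 2: ElementTree counts only child elements; the whitespace
    lives in .text/.tail attributes instead of being separate nodes."""
    root = ET.fromstring(data)
    return len(root)


if __name__ == "__main__":
    print("bytes:", byte_fingerprint(DATA))
    print("children seen by minidom:", children_via_dom(DATA))
    print("children seen by ElementTree:", children_via_etree(DATA))
```

Neither answer is wrong; each is a different interpretation of the same
serialization, which is exactly the kind of divergence that could creep in
between grove implementations over time.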

--Lars M.


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



