Serializations and data structures (was Re: Topic Maps on SQL)

Paul Prescod paul at
Wed Nov 25 15:11:22 GMT 1998

Lars Marius Garshol wrote:
> Ideas are definitely more fundamental than sentences (seen as waves of
> sound or ink embedded in paper) could ever be. However, the sentence stored
> as a sequence of characters, ideas that are so simple that they remain
> constant for centuries.

So your fundamental argument seems to be that character data is simple and
thus more reliable and long-lived. I agree. That's why I use XML.

> When you read the sentence 'All is suffering', would you claim that your
> head then contains the exact same idea that Siddharta Gautama (Indian
> prince who lived in the 6th century b.c., also known as Buddha) had? 

No, I was careful to qualify that *in the domain of XML* we can retrieve
ideas losslessly, because the idea is very simple, and can be defined in
terms of mathematical formalisms.

> This is all based on a hidden assumption: that all tools interpret and
> implement the grove model in exactly the same way. 

Interpret? Yes. Implement? No. As long as the API represents the model
faithfully, the underlying implementation can be whatever it wants
(or maybe you mean that the implementation of the API must be the same;
that's true). Anyhow, the model is a mathematical formalism designed
specifically to disallow alternate interpretations.
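
To make the API-versus-implementation point concrete, here is a minimal
sketch in Python. It is not the grove model: the Node interface, the
MemoryNode and SqlNode classes, and the node table layout are all
hypothetical names invented for illustration. It only shows that client
code written against the API cannot tell an in-memory tree from one
stored in SQL.

```python
import sqlite3
from abc import ABC, abstractmethod


class Node(ABC):
    """Hypothetical minimal node API: a name plus ordered children."""

    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def children(self) -> list: ...


class MemoryNode(Node):
    """Implementation backed by ordinary Python objects."""

    def __init__(self, name, children=()):
        self._name = name
        self._children = list(children)

    def name(self):
        return self._name

    def children(self):
        return self._children


class SqlNode(Node):
    """Implementation backed by rows in an (assumed) 'node' table."""

    def __init__(self, db, node_id):
        self._db = db
        self._id = node_id

    def name(self):
        row = self._db.execute(
            "SELECT name FROM node WHERE id = ?", (self._id,)
        ).fetchone()
        return row[0]

    def children(self):
        rows = self._db.execute(
            "SELECT id FROM node WHERE parent = ? ORDER BY pos", (self._id,)
        ).fetchall()
        return [SqlNode(self._db, r[0]) for r in rows]


def outline(node):
    """Client code that relies only on the Node API."""
    return (node.name(), [outline(child) for child in node.children()])


# Build the same three-node tree twice: once in memory, once in SQLite.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, "
    "pos INTEGER, name TEXT)"
)
db.executemany(
    "INSERT INTO node VALUES (?, ?, ?, ?)",
    [(1, None, 0, "topicmap"), (2, 1, 0, "topic"), (3, 1, 1, "assoc")],
)

in_memory = MemoryNode("topicmap", [MemoryNode("topic"), MemoryNode("assoc")])
in_sql = SqlNode(db, 1)

# Both implementations present the same logical structure to the client.
same_view = outline(in_memory) == outline(in_sql)
```

The storage differs completely, but outline() sees one model. That only
holds as long as both classes agree on what "name" and "children" mean,
which is exactly what a formal model is supposed to pin down.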

> Can you really guarantee
> that for something as complex as groves for decades? There'll be no
> disagreement on the actual sequence of bytes in the files, but their
> interpretation in terms of the abstract grove is another thing entirely.

Disagreement on the actual sequence of bytes in the files is irrelevant.
If I can't get the moral equivalent of the same grove, ESIS or
SAX events out of it that the creators intended, then I am working with
different data (at the logical level) than they are. The fact that I
have the same bytes is not very comforting if the software that
processes them fundamentally misunderstands them. ("I think that <HEAD>
is a synonym for <BOLD> and <!-- --> means emphasis.")
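
A small illustration of bytes versus logical data, using Python's
standard xml.sax module. The two sample documents and the EventRecorder
and logical_events names are made up for this sketch. The byte sequences
differ (quote style, attribute order, whitespace inside tags, a character
reference, the empty-element form), yet a conforming parser reports the
same events for both.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EventRecorder(ContentHandler):
    """Records the logical events a SAX parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Attribute order is not significant, so sort before recording.
        self.events.append(("start", name, sorted(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Parsers may deliver text in several chunks; merge adjacent ones.
        if self.events and self.events[-1][0] == "text":
            self.events[-1] = ("text", self.events[-1][1] + content)
        else:
            self.events.append(("text", content))


def logical_events(data: bytes):
    """Parse a byte string and return the recorded event list."""
    recorder = EventRecorder()
    xml.sax.parseString(data, recorder)
    return recorder.events


# Two different serializations of what is meant to be the same document.
doc_a = b'<topic id="t1" kind="x"><name>Buddha</name><empty/></topic>'
doc_b = (
    b"<topic kind='x'  id='t1' >"
    b"<name>Bud&#100;ha</name><empty></empty></topic>"
)

bytes_differ = doc_a != doc_b
events_match = logical_events(doc_a) == logical_events(doc_b)
```

Comparing doc_a and doc_b byte for byte says they are different files;
comparing the event lists says they are the same data. The second
comparison is the one that matters to software, and it is only
meaningful if everyone agrees on what the events are.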

XML software can't work with serializations. It must work with the
data model! If that doesn't survive, all is lost. If the data model
is wishy-washy and implicit, like that in the SGML and XML
specs, then the data is in danger of mild logical corruption as people
come to understand it differently. "What did Tim mean by this? And
this?" If it is a well-defined formalism then that danger is much
smaller (eliminated?).

