Serializations and data structures (was Re: Topic Maps on SQL)

Paul Prescod paul at prescod.net
Wed Nov 25 21:49:50 GMT 1998


Lars Marius Garshol wrote:
> 
> 
> Well, you can't very well transmit the data model itself between computers
> or down through time. In other words: you must have the serialization
> syntax. 

You must have a serialization syntax, but the candidate syntaxes are
technically interchangeable: SGML, XML, S-Expressions, etc. Socially, they are not
interchangeable, of course. If Tim's point was that socially, standardized
syntaxes are very, very important, I would agree with him. I just disagree
when he goes further and says that they are primary.
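
As a minimal sketch of "technically interchangeable": the same small
structure written once as XML and once as an S-expression, with each
reader rebuilding an identical data structure. The (name, children) tuple
model and the helper names from_xml and from_sexpr are assumptions made up
for this illustration, not anything defined in the thread, and the
S-expression reader only handles this toy subset.

```python
import re
import xml.etree.ElementTree as ET

def from_xml(text):
    """Rebuild (name, children) tuples from an XML serialization."""
    def build(elem):
        children = [elem.text] if elem.text else []
        for child in elem:
            children.append(build(child))
            if child.tail:
                children.append(child.tail)
        return (elem.tag, children)
    return build(ET.fromstring(text))

def from_sexpr(text):
    """Rebuild the same tuples from a toy S-expression serialization."""
    tokens = re.findall(r'\(|\)|"[^"]*"|[^\s()"]+', text)
    pos = 0
    def build():
        nonlocal pos
        pos += 1                      # skip the opening "("
        name = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(build())
            else:
                children.append(tokens[pos].strip('"'))
                pos += 1
        pos += 1                      # skip the closing ")"
        return (name, children)
    return build()

xml_form = "<doc><p>Hello</p><p>World</p></doc>"
sexpr_form = '(doc (p "Hello") (p "World"))'

model_from_xml = from_xml(xml_form)
model_from_sexpr = from_sexpr(sexpr_form)
```

Both readers arrive at the same structure, so code written against that
structure does not need to know which syntax was on the wire.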

> As the argument stands it looks like checkmate to me. 

If we've checkmated him, why bother continuing? :)

> Why not turn the
> issue around slightly: what would be difficult to do without the data
> model? 

Luckily I've already got an answer to this question:

---
Understanding an SGML document's structure is easy for simple things, but
there are many issues that are quite complex. For instance, it is not
clear whether comments should be available for a DSSSL spec. to work on,
or whether they should be addressable by hyperlinks. It isn't clear
whether it should be possible to address every character, or only
non-contiguous spans of characters. Should it be possible to address and
process tokens in an attribute value or only character spans? Should it be
possible to address markup declarations? XLink and XSL must solve all of
the same issues.

The problem is that XML is defined in terms of its syntax, just as SGML
was at first. Linking and processing are done in terms of some data model,
not in terms of syntax. When you make a link between two elements, you are
not linking in terms of the character positions of the start- and end-tags
in an SGML or XML entity. You are linking in terms of abstract notions
such as "element", "attributes" and "parse tree". The role of an XML
parser is to throw away the syntax and rebuild the logical ("abstract")
view. The role of a linking engine (such as a web browser) is to make
links in terms of that logical view. The role of a stylesheet engine is to
apply formatting in terms of that logical view.
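
To make the "throw away the syntax" point concrete, here is a small
sketch using Python's standard xml.etree.ElementTree module. The two
documents and the logical_view helper are invented for illustration, and
the view is deliberately simplified (it ignores text that follows a child
element). The two serializations differ in quoting, whitespace inside tags
and the empty-element form, yet a link expressed as an element path
resolves to the same element in both.

```python
import xml.etree.ElementTree as ET

# Two serializations that differ only in syntax: quote style, whitespace
# inside tags, and the empty-element form.
doc_a = '<doc><sec id="intro"><p>Hello</p><br/></sec></doc>'
doc_b = "<doc ><sec   id='intro' ><p>Hello</p><br></br></sec></doc>"

def logical_view(elem):
    """Reduce an element to (name, attributes, text, children)."""
    return (
        elem.tag,
        sorted(elem.attrib.items()),
        elem.text or "",
        [logical_view(child) for child in elem],
    )

tree_a = ET.fromstring(doc_a)
tree_b = ET.fromstring(doc_b)
same_view = logical_view(tree_a) == logical_view(tree_b)

# A "link" addressed in terms of the logical view, not character offsets.
path = "./sec[@id='intro']/p"
target_a = tree_a.find(path)
target_b = tree_b.find(path)

# The character position of the same start-tag differs between the two.
offset_a = doc_a.index("<p>")
offset_b = doc_b.index("<p>")
```

The path keeps working when the syntax changes; a link stored as a
character offset would not.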

Unless stylesheet languages, text databases, formatting engines and
editors share a view, processing will be unreliable and complicated. It is
not very common for XML and SGML applications and toolkits to provide all
of the information necessary for building many classes of sophisticated
applications, such as editors. There is not even a standardized way for a
toolkit to express what information from the SGML/XML document it will
preserve. Even if two toolkits preserve exactly the same information, it
is quite possible that they use different structures to organize the
information.
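
A small illustration of toolkits disagreeing, using two parsers from
Python's standard library (chosen here only as a convenient example, not
because the text above refers to them). With default settings,
xml.etree.ElementTree drops comments while xml.dom.minidom keeps them as
nodes, so an application that needs comments, such as an editor, gets a
different view depending on the toolkit.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

source = "<doc><!-- editorial note --><p>Hello</p></doc>"

# ElementTree's default parser discards comments while building its tree.
et_root = ET.fromstring(source)
et_children = [child.tag for child in et_root]

# minidom keeps the comment as a node alongside the element.
dom_root = minidom.parseString(source).documentElement
dom_children = [node.nodeName for node in dom_root.childNodes]
```

Here et_children holds only the p element, while dom_children also lists
the comment node. Neither toolkit offers a standard way to declare that
choice up front.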

A related problem is how to address components of data types other than
SGML and XML. For instance, how do you make a hyperlink to a particular
frame in an MPEG movie, or a particular note in a MIDI sequence? How would
you extract that information in a stylesheet (for instance, for sequencing
a multimedia hyperdocument)? It makes no sense to address in terms of
bytes, because often a single logical entity, like a frame, is actually
spread across several bytes and they may not be contiguous. Addressing in
terms of characters would make even less sense because MPEG movies and
MIDI sequences are not character-based. The web solves this problem by
inventing a new "query language" (in the form of extensions to URLs) for
each data type. This more or less works, but it leads to a proliferation
of similar, but incompatible query languages doing the same basic thing,
but with different underlying models.
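
A toy sketch of why byte addressing fails for such data. The stream
layout, the Chunk record and the frame_bytes helper are all hypothetical
and are NOT how MPEG is actually laid out; they only model "one logical
frame, several non-contiguous byte ranges".

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    frame: int    # logical frame this chunk belongs to
    offset: int   # byte offset of the chunk in the stream
    length: int   # number of bytes in the chunk

# Hypothetical interleaved stream: frame 0 is split around frame 1.
stream = b"AAAA" + b"BBB" + b"AA" + b"CCCCC"
chunks = [
    Chunk(frame=0, offset=0, length=4),
    Chunk(frame=1, offset=4, length=3),
    Chunk(frame=0, offset=7, length=2),
    Chunk(frame=2, offset=9, length=5),
]

def frame_bytes(stream, chunks, frame):
    """Gather every chunk of one logical frame, in stream order."""
    return b"".join(
        stream[c.offset:c.offset + c.length]
        for c in chunks
        if c.frame == frame
    )

frame0 = frame_bytes(stream, chunks, 0)
frame1 = frame_bytes(stream, chunks, 1)
```

No single byte range covers frame 0, so an address has to name the frame
itself and leave the byte layout to whatever builds the logical view.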
---

http://www.prescod.net/groves/shorttut/


 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself.
 http://itrc.uwaterloo.ca/~papresco
Christmas shopping in a T-Shirt? Toto, I have a feeling we 
aren't in Canada anymore.




