Notations

Rick Jelliffe ricko at allette.com.au
Wed Dec 9 01:35:15 GMT 1998


 From: Joel Bender <joel at spooky.emcs.cornell.edu>

>There are really two problems: (1) what pattern of character strings are
>acceptable for content (attribute value or element content) and (2) how
>should those patterns be mapped to atomic types?
>...Both the 'type' of value and the 'encoding' of the value must be
>specified.

Which assumes that there are "atomic types" independent of domains.
When we look at databases or at things like XIF, we can see that the
types there are, in fact, storage types (how many bits the binary
representation fits into) rather than data types.

So we have:
* semantic 'type' (e.g. an RDF assertion connecting an element to
* element 'type' name (or attribute name) (e.g. <cows>);
* architectural element 'type' name (or architectural attribute)
    (e.g. to point out a correspondence to an element type in some
    better-known DTD, as in <cows html="p">);
* generic data 'type' (e.g. a number);
* specific data 'type' (e.g. a cardinal);
* storage 'type' (i.e. a storage constraint, e.g. fit into 32 bits);
* lexical 'type' (e.g. "-", [0-9]+, ( ".", [0-9]+)? ) or
* entity 'type' (e.g. a GIF file)
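To make the separation between some of these layers concrete, here is a
minimal Python sketch (an illustration, not anything proposed in the
original post) that checks one string against a lexical 'type', a
generic and a specific data 'type', and a storage 'type' as independent
steps. It assumes the leading "-" in the lexical pattern above is
optional, and the function name `classify` and the 32-bit limit are
hypothetical choices.

```python
import re
from decimal import Decimal

# Lexical 'type': the pattern  "-", [0-9]+, ( ".", [0-9]+)?
# Assumption: the leading "-" is treated as optional here.
LEXICAL_NUMBER = re.compile(r"-?[0-9]+(\.[0-9]+)?")


def classify(text: str) -> dict:
    """Check one string against several independent kinds of 'type'.

    The function name and the unsigned 32-bit limit are illustrative
    assumptions, not part of any standard.
    """
    result = {"lexical": False, "cardinal": False, "fits_u32": False}

    # Lexical 'type': does the character string have an acceptable shape?
    if LEXICAL_NUMBER.fullmatch(text) is None:
        return result
    result["lexical"] = True

    # Generic data 'type': interpret the lexical form as a number.
    # Decimal keeps arbitrary precision, so no storage limit applies yet.
    value = Decimal(text)

    # Specific data 'type': a cardinal is a whole number >= 0.
    result["cardinal"] = value >= 0 and value == value.to_integral_value()

    # Storage 'type': a storage constraint, e.g. fit into 32 unsigned bits.
    result["fits_u32"] = result["cardinal"] and value < 2**32
    return result


for sample in ["42", "4294967296", "-7", "3.5", "4x2"]:
    print(sample, classify(sample))
```

Here "4294967296" is a perfectly good cardinal that fails only the
storage check: the storage constraint is a separate decision from the
data type, which is exactly why the layers are worth keeping apart.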

All these things are what people expect from typing, depending on their
domain. It would be nice if each were solved; I cannot see why we should
expect any single "schema" language to solve all or most of them
simultaneously, however. The fact that there is so much blithe talk about
schemas and so little apparent awareness that a multitude of different
things are in fact being corralled together does not fill me with
confidence.

There have been some attempts to disconnect atomic data typing from
storage typing:
* LISP bignums (a number can be any length of storage bits; the system
    allocates as much as you need);
* SGML (you mark up what is most convenient and relevant for the
    documents, and then make the system designer figure out how to
    represent your types on the system they have);
* perhaps languages like early BASIC and Prolog too.

XML has been explicitly designed with a different process model than
SGML's: for "resolved" document delivery over the WWW. So the next
question should be, for atomic types: is there anything in the nature of
the WWW that promotes particular types? I would say that we get
    * URLs;
    * the types used in MIME headers (dates, languages, MIME types, etc.);
    * Base64 (and perhaps uuencode);
    * text.
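As a rough sketch of how little machinery these web-native types need,
the following Python checks most of them with only the standard library.
The MIME-type and language-tag patterns are simplified assumptions, not
the full grammars from the MIME and HTTP specifications, the URL check is
far looser than a real URL grammar, and the function names are
hypothetical.

```python
import base64
import binascii
import re
from email.utils import parsedate_to_datetime
from urllib.parse import urlsplit

# Simplified patterns (assumptions): a MIME type is "type/subtype" with no
# parameters; a language tag looks like "en" or "en-AU".
TOKEN = r"[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*"
MIME_TYPE = re.compile(TOKEN + "/" + TOKEN)
LANGUAGE = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def is_url(text: str) -> bool:
    """Loose check: a scheme plus something after it."""
    parts = urlsplit(text)
    return bool(parts.scheme) and bool(parts.netloc or parts.path)


def is_mime_date(text: str) -> bool:
    """Header-style dates, e.g. 'Wed, 09 Dec 1998 01:35:15 GMT'."""
    try:
        parsedate_to_datetime(text)
    except (TypeError, ValueError):
        # Older Pythons raise TypeError, newer ones ValueError, on bad input.
        return False
    return True


def is_language(text: str) -> bool:
    return LANGUAGE.fullmatch(text) is not None


def is_mime_type(text: str) -> bool:
    return MIME_TYPE.fullmatch(text) is not None


def is_base64(text: str) -> bool:
    """Strict Base64: only the Base64 alphabet, with correct padding."""
    try:
        base64.b64decode(text, validate=True)
    except binascii.Error:
        return False
    return True
```

Plain text is left out because it needs no check beyond being legal XML
character data, which the XML parser already enforces.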

So as a first step in defining types, I suggest that these are the only
'atomic' types which are universal to XML documents. Everything else
(including numbers, let alone shorts!) depends on the domain of the
document type.

I hope the W3C schema effort splits Tim Bray's old suggestion for data
types off into a working draft that can get some action fast. You can see
the same process at work as happened in SGML: the fact that everyone's
data-typing and storage-typing needs were (to everyone's surprise) so
different meant that no data types ever needed to be standardized.

Rick Jelliffe

P.S. My book "The XML & SGML Cookbook" has several chapters on
notation-related issues:
    for entities: Chapter 14, Data Content Notations
    for storage: Chapter 15, Formal System Identifiers
    for elements: Chapter 16, Embedded Notations
ISBN 0-13-614223-0


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)
