Integrity in the Hands of the Client

Rick Jelliffe ricko at allette.com.au
Sat Nov 22 06:32:46 GMT 1997


 
> From: Joe Lapp <jlapp at acm.org>

> This seems to work.  It stores information about books and authors,
> and it is not possible to add a book without associating it with
> the description of some author.  But we can see that it breaks as
> soon as we add any other kind of element that has an ID.  We know
> that every book will eventually have an ID, because we'll soon want
> to have an element whose content elements reference the New York
> Times Bestsellers.  Once we do that, nothing prevents an administrator
> (or the client program he or she is using) from indicating that the
> author of a book is another book.  This DTD will not suffice.

The SGML standard explicitly says that the SGML markup declarations form
only part of the definition of a document type.  So you are being no
more bold than the SGML standard.  (The abbreviation DTD actually stands
for "Document Type Definition", not "Document Type Declarations",
by the way, which is further evidence of this distinction.)

People expect XML/SGML to provide a way to do everything, then are
surprised that it doesn't.  It does not intend to.  It is not a format
for modeling data; it is a language for marking up data with enough
information that your clever programs can make use of it.  XML/SGML's
validation extends only to very simple content models and to making
sure that IDs are unique, just for this purpose.

The problem you describe above is very simply dealt with.  Make it an
"application requirement" that all IDs for books start with one prefix,
and that all IDs for authors start with another.  This is very common
practice in the industry.  You can write simple external validating code
to enforce it, and it only requires a single line of plain English to
document it.
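
As a rough illustration of such external validating code, here is a
minimal Python sketch.  The element names (book, author), the attribute
names (id, author) and the prefixes (bk-, au-) are all assumptions made
up for the example; they are not taken from the DTD under discussion.

```python
import xml.etree.ElementTree as ET

# Hypothetical convention: element type name -> required ID prefix.
ID_PREFIX = {"book": "bk-", "author": "au-"}


def check_id_prefixes(xml_text):
    """Return a list of human-readable violations of the prefix rule."""
    root = ET.fromstring(xml_text)
    errors = []
    for elem in root.iter():
        prefix = ID_PREFIX.get(elem.tag)
        ident = elem.get("id")
        # Every ID on a known element type must carry that type's prefix.
        if prefix and ident is not None and not ident.startswith(prefix):
            errors.append(f"<{elem.tag}> id '{ident}' must start with '{prefix}'")
        # A book's author reference (assumed attribute name) must look
        # like an author ID, so it cannot point at another book.
        if elem.tag == "book":
            ref = elem.get("author")
            if ref is not None and not ref.startswith(ID_PREFIX["author"]):
                errors.append(f"<book> author '{ref}' must start with 'au-'")
    return errors
```

An ordinary validating parser still checks that IDs are unique and that
IDREFs resolve; a check like this one only adds the application
requirement on top.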

It is almost universal practice among experienced DTD writers to specify
unique prefixes for IDs of different types.  I recommend it to anyone
writing XML systems.  The simplest way is to just use a contracted form
of the element type name (of the current element or of its distinguishing
container) as the prefix.

There is an ISO standard way (part of the SGML Extended Facilities of
HyTime '97, which is on the WWW) to mark this up.  The Lexical Definition
annex lets you give (in one fixed attribute) a POSIX regular expression
to constrain the format of another attribute.  So you can specify that
IDs and IDREFs have a common prefix, for particular element types.  (Of
course, your software then needs to implement this standard to be able to
use the information, but that is no different from any other markup.)
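
To give a feel for what software implementing such a constraint would
do, here is a small Python sketch.  It does not reproduce the notation
of the Lexical Definition annex, and it uses Python's re syntax rather
than POSIX; the table of patterns and the element and attribute names
are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical lexical constraints: (element type, attribute) -> pattern.
LEXICAL_MODEL = {
    ("author", "id"): re.compile(r"au-[0-9]+"),
    ("book", "id"): re.compile(r"bk-[0-9]+"),
    ("book", "author"): re.compile(r"au-[0-9]+"),
}


def lexical_errors(xml_text):
    """Return (element, attribute, value) triples that break a pattern."""
    root = ET.fromstring(xml_text)
    errors = []
    for elem in root.iter():
        for name, value in elem.attrib.items():
            pattern = LEXICAL_MODEL.get((elem.tag, name))
            # The whole attribute value must match, not just a prefix.
            if pattern is not None and pattern.fullmatch(value) is None:
                errors.append((elem.tag, name, value))
    return errors
```

Here the ID and the IDREF for authors share one pattern, so a book whose
author attribute holds a book ID is reported.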

It is just false that SGML (the family of technologies: ISO 8879, ISO 10744,
ISO 9070, etc.) does not provide a way to use regular expressions (or any
other syntax you choose) to provide models for data.  The lexical typing
facilities have been on the books for 5(?) years now, and have just been
overhauled in the HyTime '97 standard.  However, because SGML systems do
not have to provide them to be conforming, few have done so as part of
their standard configuration so far.  XML has taken exactly the same
road as SGML and left more useful data validation to the application to
take care of.


Rick Jelliffe
