Recipes for Information

Mark L. Fussell fussellm at
Sat Nov 22 09:06:36 GMT 1997

This is somewhat related to the recent threads on Integrity and 
Inheritance.  It is again a bit long so it will be duplicated at MONDO 


I suggest that SGML/XML be perceived as a markup language to describe how 
to build information instead of describing (and modeling) the information 
itself.  This may appear to be a subtle distinction but it has a lot of 

I will start with a recent concrete example from Rick Jelliffe 
<ricko at>:  
    <!ELEMENT citation   ( title, text, url)>

This says a citation is composed of (through its content) a title, text, 
and url.  But do not view that as the information model of a citation; 
consider it a recipe for a citation.  We can build a citation if we 
supply the three (named) ingredients: title, text, and url.  The detail 
of the resulting information (which I will call an object) is unknown.  
It is likely that the citation object will have these three attributes, 
but it could have more or it could even discard some of them (in which 
case the recipe included information that the model did not need).
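
As a rough illustration of the recipe view, the sketch below treats the content model as the parameter list of a builder function. The names `Citation`, `build_citation`, and the derived `host` attribute are hypothetical, chosen only for this example; nothing in the DTD dictates them.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Citation:
    """Hypothetical domain object; its attributes need not mirror the recipe."""
    title: str
    url: str
    host: str  # derived attribute, not one of the recipe's ingredients


def build_citation(title: str, text: str, url: str) -> Citation:
    """Recipe: takes the three named ingredients and builds an object.

    This particular model discards `text` and adds a derived `host`,
    showing that the recipe does not dictate the shape of the object.
    """
    return Citation(title=title, url=url, host=urlparse(url).netloc)


citation = build_citation(
    title="Recipes for Information",
    text="A post to xml-dev",
    url="http://example.org/recipes",
)
print(citation)
```

The caller only needs to know the three ingredient names; whether the object keeps, drops, or adds attributes is the model's own business.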

If we have a different element that requires more information we could 
have a different recipe:
       <!ELEMENT DetailedCitation   ( title, text, name, text, url )>
The object that results from this recipe might be the same type as a 
citation object, a subtype of the citation object (i.e. treatable as a 
citation object but has more capabilities), or even an unrelated type of 
object.  For the moment we will abstain from discussing anything about the 
objects resulting from the DetailedCitation and the Citation recipes [why 
I started capitalizing will be explained later too].

What about combining the two recipes into a single element?  We could 
combine them as:
     <!ELEMENT Citation   ( ( title, text, url) | (title, text, name, text, url) )>
or
     <!ELEMENT Citation   ( title, (text, name)?, text, url  )>
or
     <!ELEMENT Citation   ( title,  text, (name, text)?, url  )>

The first two are ambiguous (in SGML terms), but all three are bad 
recipes.  They are bad because we (or the computer) must look at all the 
content to know which version we are using.  This is analogous to reading 
a whole recipe before we can be sure what we are trying to make.  It 
would be better to separate the choice of option clearly from the 
ingredients that option requires.  Our original version separated the 
recipes through the elements:
       <!ELEMENT Citation   ( title, text, url)>
       <!ELEMENT DetailedCitation   ( title, text, name, text, url )>

We could also do this with:
       <!ELEMENT Citation     ( basicInfo & detailedInfo? )>
       <!ELEMENT basicInfo    ( title, text, url)>
       <!ELEMENT detailedInfo ( text, name)>
or
       <!ELEMENT Citation     ( basic | detailed )>
       <!ELEMENT basic        ( title, text, url)>
       <!ELEMENT detailed     ( title, text, url, text, name)>

In these forms it is explicit what we are trying to build (or at least 
the complexity is dramatically reduced).  We do not have to look into the 
details of the information itself.
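
A minimal sketch of why the explicit form is easier to process, assuming the `( basic | detailed )` content model: the element name alone selects the recipe, so no look-ahead into the ingredients is needed. The `RECIPES` table and the builder functions are hypothetical names, and the dictionaries they return stand in for whatever objects a real model would build.

```python
import xml.etree.ElementTree as ET


def build_basic(element):
    # Ingredients of the basic recipe: title, text, url.
    return {"kind": "basic", "title": element.findtext("title")}


def build_detailed(element):
    # The detailed recipe also carries a name.
    return {
        "kind": "detailed",
        "title": element.findtext("title"),
        "name": element.findtext("name"),
    }


# The element name picks the recipe; its content is never inspected to decide.
RECIPES = {"basic": build_basic, "detailed": build_detailed}


def build_citation(citation):
    # Content model ( basic | detailed ): exactly one child is expected.
    (choice,) = list(citation)
    return RECIPES[choice.tag](choice)


doc = ET.fromstring(
    "<Citation><detailed>"
    "<title>T</title><text>a</text><url>u</url><text>b</text><name>N</name>"
    "</detailed></Citation>"
)
result = build_citation(doc)
print(result)
```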

Now I will ask for a leap of faith.

Consider separating ELEMENTs between Recipes that build objects and 
Parameters that name the ingredients that are required for a particular 
recipe.  As an architectural-form it would look like this:
   <!ELEMENT   Recipe      (parameter)*>
   <!ELEMENT   parameter   (Recipe)>

Although in the content model parameters are sequential, their order is 
insignificant semantically.  Each parameter must have a unique name, so 
consider them to be and-ed together instead of seq-ed.  Sort of like:
   <!ELEMENT   Recipe      (parameter)&*>
or like required element attributes.

As a convention I will capitalize the Recipes and keep parameters in 
lowercase.  Now returning to our example, building a Citation requires 
three parameters:
       <!ELEMENT Citation   ( title & text & url)>

The original ordering of the parameters is irrelevant to the 
informational content because each parameter is uniquely named; having 
them be sequential is only a presentation/encoding restriction.  
Also, the parameters do not describe the Types of the ingredients, just 
the Role of them in building the recipe.  All of 'title', 'text', and 
'url' could be simple strings:
       <!ELEMENT title    (String)>
       <!ELEMENT text     (String)>
       <!ELEMENT url      (String)>
       <!ELEMENT String   (#PCDATA)*>
Or any of them could have a more complex type.  By separating the two 
types of elements we can:
    Be very explicit about what we are constructing
    Have a great deal of flexibility for reuse of elements 
    Use very simple content models that produce complex structures 

Note that although the '&' is considered complex to implement, this 
particular use of it has the same form as attributes: Parameters are 
unordered and possibly required.
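
The attribute-like reading of parameters can be sketched as follows: children are collected into a name-to-element mapping, duplicates are rejected, and required names are checked, so document order plays no role. The helper `read_parameters` is a hypothetical name for this illustration, not part of any library.

```python
import xml.etree.ElementTree as ET


def read_parameters(recipe, required):
    """Collect a Recipe element's children as a name -> element mapping."""
    params = {}
    for child in recipe:
        if child.tag in params:
            # Each parameter name may appear only once, like an attribute.
            raise ValueError(f"duplicate parameter: {child.tag}")
        params[child.tag] = child
    missing = set(required) - set(params)
    if missing:
        raise ValueError(f"missing parameters: {sorted(missing)}")
    return params


first = ET.fromstring(
    "<Citation><title>T</title><text>X</text><url>U</url></Citation>")
second = ET.fromstring(
    "<Citation><url>U</url><title>T</title><text>X</text></Citation>")

names = ("title", "text", "url")
values_first = {k: v.text for k, v in read_parameters(first, names).items()}
values_second = {k: v.text for k, v in read_parameters(second, names).items()}
print(values_first == values_second)  # True: order carries no information
```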

You might have noticed that String cheats: a String does not follow the 
required Recipe pattern of having only parameters in content.  This is a 
convenience shortcut Recipe [OK, and an insanity prevention device], 
which makes it easier to put strings of text into this format.  
Similarly we will probably need to have a shortcut for Lists (sequences) 
of objects:
       <!ELEMENT List     (Recipe)*>

With these additions we have to modify our original description of the 
architectural-form of Recipes to:
   <!ELEMENT   Recipe       (parameter)*>
   <!ELEMENT   StringRecipe (#PCDATA)*>
   <!ELEMENT   ListRecipe   (Recipe)*>
   <!ELEMENT   parameter    (Recipe | StringRecipe | ListRecipe )>
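
A small recursive builder shows how little machinery this architectural-form needs. It is only a sketch under stated assumptions: the element names `String` and `List` are taken as the two shortcut Recipes, every other element is treated as an ordinary Recipe whose children are parameters, and the `authors` parameter in the sample is hypothetical. The function `build` and the dictionary layout it returns are likewise invented for this example.

```python
import xml.etree.ElementTree as ET


def build(recipe):
    """Build a plain Python value from a Recipe element."""
    if recipe.tag == "String":   # StringRecipe: (#PCDATA)*
        return recipe.text or ""
    if recipe.tag == "List":     # ListRecipe: (Recipe)*
        return [build(item) for item in recipe]
    # Ordinary Recipe: (parameter)*, each parameter wrapping exactly one Recipe.
    params = {}
    for parameter in recipe:
        (value,) = list(parameter)
        params[parameter.tag] = build(value)
    return {"type": recipe.tag, "params": params}


doc = ET.fromstring(
    "<Citation>"
    "<title><String>Recipes</String></title>"
    "<authors><List><String>A</String><String>B</String></List></authors>"
    "</Citation>"
)
built = build(doc)
print(built)
```

A real system would map the `type` name to a domain class instead of returning dictionaries; the recursion itself would stay the same.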

Recipes, DTDs, and DomainModels

Each Recipe builds an object.  What is the type of this object and how 
does it relate to the ELEMENT content model?  I propose (and agree with 
others proposing) that there should be no required connection between the 
rules of a recipe (the DTD) and the rules of the DomainModel objects 
built from that recipe.  Objects can have far more complex relationship 
rules than DTDs can describe and the DTD will either over-constrain or 
under-constrain the built objects.  

Instead consider the DTD as similar to a UI Form.  You may want to place 
things in a particular order and group them together:
      FirstName   LastName
          FirstName  LastName

But this is a presentation of the (view independent) information model 
that has a person with several attributes and associations in no 
particular order (even children do not need to be explicitly ordered, for 
orderings can be derived from [for example] the child's birthdate).  The 
UI/DTD can place constraints (like an SSN has a 123-45-6789 format) but it 
should be very careful about these constraints (what about 99- SSNs?) or 
simply delegate the responsibility of validation to the DomainModel.  But 
simplified views are still useful.
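
Delegating validation might look like the sketch below, in which the recipe passes its ingredients through untouched and the domain object owns the rule. The class `Person`, the function `build_person`, and the pattern are assumptions for illustration; the pattern covers only the 123-45-6789 example format and is not meant as a complete rule for real SSNs.

```python
import re
from dataclasses import dataclass

# Assumed rule, taken from the 123-45-6789 example format only; the domain
# model is free to widen or replace it without touching any recipe.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


@dataclass
class Person:
    first_name: str
    last_name: str
    ssn: str

    def __post_init__(self):
        # Validation lives in the DomainModel, not in the recipe or the DTD.
        if not SSN_PATTERN.fullmatch(self.ssn):
            raise ValueError(f"invalid SSN: {self.ssn!r}")


def build_person(first_name, last_name, ssn):
    """Recipe: supplies the ingredients and leaves validation to Person."""
    return Person(first_name, last_name, ssn)


person = build_person("Ada", "Example", "123-45-6789")
print(person)
```

If the model later accepts other formats, only `SSN_PATTERN` changes; documents and recipes stay as they are.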

DTDs can still be used to produce an information model but it should be 
possible to unlink the information model and have it start a more robust 
life of its own (or to reverse the dependency).  The Recipes should still 
be useful because they encode the knowledge required to build the 
information independently of how precisely or extensively it is modeled 
(up to a point).  The recipes can live on as the model grows.

And, in a strange circularity, information models are also (obviously) 
information so they can again be encoded as recipes in SGML/XML and used 
as metadata for the domain model.  So although DTDs are not good 
information models, there is nothing stopping SGML/XML from being a good 
encoding for good information models.

mark.fussell at

  i   ChiMu Corporation      Architectures for Information
 h M   info at         Object-Oriented Information Systems
C   u         Architecture, Frameworks, and Mentoring
