Generalizing the SGML/XML information model and Releasing MONDO

Thu Nov 20 18:56:32 GMT 1997

[This is a long email so I will also put it online at 
"http://www.chimu.com/projects/mondo/" ]

Some recent discussions on xml-dev and c.t.sgml have included query
languages, encoding complex information (trees, graphs, etc.), object
serialization, and extended metamodeling.  I recommend enlarging the
scope of these discussions and thinking about aligning SGML/XML with
other disciplines that can help accomplish these tasks.  This aligning
would take advantage of the tools and techniques that are already
available in other industries: not just by duplication of design but by
actually merging with more general capabilities.  Although alignment has
been done successfully in some areas of SGML/XML, I think it is
conspicuously lacking in a crucial place: SGML's information model.
Strengthening this weak point by drawing on well-established
industries would make an abundance of other needs much easier to
satisfy.

Generalizing the SGML/XML information model
-------------------------------------------

The desired applications of SGML/XML have grown beyond the original
focus on documents towards working with much more general information
and processing.  SGML is a combination of encoding technology and an
information modeling language.  But that modeling language (DTDs and
Groves) is very weak and is constrained by being focused on
document-oriented information.  It is also esoteric and not equivalent
to any of the mainstream information modeling approaches.  

I recommend considering modeling separately from encoding technology. 
For modeling I think object-oriented information models can subsume
SGML's document-oriented models and provide the ability to handle much
more advanced models.  Object-oriented information models can be very
general, expressive, and understandable.  This allows them to model many
types of information equally well: both document-oriented and more
general information.  The strength of object-oriented information
modeling has resulted in an abundance of good analysis, patterns, and
specific models being built using it.

This last point is the most important.  If SGML/XML aligns with the
information modeling industry, many more tools will immediately become
available.  For describing models you can use the Unified Modeling
Language (UML) and tools such as Rational Rose (and several other
techniques and tools).  Implementing models can be done very easily with
most OO languages (with or without generic frameworks), and the
resulting implementation can be far more knowledgeable about the
semantics of the information it is working with.  There are many
products that provide persistence and UI presentation that are designed
to work with OO DomainModels.  There are standard query languages
(OQL/SQL) and interface languages (CORBA/IDL).  The information modeling
industry provides an extensive list of high-quality technologies,
standards, and techniques.

There has been a lot of great work done with SGML/XML in both modeling
(DTDs) and technologies (e.g. HyTime).  If this quality work is
integrated into the common environment of OO information modeling and OO
technologies then it will be available to a larger audience.  It will
also frequently become easier to understand and more capable because it
can take advantage of the inherent abilities of OO models.  For example,
much of HyTime addressing is very easily and flexibly described in terms
of object associations.  HyTime becomes more powerful in the general
object context.

This isn't to say everything is easy.  There are still the issues of how
to work with different information models on different technologies
(e.g. how smart the objects are) and what additional technologies need
to be provided to reproduce expected SGML functionality (e.g. HyTime,
or extending OQL (through object methods) with containment-closure
abilities).  And some tools would never be generalized, either because
the SGML DTD and Grove models are sufficient for the task or because
the tool is of too high a quality to risk moving (e.g. Jade).

Overall, I think the benefits will be enormous.  

MONDO
-----

I have been working on a project (called MONDO) to prove the benefits of
this alignment and to provide an architecture and the frameworks to
support it.  MONDO is primarily an architecture: it describes the
components (e.g. ObjectBuilder, DomainModel, ObjectEncoder), their
responsibilities, and the interfaces among those components.  It is
meant to be open and language neutral.

MONDO will also have a reference implementation in Java (prototypes were
in Java, Perl, and Smalltalk).  The current reference implementation
includes frameworks and tools for the normal document-oriented tasks and
also for some more general or object-oriented capabilities.  As an
example of the latter, MONDO can serialize and deserialize Java objects
to human-readable (XML or OML) encodings.
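
To make the serialization idea concrete, here is a minimal sketch of
round-tripping a Java object through a human-readable XML encoding.
It is not MONDO code: the class XmlObjectCodecSketch, the methods
encode and decode, and the Person example are all hypothetical, and
the sketch assumes a class with a no-argument constructor whose
String fields map to XML attributes.

```java
import java.io.StringReader;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical illustration only; none of these names come from MONDO.
public class XmlObjectCodecSketch {

    /** Example domain class: a no-arg constructor and String fields. */
    public static class Person {
        public String name;
        public String city;
    }

    // Escape the characters that are not allowed inside an attribute value.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    // Encode: one empty element named after the class, one attribute per String field.
    static String encode(Object object) {
        try {
            StringBuilder xml = new StringBuilder("<" + object.getClass().getSimpleName());
            for (Field field : object.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.getType() != String.class) {
                    continue; // this sketch only handles instance String fields
                }
                field.setAccessible(true);
                Object value = field.get(object);
                if (value != null) {
                    xml.append(' ').append(field.getName())
                       .append("=\"").append(escape((String) value)).append('"');
                }
            }
            return xml.append("/>").toString();
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decode: parse the element and copy matching attributes back into a new object.
    static <T> T decode(String xml, Class<T> type) {
        try {
            Element element = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            T object = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.getType() != String.class) {
                    continue;
                }
                if (element.hasAttribute(field.getName())) {
                    field.setAccessible(true);
                    field.set(object, element.getAttribute(field.getName()));
                }
            }
            return object;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot decode " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        Person person = new Person();
        person.name = "Ada & Co";
        person.city = "London";
        String xml = encode(person);
        Person copy = decode(xml, Person.class);
        System.out.println(xml);
        System.out.println(copy.name + ", " + copy.city);
    }
}
```

A real framework would also need to handle associations between
objects, non-String values, and class evolution; the sketch leaves
all of that out.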

I have been working on MONDO for quite a while and have been producing
tangibles (i.e. designs, documentation, and code) off and on for a bit
more than a year.  This is the first time I am releasing them openly. 
The WWW site currently has some FAQs, some references (extracted from
the design document), and placeholders and timelines for expected
additions.  The references may be especially useful because they provide
a sampling of the integration from these multiple fields.  I hope to
have the design document (first pass is about 80 pages) up on the web
site by early next week and will start putting up the reference code
shortly thereafter.  

The MONDO WWW site is at:
    http://www.chimu.com/projects/mondo/

As an example (teaser ;-) of the MONDO design, I have included a couple
of (non-sequential but related) paragraphs below.  

======
ObjectBuilder
The responsibility of the ObjectBuilder is to build all or part of the
Objectbase from an external source.  Generally this source will be a
human-readable text file, but there are several stages to ObjectBuilding
which can each have different approaches (e.g. we could read from a
binary file instead).  Assuming we have a textual file-based approach,
ObjectBuilding would go through three stages:
    1. Read from the text file and produce a stream of text
    2. Parse the text and turn it into a recipe (what objects to build
       and what ingredients to use)
    3. Build the recipe and construct objects within the DomainModel
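
[Not part of the design document.]  The three stages above can be
sketched in Java roughly as follows.  This is only an illustration
under stated assumptions: the class ObjectBuilderSketch, the Step
type, the methods read, parse, and build, and the toy line encoding
"Type key=value ..." are all hypothetical, and the DomainModel is
reduced to a list of generic maps.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from MONDO.
public class ObjectBuilderSketch {

    /** One recipe step: what kind of object to build and which ingredients to use. */
    static final class Step {
        final String typeName;
        final Map<String, String> ingredients;

        Step(String typeName, Map<String, String> ingredients) {
            this.typeName = typeName;
            this.ingredients = ingredients;
        }
    }

    // Stage 1: read from the external source and produce a stream of text lines.
    static List<String> read(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                if (!line.trim().isEmpty()) {
                    lines.add(line.trim());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Stage 2: parse the text into a recipe (assumed encoding: "Type key=value ...").
    static List<Step> parse(List<String> lines) {
        List<Step> recipe = new ArrayList<>();
        for (String line : lines) {
            String[] tokens = line.split("\\s+");
            Map<String, String> ingredients = new LinkedHashMap<>();
            for (int i = 1; i < tokens.length; i++) {
                String[] keyValue = tokens[i].split("=", 2);
                ingredients.put(keyValue[0], keyValue.length > 1 ? keyValue[1] : "");
            }
            recipe.add(new Step(tokens[0], ingredients));
        }
        return recipe;
    }

    // Stage 3: build the recipe, constructing objects in a stand-in DomainModel.
    static List<Map<String, String>> build(List<Step> recipe) {
        List<Map<String, String>> domainModel = new ArrayList<>();
        for (Step step : recipe) {
            Map<String, String> object = new LinkedHashMap<>();
            object.put("type", step.typeName);
            object.putAll(step.ingredients);
            domainModel.add(object);
        }
        return domainModel;
    }

    public static void main(String[] args) {
        Reader source = new StringReader("Person name=Ada\nDocument title=Notes\n");
        // prints [{type=Person, name=Ada}, {type=Document, title=Notes}]
        System.out.println(build(parse(read(source))));
    }
}
```

Because each stage only depends on the output of the previous one,
stage 1 or 2 could be swapped (e.g. for a binary reader) without
touching stage 3.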

-------

Recipes for building objects
A recipe describes how to build a collection of associated objects.  All
the information that is placed into the DomainModel by MONDO is the
result of building recipes.  By formalizing recipes we separate the
encoding of information (e.g. whether it is human readable and how to
parse it) from what information is in the encoding.  MONDO uses that
information to construct the knowledge in a form we want to work with,
the Objectbase.  
======
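
To illustrate the point about recipes separating the encoding from
the information it carries, here is a small Java sketch in which two
different encodings produce the same recipe.  Everything in it is an
assumption made for illustration: RecipeSketch, Step, fromXml, and
fromLines are hypothetical names, and the line-based encoding is a
toy format, not OML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only; none of these names come from MONDO.
public class RecipeSketch {

    /** A recipe step, comparable by value so recipes from different encodings can be compared. */
    static final class Step {
        final String typeName;
        final Map<String, String> ingredients;

        Step(String typeName, Map<String, String> ingredients) {
            this.typeName = typeName;
            this.ingredients = new TreeMap<>(ingredients);
        }

        @Override public boolean equals(Object other) {
            return other instanceof Step
                    && typeName.equals(((Step) other).typeName)
                    && ingredients.equals(((Step) other).ingredients);
        }

        @Override public int hashCode() {
            return Objects.hash(typeName, ingredients);
        }

        @Override public String toString() {
            return typeName + ingredients;
        }
    }

    // Encoding A: each child element is a step, each attribute an ingredient.
    static List<Step> fromXml(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            List<Step> recipe = new ArrayList<>();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace and comments
                }
                Map<String, String> ingredients = new TreeMap<>();
                NamedNodeMap attributes = child.getAttributes();
                for (int j = 0; j < attributes.getLength(); j++) {
                    Node attribute = attributes.item(j);
                    ingredients.put(attribute.getNodeName(), attribute.getNodeValue());
                }
                recipe.add(new Step(child.getNodeName(), ingredients));
            }
            return recipe;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed recipe document", e);
        }
    }

    // Encoding B: one step per line, written as "Type key=value ...".
    static List<Step> fromLines(String text) {
        List<Step> recipe = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] tokens = line.trim().split("\\s+");
            Map<String, String> ingredients = new TreeMap<>();
            for (int i = 1; i < tokens.length; i++) {
                String[] keyValue = tokens[i].split("=", 2);
                ingredients.put(keyValue[0], keyValue.length > 1 ? keyValue[1] : "");
            }
            recipe.add(new Step(tokens[0], ingredients));
        }
        return recipe;
    }

    public static void main(String[] args) {
        List<Step> a = fromXml("<recipe><Person name=\"Ada\"/><Document title=\"Notes\"/></recipe>");
        List<Step> b = fromLines("Person name=Ada\nDocument title=Notes\n");
        System.out.println(a.equals(b)); // prints true
    }
}
```

Whatever builds objects from the recipe never needs to know which of
the two encodings it came from.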

Any feedback on MONDO or these concepts is appreciated, and I hope these
ideas contribute to some of the topics that have been addressed
recently.  I will let people know when the main design document is
online and when the code is available for download.  If you are
interested in MONDO
for your application or want to help with the project, let me know.

--Mark
mark.fussell at chimu.com

  i   ChiMu Corporation      Architectures for Information
 h M   info at chimu.com         Object-Oriented Information Systems
C   u    www.chimu.com         Architecture, Frameworks, and Mentoring
