Class hierarchies in XML
W. Eliot Kimber
eliot at isogen.com
Thu Jul 3 10:56:56 BST 1997
At 09:27 PM 7/2/97 -0400, Giovanni Flammia wrote:
>We really need to build an object-oriented hierarchy, with classes that
>are extended by subclasses and so on...For example, a <restaurant> is a
>subclass of <location> and inherits the properties of <location> such as
><address> and <street number>, but adds other properties, such as <menu>.
>What is the proper syntax for expressing classes and sub-classes, or
>types and subtypes, inheritance and so on? i.e how do I tell in a document
>that <restaurant> is a subclass of <location> (and perhaps allow even
>multiple inheritance?)
The architectural approach, using the formalism and syntax defined by the
Architectural Forms Definition Requirements (AFDR) annex of HyTime (2d
Edition), works as follows:
1. You define some set of superclasses. This definition consists of two
essential parts: a set of SGML element and attribute declarations (i.e., a
"DTD") and some documentation of the semantics of these classes. This
serves to define a set of semantics, give them names (the element types and
attributes), and give the whole set a name (the public ID or URN of the
superclass set, declared as a notation). These definitions are first and
foremost *documentation*. However, the declarations can be used to do
validation of documents against the architecture, if desired (the SP parser
supports this, for example). They may also suggest the design of
object-oriented programs that provide "methods" for the element classes.
For example, the following set of declarations declares an architecture for
describing "locations":
<!-- Declarations and documentation for the "location" architecture. Refer to
this architecture with the public ID "-//G. Flammia//NOTATION Location
Architecture//EN". -->
<!ELEMENT location -- A place (building, venue, etc) people go --
-- A location must have a name and address. It may have
additional descriptive properties appropriate for the
type of place (e.g., a Restaurant may have a menu property) --
- - (name,
address,
loc-descriptor*)
>
<!ATTLIST location
ID ID #IMPLIED -- Unique ID of the location, to enable linking --
>
<!ELEMENT name -- A descriptive name for a location --
- - (#PCDATA)
>
<!ELEMENT address -- The address of a place --
- - (address-item+ |
address-block)
>
<!ELEMENT address-item -- A component of an address (e.g., street, city,
...) --
- - (#PCDATA)
>
<!ELEMENT address-block -- An unstructured address --
- - (#PCDATA)
>
<!ELEMENT loc-descriptor -- A descriptor for a location --
-- Contains additional descriptive information for a location --
- - (#PCDATA | loc-bridge)*
>
<!ELEMENT loc-bridge -- "Architectural bridging element". Generic,
semantic-less structure (e.g., Paragraph). --
- - (#PCDATA | loc-bridge)*
>
<!-- End of Location architecture -->
This set of declarations has defined a very general set of superclasses,
defining and documenting the minimum requirements for describing locations.
Note that these are *minimum* requirements--you can add additional
sophistications when you specialize from this general architecture. The
loc-descriptor and loc-bridge element forms are intended to be specialized
for different kinds of locations.
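To make the "minimum requirements" concrete, here is a minimal Python sketch of checking a sequence of child forms against the location content model (name, address, loc-descriptor*). It is not how SP or any AFDR processor validates; the function name satisfies_location_model and the regular-expression encoding are assumptions made purely for illustration.

```python
import re

# Content model of the "location" form: (name, address, loc-descriptor*).
# Encoded as a regular expression over space-terminated form names.
# This encoding is an illustrative assumption, not a real SGML validator.
LOCATION_MODEL = re.compile(r"name address (loc-descriptor )*")


def satisfies_location_model(child_forms):
    """Return True if the ordered child form names fit the location model."""
    encoded = "".join(f"{form} " for form in child_forms)
    return LOCATION_MODEL.fullmatch(encoded) is not None
```

A specialized element such as a restaurant passes this check as long as its children, once mapped to their location forms, still appear as a name, then an address, then any number of descriptors.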
I can now define a "Restaurant Architecture", derived from the Location
architecture, that adds specialized elements unique to (or needed for)
restaurants. This is also defining a set of superclasses, derived from the
location superclasses, but intended to be specialized for individual
documents. Again, the primary purpose of the following is to formally
declare and document the classes and their semantics.
<!-- Restaurant description architecture. Derived from the location
architecture. Refer to this architecture with the public id "-//Eliot
Kimber//NOTATION Restaurant Architecture//EN" -->
<!-- Declare names of superclass set this set of declarations is derived
from: -->
<?IS10744 ArcBase location >
<!NOTATION location PUBLIC "-//G. Flammia//NOTATION Location Architecture//EN"
-- Pointer to superclass "location" architecture -->
<!ELEMENT Restaurant -- Describes a restaurant --
- - (Name,
Address,
Menu,
Hours,
Cost)
>
<!ATTLIST Restaurant
ID ID #IMPLIED
location NAME #FIXED "location"
-- Define derivation of class "restaurant" from superclass "location" --
-- Attribute name "location" is name of architecture (coincidentally
the same as the key class in the architecture in this case). --
>
<!ELEMENT Name -- A descriptive name for a location --
- - (#PCDATA)
>
<!ELEMENT address -- The address of a restaurant --
- - (street, city, state, zip, phone)
>
<!ELEMENT (street, city, state, zip, phone) -- Parts of an address --
- - (#PCDATA)
>
<!ATTLIST (street, city, state, zip, phone)
location NAME #FIXED "address-item"
>
<!ELEMENT Menu -- The menu for a restaurant --
- - (Menu-item+)
>
<!ATTLIST Menu
location NAME #FIXED "loc-descriptor"
>
<!ELEMENT Menu-item -- An item on the menu --
- - (#PCDATA)
>
<!ATTLIST Menu-item
location NAME #FIXED "loc-bridge"
>
<!ELEMENT (Hours, Cost)
- - (#PCDATA)
>
<!ATTLIST (Hours, Cost)
location NAME #FIXED "loc-descriptor"
>
<!-- End of Restaurant architecture declarations -->
Here's how you relate the restaurant declarations to the location
declarations:
1. Any element type in Restaurant that has the same name as one in the
location architecture is automatically derived from the location form
(e.g., "name")
2. The "location" attribute defines the mapping for all other element types.
In this case, every element type in the Restaurant architecture is derived
from a superclass form in the location architecture, but that's not a
necessary requirement. In addition, any subclass architecture or document
can be derived from multiple superclass architectures.
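The two mapping rules above can be sketched in Python as a simple lookup. This is only an illustration under stated assumptions: the tables are transcribed by hand from the declarations in this message, the function name location_form is hypothetical, and names are lowercased on the assumption that general names are case-folded, as in the SGML reference concrete syntax. A real architectural processor such as SP reads the declarations itself.

```python
# Element types declared in the location architecture.
LOCATION_FORMS = {
    "location", "name", "address", "address-item",
    "address-block", "loc-descriptor", "loc-bridge",
}

# Values of the #FIXED "location" attribute in the Restaurant declarations.
RESTAURANT_FIXED = {
    "restaurant": "location",
    "street": "address-item", "city": "address-item",
    "state": "address-item", "zip": "address-item",
    "phone": "address-item",
    "menu": "loc-descriptor", "menu-item": "loc-bridge",
    "hours": "loc-descriptor", "cost": "loc-descriptor",
}


def location_form(element_type):
    """Return the location form an element type derives from, or None."""
    name = element_type.lower()  # assumption: names are case-folded
    if name in RESTAURANT_FIXED:   # rule 2: explicit attribute mapping
        return RESTAURANT_FIXED[name]
    if name in LOCATION_FORMS:     # rule 1: same name, automatic derivation
        return name
    return None                    # not derived from this architecture
```

With these tables, "Name" resolves to the form "name" by rule 1 and "Menu-item" resolves to "loc-bridge" by rule 2, while an element type unknown to both tables is simply not architectural.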
These two architecture declarations define a class hierarchy. The syntax
and declarations are formal enough to enable processing and validation of
documents against these declarations. However, their first and foremost
purpose is as *documentation* for humans to read and understand.
Now I want to create a document that describes a restaurant. This document
will be derived from the Restaurant architecture. In an XML environment,
if we assume that the document has no declarations of its own, then the
restaurant architecture defines the rules for the document. Because the
architecture is not used as the document's real DTD, it needn't be processed
in order to parse the document. (But note that the restaurant architectural
declarations *could* be used as a document's DTD declarations if desired,
because the syntax is the same.)
Here's a restaurant document derived exactly from the restaurant architecture:
<?XML 1.0?>
<!DOCTYPE Restaurant SYSTEM "" [
<?IS10744 ArcBase restaurant?>
<!NOTATION restaurant PUBLIC "-//Eliot Kimber//NOTATION Restaurant
Architecture//EN">
]>
<restaurant>
<name>Kreiz' Barbeque</name>
<address>
<street>Off the square</street>
<city>Lockhart</city>
<state>Texas</state>
<zip>787xx</zip>
<phone>512-555-1234</phone>
</address>
<menu>
<menu-item>Brisket</menu-item>
<menu-item>Prime rib</menu-item>
<menu-item>Pork chops</menu-item>
</menu>
<hours>8 to 8, closed Sunday</hours>
<cost>Moderate</cost>
</restaurant>
Note that the DTD is null (SYSTEM ""), but the notation declaration
connects the document with the architecture. Thus a human observer or
parser *can* refer to the architecture if desired, but isn't required to.
I can use the restaurant architecture as part of a larger document type
(say a document type for travel info). I can also specialize from it at
the document level. For example, I might have something like this:
<?XML 1.0?>
<!DOCTYPE CityGuide SYSTEM "" [
<?IS10744 ArcBase restaurant location?>
<!NOTATION restaurant
PUBLIC "-//Eliot Kimber//NOTATION Restaurant Architecture//EN">
<!NOTATION location
PUBLIC "-//G. Flammia//NOTATION Location Architecture//EN">
]>
<cityguide>
<title>A Guide to Austin And Environs</title>
<places-to-eat>
<para>Austin is known for its barbeque, traditionally smoked over
hickory or mesquite and served dry or with spicy sauce</para>
<bbq-joint restaurant="restaurant">
<name>Kreiz' Barbeque</name>
...
</bbq-joint>
<nuevo-cuisine restaurant="restaurant">
<name>Coyote Cafe</name>
...
</nuevo-cuisine>
</places-to-eat>
<places-to-hear-music>
<bar location="location">
...
</bar>
</places-to-hear-music>
</cityguide>
Here I've done two things:
1. I've specialized from restaurant to further distinguish types of places
to eat.
2. I've derived the document from two different architectures (restaurant
and location).
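Because the derivation is stated with ordinary attributes in the instance, a downstream processor can find the derived elements without any DTD. The following Python sketch uses the standard xml.etree.ElementTree module on a trimmed copy of the city guide; the elided content stays elided, and the helper name derived_elements is hypothetical. It illustrates the idea only and is not an AFDR implementation.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the city guide instance; elided content is left out.
CITY_GUIDE = """
<cityguide>
  <places-to-eat>
    <bbq-joint restaurant="restaurant"><name>Kreiz' Barbeque</name></bbq-joint>
    <nuevo-cuisine restaurant="restaurant"><name>Coyote Cafe</name></nuevo-cuisine>
  </places-to-eat>
  <places-to-hear-music>
    <bar location="location"/>
  </places-to-hear-music>
</cityguide>
"""


def derived_elements(root, architecture):
    """List (element type, form) pairs for elements that carry the
    attribute named after the given architecture, in document order."""
    return [(el.tag, el.get(architecture))
            for el in root.iter()
            if el.get(architecture) is not None]


root = ET.fromstring(CITY_GUIDE)
print(derived_elements(root, "restaurant"))
print(derived_elements(root, "location"))
```

Elements such as cityguide and places-to-eat carry neither attribute, so they are not reported for either architecture.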
>Can you point me to the relevant specs?
The AFDR Annex of HyTime can be found at
"http://www.ornl.gov/sgml/wg8/hytime/html/clause-A.3.html" (in a few
days--we're setting up the site now). The minimum you need know in order
to make the above work with SP can be found at "http://www.jclark.com".
The key difference between what I've shown above and the mechanism defined
by the AFDR is the use of notation attributes to further configure the use
of architectures in documents and meta-DTDs (architecture declaration
sets). As XML doesn't [yet] have notation attributes, there's no way to
use that aspect of the AFDR. However, you can approximate it as I've shown
above.
Note that the "inheritance" is largely conceptual--this is data, not
programming--so it's up to the authors of documents to understand the
semantics of the class hierarchies and use them appropriately. The
declarations enable some validation against the architectures, but it's
ultimately up to humans or down-stream processors to validate the use.
Note also that the "methods" associated with elements *are* programs
(browser objects, style sheet functions, transforms, etc), and so may do
real inheritance. As mentioned before, it probably makes sense in general
to design object-oriented processors that mirror the architecture classes.
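As a rough illustration of a processor that mirrors the architecture classes, here is a minimal Python sketch. The class layout, the summary method, and the choice to store hours and cost as descriptors are all assumptions made for the example; nothing in the AFDR prescribes them.

```python
class Location:
    """Mirrors the "location" form: a name, an address, optional descriptors."""

    def __init__(self, name, address, descriptors=None):
        self.name = name
        self.address = address
        # Maps a descriptor label to its text (loc-descriptor content).
        self.descriptors = dict(descriptors or {})

    def summary(self):
        """A hypothetical "method" for the class: a one-line listing."""
        return f"{self.name} ({self.address})"


class Restaurant(Location):
    """Mirrors "Restaurant": a location with a menu, hours, and cost."""

    def __init__(self, name, address, menu, hours, cost):
        super().__init__(name, address, {"hours": hours, "cost": cost})
        self.menu = list(menu)

    def summary(self):
        # Reuse the inherited behavior, then add restaurant-specific detail.
        base = super().summary()
        return f"{base}, {len(self.menu)} menu items, {self.descriptors['cost']}"


kreiz = Restaurant("Kreiz' Barbeque", "Lockhart, Texas",
                   ["Brisket", "Prime rib", "Pork chops"],
                   "8 to 8, closed Sunday", "Moderate")
print(kreiz.summary())
```

Here the inheritance is real program inheritance: code written against Location also works on a Restaurant, which parallels the way an architecture-aware processor can treat a Restaurant element as a location.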
If anyone wants to see how the above documents and architectures can be
processed architecturally using SP, I'll work up the examples when I get a
chance (after the holiday).
I'm also preparing a more complete paper on similar uses of architectures
which I'll announce once I've got it up on the ISOGEN Web site.
Cheers,
Eliot
--
<Address HyTime=bibloc>
W. Eliot Kimber, Senior Consulting SGML Engineer
Highland Consulting, a division of ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 95202. 214.953.0004
www.isogen.com
</Address>
xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)