XML-Data: advantages over DTD syntax?
W. Eliot Kimber
eliot at isogen.com
Mon Sep 29 20:21:12 BST 1997
At 01:28 PM 9/29/97 -0400, Jonathan Robie wrote:
> To put this in
>traditional OO terms, Duck inherits data, behavior, and type from Animal. In
>SGML, it can't inherit behavior, but it can inherit data and type.
In fact, you can inherit behavior if your processor is architecture-aware,
such that you can write rules that apply the architecture-specific
behavior in the absence of element-specific behavior. This can be done either
indirectly, through object-oriented processors in which the implementing
element-specific objects inherit from architecture-specific objects, or
explicitly, through scripts that embody the architecture derivation rules,
e.g., something like this in DSSSL (here using a 'query' element rule):
(query (case (arch-form-of (current-node) 'myarch')
         (('foo')
          (make paragraph ...))
         (('bar')
          (make sequence ...))))
Behavior is simply processing code associated with types--the only question
is how the binding is done. With SGML, the binding is [almost] always
loose and indirect, and architecture-based binding is just another level of
indirection, similar to, if not identical to, the indirection you get by
inheriting methods from supertypes.
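As a rough illustration of that fallback, here is a minimal sketch in Python rather than DSSSL. Everything in it is hypothetical: the ARCH_FORM table stands in for the derivations a DTD would fix with an architectural form attribute, and the rule tables and function names are invented for the example. It shows only the lookup order: element-specific behavior first, then the behavior bound to the architectural form.

```python
# Hypothetical sketch: element-specific behavior first, then the behavior
# bound to the element's architectural form. All names are illustrative.

# Explicit derivations (element type -> architectural form), standing in
# for what a DTD would fix with an architectural form attribute.
ARCH_FORM = {"cat": "pet", "dog": "pet"}


def format_pet(name):
    return f"pet paragraph for <{name}>"


def format_dog(name):
    return f"dog paragraph for <{name}>"


ARCH_RULES = {"pet": format_pet}     # behavior bound to the supertype
ELEMENT_RULES = {"dog": format_dog}  # behavior bound to one element type


def arch_form_of(name):
    # Names with no explicit derivation map to the same name.
    return ARCH_FORM.get(name, name)


def rule_for(name):
    """Return the element-specific rule, else the architectural one."""
    if name in ELEMENT_RULES:
        return ELEMENT_RULES[name]
    return ARCH_RULES.get(arch_form_of(name))


def process(name):
    rule = rule_for(name)
    return rule(name) if rule else None
```

Here 'cat' has no rule of its own, so it picks up the rule bound to 'pet', while 'dog' overrides it, which is the same shape as inheriting or overriding a method from a supertype.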
>Microsoft's XML-Data allows me to inherit data and type in a manner very
>similar to OO languages. For instance, their description of XML-Data at
>their XML standards page gives the following example:
>
><xml:schema>
>  <elementType id="animalFriends">
>    <elt href="#pet" occurs="PLUS"/>
>  </elementType>
>
>  <elementType id="pet">
>    <any/>
>    <attribute id='name'/>
>    <attribute id='owner'/>
>  </elementType>
>
>  <elementType id="cat" extends="#pet">
>    <elt href='#kittens'/>
>    <attribute id='lives' type='NMTOKEN'/>
>  </elementType>
>
>  <elementType id="dog" extends="#pet">
>    <elt href='#puppies'/>
>    <attribute id='breed'/>
>  </elementType>
></xml:schema>
>
>Now I can use this type declaration to create an animalFriends element,
>which is a list of pets:
>
><animalFriends>
> <cat name="Fluffy" lives='9'/>
> <pet name="Diego"/>
> <dog name="Gromit" owner='Wallace' breed='mutt'/>
></animalFriends>
>
>So the pet hrefs can point to pets, cats, or dogs.
>
>How would I create this schema using architectural forms?
I see a one-level schema hierarchy from which the document in the example
is derived:
superclass animalFriends
    contains pet+
superclass pet
    contains ANY
    attribute owner
    attribute name
To duplicate this using architectures, I create a meta-DTD that defines the
two supertypes and a document that derives its element types from the
supertypes.
First the derived document, which declares its derivation from the
architecture (schema):
<!DOCTYPE animalFriends [
  <!-- Animal Friends DTD -->
  <!NOTATION animalFriends PUBLIC "-//ME//DTD Animal Friends Architecture//EN"
  >
  <!ATTLIST #NOTATION animalFriends
    arcDTD   CDATA #FIXED "animalFriends.meta-DTD"
    ArcFormA NAME  #FIXED "anfriend"
  >
  <!ENTITY animalFriends.meta-DTD SYSTEM "animalfriends.mtd" >
  <!ATTLIST (cat | dog)
    anfriend NAME #FIXED "pet"
  >
  <!-- NOTE: No other declarations necessary when using XML syntax. -->
]>
<animalFriends>
  <cat name="Fluffy" lives='9'/>
  <pet name="Diego"/>
  <dog name="Gromit" owner='Wallace' breed='mutt'/>
</animalFriends>
Now the architectural meta-DTD, which defines the types:
<!-- animalFriends architecture meta-DTD -->
<!ELEMENT animalFriends - - (pet+) >
<!ELEMENT pet ANY >
<!ATTLIST pet
  name   -- The name of the pet --
         CDATA #IMPLIED -- Not clear from example if this is required --
  owner  -- The name of the pet's owner --
         CDATA #IMPLIED
>
<!-- End of animalFriends architectural meta-DTD -->
The relationship of the types in the document to the types in the meta-DTD
is clear and machine-processable (because of the architecture notation and
meta-DTD entity). The relationship of the individual elements to their
supertypes is also clear, either through the automatic mapping (names in the
document automatically map to the same name in the architecture, e.g.,
'animalFriends' in the document maps to 'animalFriends' in the meta-DTD) or
through the explicit mapping, as for the types cat and dog. The 'extends'
semantic is inherent in architectural derivation. The architecture conveys
no less information than the example and takes about the same number of
characters in this case (the verbosity of the XML-Data syntax is offset by
the need for the architecture notation and entity declaration in the
document). The architecture approach requires no specialized processors:
architecture-unaware processors can process the document as it stands, and
architecture-aware processing can be added easily, either through ad-hoc
means in style sheets or transforms or by using more complete architecture
engines (e.g., SP, GroveMinder, etc.).
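To make the "ad-hoc means" concrete, here is a minimal sketch of deriving the architectural view of the example document, using Python's standard xml.etree.ElementTree. It is an illustration under stated assumptions, not an architecture engine: the FIXED_ANFRIEND table stands in for the #FIXED 'anfriend' values (ElementTree does not apply attribute defaults from a DTD), ARCH_ATTRS is a hand-copied summary of the meta-DTD, and character data and most of the architecture processing rules are ignored.

```python
import xml.etree.ElementTree as ET

# Assumption: the #FIXED "anfriend" values from the DTD are supplied as a
# table, because ElementTree does not apply attribute defaults from a DTD.
FIXED_ANFRIEND = {"cat": "pet", "dog": "pet"}

# Hand-copied from the meta-DTD: architectural forms and their attributes.
ARCH_ATTRS = {"animalFriends": set(), "pet": {"name", "owner"}}


def arch_form(elem):
    """Explicit mapping if there is one, otherwise the same name."""
    return elem.get("anfriend") or FIXED_ANFRIEND.get(elem.tag, elem.tag)


def to_architecture(elem):
    """Build the architectural view of elem, or None if it has no form."""
    form = arch_form(elem)
    if form not in ARCH_ATTRS:
        return None
    out = ET.Element(form)
    # Keep only the attributes the architecture declares for this form.
    for key, value in elem.attrib.items():
        if key in ARCH_ATTRS[form]:
            out.set(key, value)
    for child in elem:
        mapped = to_architecture(child)
        if mapped is not None:
            out.append(mapped)
    return out


DOC = """<animalFriends>
  <cat name="Fluffy" lives="9"/>
  <pet name="Diego"/>
  <dog name="Gromit" owner="Wallace" breed="mutt"/>
</animalFriends>"""

arch_doc = to_architecture(ET.fromstring(DOC))
```

Seen through the architecture, the cat and the dog are both just pets: 'lives' and 'breed' drop out because the meta-DTD does not declare them, while 'name' and 'owner' carry over.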
Note that neither the XML-Data schema nor the architectural meta-DTD is a
complete definition of the schema--you still need human-understandable definitions
of all the parts (what is a "pet"? What are the rules for pet names? What
are the rules for owner names? What, if any, is the significance of pet
element content? etc.). You also need to define the expected behavior for
the types in various contexts: formatting, transformation, online display,
etc. Neither the XML-Data nor the architecture formalism will or can
provide these--they must be provided by other means, mostly
non-standardized and relying heavily on prose to communicate ideas to
humans, not processing to computers.
The only really important part of the schema discussion is how a schema is
associated with its documentation and definitions and how things are
associated with that schema. That's why the architecture mechanism
requires that you declare the notation for the architecture--that is the
pointer to the authoritative definition of what the architecture rules are.
The meta-DTD for the architecture is just a convenience that makes it
easier to do processing and validation, but the presence of it doesn't give
you that much and the lack of it doesn't preclude doing architecture-based
processing. The same will be true of any other formal syntax for defining
the meta-syntax rules for documents. At least architectures use an
existing syntax that is well understood by all SGML tools.
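As a small illustration of what the meta-DTD buys you for validation, the sketch below hand-codes a single rule, the (pet+) content model of animalFriends, as a Python check. The function and table names are hypothetical, and a real architecture engine would read the rule from the meta-DTD rather than have it written out by hand. Without a meta-DTD, a check like this has to be written from the prose definition of the architecture.

```python
import xml.etree.ElementTree as ET

# Hypothetical hand-written encoding of one meta-DTD declaration:
#   <!ELEMENT animalFriends - - (pet+) >
# Explicit derivations, standing in for the #FIXED "anfriend" attribute.
FIXED_ANFRIEND = {"cat": "pet", "dog": "pet"}


def arch_form(elem):
    # Names with no explicit derivation map to the same name.
    return FIXED_ANFRIEND.get(elem.tag, elem.tag)


def check_animal_friends(root):
    """Return a list of violations of the (pet+) content model."""
    if arch_form(root) != "animalFriends":
        return [f"root maps to {arch_form(root)!r}, not 'animalFriends'"]
    # Only children that derive from 'pet' count toward the content model.
    pets = [child for child in root if arch_form(child) == "pet"]
    if not pets:
        return ["animalFriends must contain at least one pet"]
    return []


good = ET.fromstring("<animalFriends><cat/><pet/><dog/></animalFriends>")
empty = ET.fromstring("<animalFriends/>")
```

Here check_animal_friends(good) returns an empty list, and check_animal_friends(empty) reports the missing pet.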
Given that most XML tools will need to be able to deal with DTDs anyway, I
can see no compelling reason in the short term to define an alternative
syntax for DTDs. Rethinking how document schemas are created and managed
over the long term needs doing, no doubt, but that is a project that will
take years of careful study and thought and must be done in conjunction
with a major revision to SGML, one in which many different ideas and
requirements can be brought to bear.
In my opinion, none of the name-space requirements and none of the
DTD-editing requirements require a change to existing mechanisms in order
to be satisfied in a reasonable way. Given that, there can be no good
reason for trying to reinvent the DTD mechanism at this time, and trying to
do so is a waste of time that is better spent on more pressing issues.
Certainly people are free to invent whatever document types they want for
representing schemas, but to suggest that any such definition should be
used as standard within XML or SGML is premature, unwise, and unwarranted.
If Microsoft (or anybody else) wants to build tools to support such a
system and see if people will use or buy them, let them do so. Let the
marketplace decide. But this is not an area of SGML or XML for which the
standards need to change at this time and we should not attempt to change
them.
Cheers,
E.