XML-Data: advantages over DTD syntax?

Wed Oct 1 05:03:04 BST 1997

Murray Altheim wrote:

> I've been (honestly) trying to give this XML markup schema idea a fair shake,
> and while I wouldn't enjoy writing the DTD (based on the examples, I think
> the syntax is verbose and ugly) I do think this does come down to an
> advantage, at least as far as my own XML parser experiments go.

I agree.  The point isn't that a single parser can process it; it is 
what can be done with it.  UP FRONT, on DTDs versus schemas:  I think 
these are different animals for different kinds of design.  Past 
experience with design efforts like HyTime, MID, etc. left me, and 
still leaves me, puzzling about what should be defined in the markup 
language(s) and what is best done in the objects.

> My parser builds a Java Vector array of what I call 'ContentObjects', and
> I've enumerated types for the various types of content objects. Using the
> same parser, I would simply add several more enumerated types (for element
> declaration, attlist declaration, etc.) to the list and let the thing
> attack a DTD'ed document instance. That would be obviously easier than
> writing the parser to understand SGML markup declarations.
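
As a rough illustration of the approach described above, here is a minimal
Java sketch. It is not Murray's parser: the names Kind, ContentObject,
classifyStartTag and collect are hypothetical, as are the schema element
names "elementType" and "attribute", and a List stands in for the Vector.
It only shows that once declarations arrive as ordinary elements, the same
element-handling path can tag them with additional enumerated types.

```java
import java.util.ArrayList;
import java.util.List;

public class ContentObjects {
    // Kinds of content the parser records. The last two are the
    // additions needed once declarations are written as elements.
    enum Kind { START_TAG, END_TAG, TEXT, ELEMENT_DECL, ATTLIST_DECL }

    static final class ContentObject {
        final Kind kind;
        final String name;
        ContentObject(Kind kind, String name) {
            this.kind = kind;
            this.name = name;
        }
    }

    // Assumption: the schema vocabulary uses these two element names
    // for declarations. Any other name is an ordinary start tag.
    static Kind classifyStartTag(String elementName) {
        switch (elementName) {
            case "elementType": return Kind.ELEMENT_DECL;
            case "attribute":   return Kind.ATTLIST_DECL;
            default:            return Kind.START_TAG;
        }
    }

    // Stand-in for the parse loop: takes start-tag names that were
    // already tokenized and appends one ContentObject per name.
    static List<ContentObject> collect(String... startTagNames) {
        List<ContentObject> out = new ArrayList<>();
        for (String name : startTagNames) {
            out.add(new ContentObject(classifyStartTag(name), name));
        }
        return out;
    }

    public static void main(String[] args) {
        for (ContentObject o : collect("schema", "elementType", "attribute", "para")) {
            System.out.println(o.kind + " " + o.name);
        }
    }
}
```

No second tokenizer for SGML markup declarations is involved; the
declarations are recognized purely by element name.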

Ok.  Having never written a parser, I believe you.  Creating objects 
using the XML/SGML markup as an interpreted source for properties 
is what XML should *standardize* IMO.  It would be very interesting 
to hear opinions on how much, and which parts, of the document 
framework properties should be expressible in the XML, and which parts 
should be in other notations.  For example, we have to look at how 
scripting is to be done since, despite SGML's resistance to procedural 
languages, internal scripting is a part of the modern document 
instance.  Of course, the contenders are ECMAScript and Java (IMHO) 
because other notations within the framework support those languages 
as internal nodes.

> But from that point on, figuring out how the document model is structured
> seems pretty much the same, just a different approach to getting the
> declarations into the Vector array. I don't see any other particular
> advantages to the syntax, and as I said earlier, it seems harder to read
> (to me) and certainly more verbose.

Also agreed.  I guess I have problems with the idea of using 
the instance syntax for declarations because I think of an instance 
as data (old SGML habit) and I think of a DTD as expressing automata.  
I understand how they have adapted that as attributes, but I don't 
like that model.  To me, the element types are active.  I am 
comfortable with the current DTD syntax.

> By and large though I agree with you. DTDs are hard enough to read now;
> adding all that extra markup cruft seems a step backward. It requires the
> reader to compose the content model in their head based on interpretation
> of the schema markup, which relies a great deal on whitespace (!) IMO.

That's the difference, perhaps.  In some sense, the schema approach 
is an exercise in entity/attribute modeling a la a relational 
background.  A *mythical* SGML designer sees a document from which he 
is abstracting and to which he is adding structure in terms of 
Type/attribute(s)OfType that occur at some frequency and in some 
order.  While some have certainly been able to express that 
relationally, it isn't the conceptual model I prefer for 
human-digestible text.
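
To make the two views concrete, here is a small Java sketch under stated
assumptions. The class and method names (ContentModel, Particle, Occurs,
toDtd) are invented for illustration and come from neither proposal. Each
child type is held as a record-like pair of type name and frequency, the
way a relational or schema view would store it, and the ordered list is
then rendered as the single-line DTD content model that a reader of the
schema markup otherwise has to compose in their head. Only plain sequences
are handled: no choice groups, no nesting, no empty models.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ContentModel {
    // Frequency of a child type, with the matching DTD occurrence suffix.
    enum Occurs {
        ONE(""), OPTIONAL("?"), ZERO_OR_MORE("*"), ONE_OR_MORE("+");

        final String suffix;
        Occurs(String suffix) { this.suffix = suffix; }
    }

    // One child type plus how often it may occur.
    static final class Particle {
        final String type;
        final Occurs occurs;
        Particle(String type, Occurs occurs) {
            this.type = type;
            this.occurs = occurs;
        }
    }

    // List order is document order; renders a sequence content model.
    static String toDtd(String element, List<Particle> sequence) {
        String model = sequence.stream()
                .map(p -> p.type + p.occurs.suffix)
                .collect(Collectors.joining(", "));
        return "<!ELEMENT " + element + " (" + model + ")>";
    }

    public static void main(String[] args) {
        List<Particle> book = Arrays.asList(
                new Particle("title", Occurs.ONE),
                new Particle("author", Occurs.ONE_OR_MORE),
                new Particle("chapter", Occurs.ZERO_OR_MORE));
        // Prints: <!ELEMENT book (title, author+, chapter*)>
        System.out.println(toDtd("book", book));
    }
}
```

The same information is present in both forms; the difference is whether
the order and frequency are read off one line or reassembled from
separate records.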

That said, if it enables a syntax better suited to object-oriented 
processing, then these are simply two ways to create markup for the 
same information.  I have no problem with that.

len
