Generating typed code from DTDs, why not?

Tue Mar 16 02:42:20 GMT 1999

Luke Gorrie wrote:
> 
> I'm pretty new to XML, but as I've poked around I've observed what
> seem to be some strange things.  XML parsers all seem to provide
> interfaces which ignore the static structure information provided by
> DTDs and rely on "one fits all" interfaces to elements, in stark
> contrast to the conventions of statically typed languages.

The reason is that most document data depends on many heterogeneous
lists:

<P>This is some string data, <EMPH>This is a sub-element (new node
type)</EMPH>, this is another <STRONG>sub-element</STRONG>, and this is a
<?PROCESSING-INSTRUCTION ?>.</P>

A visitor pattern would severely complicate the flow of control in this
case.
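
To make the point concrete, here is a minimal sketch, assuming Python's
standard xml.sax module, of what a generic handler sees for a paragraph
like the one above. The class name EventLogger and the helper log_events
are made up for illustration. Text, elements and a processing
instruction all arrive interleaved in a single stream:

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records every SAX event as a (kind, value) tuple, in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # The parser may deliver one run of text in several chunks,
        # so merge adjacent text events into one.
        if self.events and self.events[-1][0] == "text":
            self.events[-1] = ("text", self.events[-1][1] + content)
        else:
            self.events.append(("text", content))

    def processingInstruction(self, target, data):
        self.events.append(("pi", target))


DOC = (
    b"<P>This is some string data, <EMPH>a sub-element</EMPH>, "
    b"this is another <STRONG>sub-element</STRONG>, and this is a "
    b"<?PROCESSING-INSTRUCTION ?>.</P>"
)


def log_events(doc):
    """Parse doc (bytes) and return the list of recorded events."""
    handler = EventLogger()
    xml.sax.parseString(doc, handler)
    return handler.events


if __name__ == "__main__":
    for event in log_events(DOC):
        print(event)
```

The children of P are a mix of four different kinds of thing, which is
why one generic callback per kind of event is simpler here than one
typed method per element type.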

There is a subset of XML processing where document structures are
predictable enough for static type checking to be useful. I suspect that
when the W3C schema working group completes its work it will be common to
derive database schemas and IDL from the document schemas. But there are
many applications where that stuff will just get in the way.

> For instance, the first thing I played with in XML was SAX using
> Python.  I was impressed by how easily it worked and how naturally it
> fit in with a dynamically typed language like python. 

You are a wise man.

> look at the Java interface and found that it was just the same, which
> I thought very odd!  The natural mapping for SAX onto Java, to get the
> (significant) benefits of static typing, would be to generate a
> Visitor interface.  The Visitor interface would have a method for
> "visiting" each type of element in the document, and the argument to
> this method would be an object which presents the element contents
> through typed accessor methods.  At least, that's how it looks to me.

This turns out to be a fairly painful way of doing text processing. For
instance it ignores the fact that two elements can share a name but have
radically different behaviour because of their contexts. Consider the
title of a document and the title of a section within the document.

So do you need one visitor per context?
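
As a rough illustration of the context problem, the sketch below (again
assuming Python's xml.sax; the element names doc, section and title and
the class ContextTitles are hypothetical) keeps a stack of open elements
so that a title can be handled according to its parent. A visitor keyed
only on the element name would not have this information:

```python
import xml.sax


class ContextTitles(xml.sax.ContentHandler):
    """Collects (parent element name, title text) pairs."""

    def __init__(self):
        super().__init__()
        self.stack = []     # names of the currently open elements
        self.titles = []    # (parent name, title text) in document order
        self._buf = None    # text chunks of the title being read, if any

    def startElement(self, name, attrs):
        self.stack.append(name)
        if name == "title":
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        self.stack.pop()
        if name == "title":
            # After the pop, the top of the stack is the title's parent.
            parent = self.stack[-1] if self.stack else None
            self.titles.append((parent, "".join(self._buf)))
            self._buf = None


DOC = (
    b"<doc><title>Annual Report</title>"
    b"<section><title>Introduction</title><p>Some text.</p></section>"
    b"</doc>"
)


def titles_by_context(doc):
    """Parse doc (bytes) and return the collected (parent, title) pairs."""
    handler = ContextTitles()
    xml.sax.parseString(doc, handler)
    return handler.titles


if __name__ == "__main__":
    print(titles_by_context(DOC))
```

The same element name yields two different results depending on where
it occurs, and the handler has to carry that context itself.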

> In the case of DOM, again generating typed accessor code would provide
> these great benefits.  People could use a DTD (or similar) as the
> definition language for their abstract data types, and generate
> DOM-compliant classes which they can both use "natively" in their
> language and also manipulate as part of a genuine DOM tree at the same
> time.

On the one hand you have the goals of a language: improving on-the-wire
interoperability, minimizing redundancy, perhaps editing efficiency.

On the other hand you have the goals of an API: improving runtime
interoperability, maximizing ease of use (perhaps by providing many
redundant pointers), maximizing runtime efficiency.

These can often conflict. If HTML were designed for programming
convenience and not for authoring convenience it would be quite different.

I don't want to say your idea isn't useful: I've worked on something
similar myself in the past and it is exciting -- but it can't replace
dynamically typed processing. It can only augment it in certain
situations. I think the reason this hasn't received more research is
that the current method works, and it works for all types of XML
processing.
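
For the "augment" case, one possible shape is a typed view layered over
an ordinary DOM node. This is only a sketch, assuming Python's
xml.dom.minidom and a predictable, record-like document; the Book class
and the book, title and year element names are hypothetical and are
written by hand here rather than generated from a DTD:

```python
from xml.dom import minidom


class Book:
    """Hypothetical typed view over a <book> element.

    The wrapped node remains an ordinary DOM node, so generic DOM code
    can still be used on it alongside the typed accessors.
    """

    def __init__(self, node):
        if node.tagName != "book":
            raise ValueError("expected a <book> element, got <%s>" % node.tagName)
        self.node = node

    def _child_text(self, name):
        # Return the concatenated text of the first child element called name.
        for child in self.node.childNodes:
            if child.nodeType == child.ELEMENT_NODE and child.tagName == name:
                return "".join(
                    t.data for t in child.childNodes
                    if t.nodeType == t.TEXT_NODE
                )
        raise KeyError(name)

    @property
    def title(self) -> str:
        return self._child_text("title")

    @property
    def year(self) -> int:
        return int(self._child_text("year"))


dom = minidom.parseString(
    "<book><title>XML Basics</title><year>1999</year></book>"
)
book = Book(dom.documentElement)

if __name__ == "__main__":
    print(book.title, book.year)
```

This works because a book record has a fixed shape. It would not help
with the mixed content of a paragraph, which is where the generic
interface still has to do the work.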

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"The culture we are living in becomes an ever-wider sewer."
	- Paul Weyrich, of the "Moral Majority"

"Only someone attached to an irrecoverable past, and therefore hostile 
to change as such, could react so negatively toward a culture that 
is doing all right by any reasonable measure."  
	- http://www.salonmagazine.com/col/
