XML trade off 1 - DTD vs XML Schema

Mark Birbeck Mark.Birbeck at iedigital.net
Tue Aug 3 19:59:58 BST 1999


Rick,

I think much of what you say is true, but I remain a bigger fan of XML
schemas than you obviously are!

First on some of your specific points:

- True, if you have a large document, you have to send a large schema.
However, I was responding to the point about 1k docs and 100k schemas
and saying this was unnecessary, so don't shift the goal-posts.

Now to more general points. My argument rests on the recognition that
which parts of a schema I need depends on the purpose of the XML
document to which the schema refers. For example, if I have two
servers that sit all night passing data to each other, the purpose of
the schema is simply to validate the integrity of the data passed. If
one server sends:

<statusReport>
  <time>1201</time>
  <station>123</station>
  <status>56</status>
</statusReport>

why bother sending more schema info than the name of the root element
and the three children that it has? Even if the node statusReport can have
more children, why bother passing them in the schema? (My system doesn't
yet do this, though.) If the next report is:

<statusReport>
  <time>1202</time>
  <station>123</station>
  <status>56</status>
</statusReport>

then the receiving server doesn't even need to ask for the schema again.
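
A minimal sketch of that exchange, assuming a made-up partial-schema
representation (root element name mapped to its expected child names)
and a hypothetical fetch_schema_fragment() standing in for the request
to the sending server. None of these names come from a real system:

```python
import xml.etree.ElementTree as ET

# Hypothetical partial schema held by the sender: root name -> child names.
# A real schema would also carry types and cardinalities.
SCHEMA_SOURCE = {"statusReport": ["time", "station", "status"]}

schema_cache = {}   # fragments the receiver has already fetched
fetch_log = []      # records each (simulated) schema request


def fetch_schema_fragment(root_name):
    """Stand-in for asking the sending server for one schema fragment."""
    fetch_log.append(root_name)
    return SCHEMA_SOURCE[root_name]


def validate(xml_text):
    """Check the root's children against the cached partial schema."""
    root = ET.fromstring(xml_text)
    if root.tag not in schema_cache:
        schema_cache[root.tag] = fetch_schema_fragment(root.tag)
    return [child.tag for child in root] == schema_cache[root.tag]


report_1 = ("<statusReport><time>1201</time><station>123</station>"
            "<status>56</status></statusReport>")
report_2 = ("<statusReport><time>1202</time><station>123</station>"
            "<status>56</status></statusReport>")
```

Validating report_1 triggers one fetch; validating report_2 afterwards
reuses the cached fragment.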

On the other hand, if I have an editor that allows me to receive a large
XML document and modify it, then I need to know what children can be
added at various points in the tree. These children may not yet exist in
the receiving document, so you would appear to need - as you say - the
entire schema. However, even then it is not obvious that you need the
whole schema. I am working on a system that again allows the editor to
request only the parts of the schema it needs. Grand-children of the node the
user is currently examining are actually depicted with XLinks since we
don't need to know what they are exactly if the user doesn't go there.
For example, if a user edits invoice data in a tree view, but doesn't
open the nodes for the customer information such as address, then do we
need to bother retrieving the schema information for the address? With
the XLink we can get it when we need it.
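
The on-demand behaviour can be sketched roughly as follows. This is
only an illustration: the LazySchema class, the fetch_from_server()
function and every element name other than invoice, customer and
address are assumptions, and a plain dictionary lookup stands in for
following an XLink:

```python
# Hypothetical full schema on the server: element name -> allowed children.
FULL_SCHEMA = {
    "invoice": ["customer", "items"],
    "customer": ["name", "address"],
    "address": ["street", "city", "postcode"],
    "items": ["item"],
}


def fetch_from_server(element_name):
    """Stand-in for resolving a link to one schema fragment."""
    return FULL_SCHEMA.get(element_name, [])


class LazySchema:
    """Fetches the schema fragment for a node only when it is opened."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._known = {}
        self.requests = []  # element names actually requested so far

    def children_of(self, element_name):
        if element_name not in self._known:
            self.requests.append(element_name)
            self._known[element_name] = self._fetch(element_name)
        return self._known[element_name]


schema = LazySchema(fetch_from_server)
schema.children_of("invoice")  # user opens the invoice node
schema.children_of("items")    # user opens the items node
```

At this point the address fragment has not been requested; it is
fetched only if the user opens that node.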

So, my vision is of systems that pass parts of schemas to each other as
they need them. My argument is simply that this is a lot easier to do in
XML (I never said it couldn't be done with DTDs, as you seem to think),
since we are already writing systems that handle the distribution,
archiving, searching, editing and so on, of XML. For this same reason, I
must say I am surprised that fans of XML can be looking to use non-XML
syntaxes to define any type of data, unless totally
unavoidable/impractical.
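
As a rough illustration of why an XML syntax helps here: the same
parser used for documents can cut a fragment out of a schema. The
schema vocabulary below (schema, elementType, child) is invented for
this sketch and is not any particular schema proposal:

```python
import xml.etree.ElementTree as ET

# A schema written in a made-up XML vocabulary (assumption for this sketch).
SCHEMA_XML = """<schema>
  <elementType name="statusReport">
    <child ref="time"/><child ref="station"/><child ref="status"/>
  </elementType>
  <elementType name="time"/>
  <elementType name="station"/>
  <elementType name="status"/>
</schema>"""


def schema_fragment(schema_text, element_name):
    """Return a smaller schema holding only the named element's declaration."""
    schema = ET.fromstring(schema_text)
    fragment = ET.Element("schema")
    for element_type in schema.findall("elementType"):
        if element_type.get("name") == element_name:
            fragment.append(element_type)
    return ET.tostring(fragment, encoding="unicode")


fragment_text = schema_fragment(SCHEMA_XML, "statusReport")
```

The result is itself an XML document, so it can be sent, stored or
searched with the same tools as any other.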

I didn't understand your point about HTML 4, but all I can say is that
using different schemas for different parts of a document seems best
solved by linking schemas to namespaces. The solution used in IE5 - where
the namespace definition can be used to point to a schema for that node
- allows documents to effectively contain other documents. DTDs as they
stand cannot do this - although that is not to say that someone won't
propose some mess with processing instructions to 'switch schema' at
various stages in a document - and adding a DOM definition won't help.
(I don't know if I said this before, but I remember one of the biggest
confusions with namespaces, when that debate raged on here, was that
there was no validation going on. People kept thinking that there needed
to be a document at the end of a namespace URI, and that it would be
used to validate. Even though they were wrong, they thought that because
it has a nice intuitive feel to it.)
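
A minimal sketch of validation driven by namespaces, assuming a
hypothetical registry that maps each namespace URI to a tiny schema
(element name to allowed child names). The URIs and element names are
made up, and this is not how IE5 itself works:

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: namespace URI -> {element: allowed child names}.
SCHEMAS = {
    "urn:example:invoice": {"invoice": {"customer"}, "customer": set()},
    "urn:example:notes": {"note": {"p"}, "p": set()},
}


def split_tag(tag):
    """Split ElementTree's '{uri}local' tag form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


def validate(element):
    """Validate each element against the schema of its own namespace."""
    uri, local = split_tag(element.tag)
    schema = SCHEMAS.get(uri)
    if schema is None or local not in schema:
        return False
    for child in element:
        child_uri, child_local = split_tag(child.tag)
        # Same-namespace children must be allowed by this schema; children
        # from another namespace are treated as an embedded document.
        if child_uri == uri and child_local not in schema[local]:
            return False
        if not validate(child):
            return False
    return True


DOC = """<inv:invoice xmlns:inv="urn:example:invoice"
                      xmlns:n="urn:example:notes">
  <inv:customer/>
  <n:note><n:p>Paid in full</n:p></n:note>
</inv:invoice>"""

BAD_DOC = """<n:note xmlns:n="urn:example:notes"><n:title/></n:note>"""
```

Here the note element is checked against the notes schema even though
it sits inside an invoice, which is the "documents containing other
documents" effect described above.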

Finally, no-one has come back on my point from previous emails, that if
you want to be able to index and manipulate the massive amount of XML
data that will exist in coming years, often using non-standard schemas,
you will need to be able to manipulate the meta-meta-data. And what
better tool to use to define this than good old XML?

Best regards,

Mark



> -----Original Message-----
> From: Rick Jelliffe 
> Sent: 30 July 1999 05:08
> To: xml-dev at ic.ac.uk
> Subject: Re: XML trade off 1 - DTD vs XML Schema
> 
> 
> 
> From: Mark Birbeck <Mark.Birbeck at iedigital.net>
> 
> >In previous discussions on DTDs versus XML approaches to schemas I have
> >argued that this ability to dynamically generate only enough of the
> >schema as you need, (and the ability to cope with namespaces, which I
> >haven't covered here) is my major reason for preferring XML schemas over
> >DTDs
> >
> >Does this confuse or clarify the point, Rick? :-)
> 
> On the other hand:
> 
> * Whether a schema is in one place or many places, you still need to
> download all of it if your document has all of those elements;
> 
> * Under your system, all possible child element types are downloaded.
> If your document starts at the root, you will download all the schema
> anyway.
> 
> * XML Namespaces raises the possibility that elements from
> different namespaces can have content models that essentially
> are the same, but which require separate schemas: for example,
> one schema uses HTML 4 strict paragraphs and another
> schema uses HTML transitional, or whatever. I think it is
> important to have a commonly accepted basic vocabulary
> to prevent this: HTML is a good start, but it is not manageable
> under any schema proposal I have seen yet.
> 
> So your system relies on each individual schema being small,
> so that no fluff gets sent, and that people use well-known
> content models rather than make their own.
> 
> In any case, I do not see why your system does not apply equally
> to DTDs: what difference does the syntax make?  It seems to me
> that some amount of the "you cannot do this with DTDs" argument
> would vanish if we bothered to define a DOM for DTDs,
> with XML Schemas a transformation and serialization of that
> DOM. When the W3C spec-makers say "you cannot do this
> with DTDs" that only would require a DOM mapping to be
> specified, they are really saying "you cannot do this with W3C
> specifications", not "you cannot do this because of the intrinsic
> capabilities of DTD syntax". A little misleading.
> 
> Downloading branches of trees does not look either syntax-
> dependent or semantics dependent: you don't need instance
> syntax or XML Schema semantics. You just need a tree API
> (e.g. DOM) and a serializer in whatever syntax.
> 
> Rick Jelliffe
> 
> 
> xml-dev: A list for W3C XML Developers. To post, 
mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on
CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following
message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)





