XML trade off 1 - DTD vs XML Schema

Mark Birbeck Mark.Birbeck at iedigital.net
Wed Jul 28 10:41:19 BST 1999


[Sorry the following is quite long.]

Rick Jelliffe wrote:
> So does the recipient system look through the document and then
> request a schema server for the appropriate minimal schema to be
> generated and sent, or does the server already have a separate schema
> generated for each instance?

Mmm ... neither really. You only (mostly, but we'll keep it simple for
now) need the schema for the root node, and you can generate it
automatically (recursively). All I have done is write a routine that
'digs out' the schema definition for a node and all its possible
children and attributes. This means that I can request a schema at any
level of detail. The same principle is applied to my data. A few
scenarios might illustrate this:

1. A server or user requests an article from my XML server:

	http://view.IED-IED.ied-support.net/documents ...
          /article[@ArticleType="interview"]

and the following is returned (note the schema URL):

<Article
 Title="This is an article"
 ArticleType="interview"
 xmlns="x-schema:http://view.ied-ied.ied-support.net ...
     /schema/Article"
>
 <ArticleText>
  <Para>
   Try visiting
    <ExternalSite>this lovely site</ExternalSite>
   . You'll like it.
  </Para>
  <Para>More text</Para>
 </ArticleText>
</Article>

Now, when the parser follows the URL for the schema, my server dishes up:

<Schema
 xmlns="urn:schemas-microsoft-com:xml-data"
 name="article"
 xmlns:dt="urn:schemas-microsoft-com:datatypes"
>
 <AttributeType name="Title"/>
 <AttributeType
  name="ArticleType"
  dt:type="enumeration"
  dt:values="interview article bookreview"
 />
 <ElementType name="ExternalSite" content="textOnly"/>
 <ElementType name="Para" content="mixed" order="many">
  <element type="ExternalSite"/>
 </ElementType>
 <ElementType name="ArticleText" content="eltOnly">
  <element type="Para" minOccurs="1" maxOccurs="*"/>
 </ElementType>
 <ElementType
  name="Article"
  content="eltOnly"
  model="closed"
 >
  <attribute type="Title" required="no"/>
  <attribute type="ArticleType" required="no"/>
  <element type="ArticleText" minOccurs="1" maxOccurs="1"/>
 </ElementType>
</Schema>

2. If I were now to request just the first paragraph of the article:

	http://view.IED-IED.ied-support.net/documents ...
          /article[@ArticleType="interview"]/*/para[1]

I would get back:

<Para
 xmlns="x-schema:http://view.ied-ied.ied-support.net ...
     /schema/Para"
>
 Try visiting
 <ExternalSite>this lovely site</ExternalSite>
 . You'll like it.
</Para>

All my routine has to do to construct the URL for the schema is take
the general schema area on the server the data came from and append
the name of the element type. This could obviously be modified so that
there is a separate 'schema server', for example if there were a
centralised repository like BizTalk, or whatever. Note that the returned
XML is not a 'fragment', as mentioned in previous emails, but a
well-formed document in its own right. (I wrote a long piece ages ago
about why I preferred to think of XML documents as units of transfer,
distinct from 'documents' as we normally conceive them.) However, there
are situations where we do wrap this XML document in a fragment
container, for example when dealing with an editor, which needs to know
where to put the data back if it has been changed.
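
To make the URL construction concrete, here is a minimal sketch in
Python. It is not the routine from my server: the function name and the
base URL are hypothetical (the real path is elided above), and the only
thing taken from the examples is the shape "x-schema:" + schema area +
"/" + element type name.

```python
# Hypothetical schema area; the real server path is elided in the post.
SCHEMA_BASE = "http://schemas.example.invalid/schema"


def schema_namespace(element_type: str, base: str = SCHEMA_BASE) -> str:
    """Build the x-schema namespace URI for one element type.

    The URI is the schema area followed by the element type name.
    Passing a different base points at a separate 'schema server'.
    """
    return "x-schema:" + base.rstrip("/") + "/" + element_type


if __name__ == "__main__":
    print(schema_namespace("Para"))
    print(schema_namespace("Article", base="http://repo.example.invalid/"))
```

Keeping the base as a parameter is what would let the same data server
point at a central repository instead of its own schema area.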

Anyway, when the parser follows the URL for the schema all the server
need dish up now is: 

<Schema
 xmlns="urn:schemas-microsoft-com:xml-data"
 name="para"
 xmlns:dt="urn:schemas-microsoft-com:datatypes"
>
 <ElementType name="ExternalSite" content="textOnly"/>
 <ElementType name="Para" content="mixed" order="many">
  <element type="ExternalSite"/>
 </ElementType>
</Schema>
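
For anyone wondering what 'digging out' a schema might look like, the
following is a rough Python sketch, not my actual routine. It assumes
the full schema is held as one XML-Data document and uses the standard
xml.etree.ElementTree module; the function name minimal_schema and the
lower-cased schema name are my own illustrative choices. It walks the
<element> and <attribute> references from the requested element type
and copies only the declarations it reaches.

```python
import copy
import xml.etree.ElementTree as ET

XD = "urn:schemas-microsoft-com:xml-data"  # XML-Data namespace used in the post


def minimal_schema(full_schema: ET.Element, root_type: str) -> ET.Element:
    """Return a Schema holding only the declarations reachable from root_type."""
    # Index the top-level declarations by name.
    element_types = {e.get("name"): e
                     for e in full_schema.findall(f"{{{XD}}}ElementType")}
    attribute_types = {a.get("name"): a
                       for a in full_schema.findall(f"{{{XD}}}AttributeType")}

    needed_attributes = []  # attribute type names, in first-seen order
    needed_elements = []    # element type names, children before parents
    seen = set()

    def visit(name):
        # Skip types already handled (guards against recursive models)
        # and references to types the schema does not declare.
        if name in seen or name not in element_types:
            return
        seen.add(name)
        decl = element_types[name]
        for ref in decl.findall(f"{{{XD}}}attribute"):
            attr = ref.get("type")
            if attr in attribute_types and attr not in needed_attributes:
                needed_attributes.append(attr)
        for ref in decl.findall(f"{{{XD}}}element"):
            visit(ref.get("type"))
        needed_elements.append(name)  # post-order: declared after its children

    visit(root_type)

    # Assumption: the schema is named after the lower-cased element type,
    # mirroring name="article" and name="para" in the examples.
    out = ET.Element(f"{{{XD}}}Schema", {"name": root_type.lower()})
    for name in needed_attributes:
        out.append(copy.deepcopy(attribute_types[name]))
    for name in needed_elements:
        out.append(copy.deepcopy(element_types[name]))
    return out
```

Called with "Para" against the Article schema shown earlier, this keeps
only the ExternalSite and Para declarations; called with "Article" it
returns every declaration.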

In other words, as I said in my previous contribution, why would you
bother delivering 100k of schema for a 1k document? The only reason
people are thinking that they would do this is because they are still
thinking that 1 document = 1 file. File systems are uncool, man ...
databases are where it's at daddy-o. (There's a Timothy Leary revival
this side of the water so I'm just practising.) For example, with my
database-driven approach I could easily extend the functionality to
allow:

    http://view.ied-ied.ied-support.net/schema/1.2/Para

In previous discussions on DTDs versus XML approaches to schemas I have
argued that this ability to dynamically generate only as much of the
schema as you need (along with the ability to cope with namespaces,
which I haven't covered here) is my major reason for preferring XML
schemas over DTDs. I'd even go further and predict that it is one of the
major reasons that XML schemas will win out over DTDs (the other is the
ability to mix schemas).

Does this confuse or clarify the point, Rick? :-)

Best regards,

Mark Birbeck
http://www.iedigital.net/
