X-Schema

Wed Jun 23 09:28:40 BST 1999

A reply to Paul Prescod's reply to me:

> > [XML definitions of schema are better than DTD ones because they
> > allow you to use the same tools as the rest of the time.]
> For programmers. But I asked about usability, not programmability. XML
> instance syntax is certainly easier for programmers.

You might have moved the goalposts here. Aren't programmers entitled to
usability? Also, I assumed that the students you referred to were
programmers. Anyway, the question still remains: why is it better to
learn two syntaxes (XML and DTD) than just one?

> > For example, when we have got a working approach to transclusion
> > with XLinks, I could include your schema inside my schema. 
> DTDs and W3C XML schemas already do this without any special external
> standard. A general transclusion mechanism is not good for 
> schemas anyhow -- you want to define custom import and export rules.

As far as I was aware, DTDs only allow you to include another DTD within
them through an entity reference. If that is true (and I keep seeing DTD
techniques I wasn't aware of, so I by no means assume I am right), then
XLinks and XPointers are going to be far more powerful in XML-based
schemas. In particular, I think including a small part of a larger
schema will prove an important technique in the long term. Far more
significant, however, is that creating dynamic documents will
increasingly require creating dynamic DTDs/schemas, and that's far
easier if you can put the schema on the node (see next point).

> > With the current XML-Data approach, I can define the schema 
> > to be used on a per-node basis, so including a node also includes
> > the information needed to validate it.
> Sorry, I don't understand what you are saying here. What XML
> data facility allows me to validate XLinks to transcluded "video
> nodes"?
> Link validation is a completely separate issue, IMHO.

I'm not with you here, Paul. I can't see how that relates to my point,
so I'll give an example (and if it does relate to my point and I have
not understood you, then many apologies). What I was getting at was the
possibility of a schema that allows something like:

<newsStories>
    <story>
        <author>Paul</author>
        <date>1990621</date>
        <body>...</body>
    </story>
    <story>
        <author>Mark</author>
        <date>1990622</date>
        <body>...</body>
    </story>
</newsStories>

in which anything can be put into 'body'. Now, with normal DTD
techniques you would have to modify the DTD every time you add a new
format for news stories. Alternatively, you could allow anything inside
'body', but then how do you validate what's inside 'body'? With the
XML-based schema proposals currently knocking around, I can put the
validation on the inserted node. I know it's still controversial, but
let's say we agree to use the namespaces technique from XML-Data. We can
then have the following:

<newsStories xmlns="x-schema:newsschema.xml">
    <story>
        <author>Paul</author>
        <date>1990621</date>
        <body>
            <video:clip xmlns:video="x-schema:videoschema.xml">
                loads of data
            </video:clip>
        </body>
    </story>
    <story>
        <author>Mark</author>
        <date>1990622</date>
        <body>
            <html:text xmlns:html="x-schema:htmlschema.xml">
                loads of text
            </html:text>
        </body>
    </story>
</newsStories>
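
To make the per-node idea concrete, here is a minimal Python sketch that
walks a trimmed copy of the document above and reports each point where
the governing schema changes. It assumes the 'x-schema:' namespace
convention from the example; the helper names (split_tag,
schema_boundaries) are mine, not part of XML-Data or any library.

```python
import xml.etree.ElementTree as ET

SCHEMA_PREFIX = "x-schema:"  # namespace convention assumed from the example

NEWS = """\
<newsStories xmlns="x-schema:newsschema.xml">
    <story>
        <author>Paul</author>
        <body>
            <video:clip xmlns:video="x-schema:videoschema.xml">loads of data</video:clip>
        </body>
    </story>
    <story>
        <author>Mark</author>
        <body>
            <html:text xmlns:html="x-schema:htmlschema.xml">loads of text</html:text>
        </body>
    </story>
</newsStories>
"""


def split_tag(element):
    """Return (schema reference or None, local name) for an element."""
    tag = element.tag
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        if uri.startswith(SCHEMA_PREFIX):
            return uri[len(SCHEMA_PREFIX):], local
        return None, local
    return None, tag


def schema_boundaries(element, inherited=None):
    """List (local name, schema) for every element whose schema differs from its parent's."""
    schema, local = split_tag(element)
    found = []
    if schema != inherited:
        found.append((local, schema))
    for child in element:
        found.extend(schema_boundaries(child, schema))
    return found


boundaries = schema_boundaries(ET.fromstring(NEWS))
for name, schema in boundaries:
    print(name, "->", schema)
```

A validator would hand each reported subtree to whichever schema is
named there, rather than needing one DTD that knows about every format.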

The only reason I mentioned transclusion in this context was that I was
suggesting that if inside the 'body' tag we simply pointed to a 'news
piece' but didn't know what it actually was, it wouldn't matter. We
could use an XLink to retrieve an object (XML document) whose 'type' we
didn't know in advance, and still be able to validate it. (I know I'm
asking for trouble here, and someone will come back on me and say "XML
isn't for that", but this allows you to have a sort of OO-style
polymorphism.)
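
As a rough sketch of that polymorphism in Python: the retrieved object
names its own schema, and we pick a validator from that. The validators
and the VALIDATORS registry here are hypothetical stand-ins; real ones
would load and apply the schema itself.

```python
import xml.etree.ElementTree as ET


def schema_of(element):
    """Schema reference carried by the element's namespace, or None."""
    if element.tag.startswith("{"):
        uri = element.tag[1:].split("}", 1)[0]
        if uri.startswith("x-schema:"):
            return uri[len("x-schema:"):]
    return None


# Hypothetical validators, one per schema, standing in for real schema checks.
def validate_video(element):
    return bool((element.text or "").strip())


def validate_html(element):
    return element.tag.endswith("}text")


VALIDATORS = {
    "videoschema.xml": validate_video,
    "htmlschema.xml": validate_html,
}


def validate_piece(xml_text):
    """Validate a retrieved 'news piece' without knowing its type in advance."""
    root = ET.fromstring(xml_text)
    schema = schema_of(root)
    validator = VALIDATORS.get(schema)
    if validator is None:
        raise LookupError(f"no validator registered for schema {schema!r}")
    return schema, validator(root)


clip = '<video:clip xmlns:video="x-schema:videoschema.xml">loads of data</video:clip>'
text = '<html:text xmlns:html="x-schema:htmlschema.xml">loads of text</html:text>'
print(validate_piece(clip))
print(validate_piece(text))
```

The caller never says "this is a video" or "this is HTML"; the node
itself carries that, which is the whole point.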

> > This means that you
> > only actually extract the amount of schema you need for the document
> > being exported, and it makes schema very easy to maintain
> > (you just have to maintain the database).
> 
> You've abstracted away the hard parts of the problem and described how
> easy it is to solve the easy ones.
> 
> [snipped out Paul's point about "the primary purpose of the XML 
> interchange is to help [companies] to bridge that gap [of finding a
> common schema]".]

I wasn't talking about everyone using the same schema; I was dealing
with the problem of containment. Even if companies A and B keep their
different schemas, you could still have a containing schema that allows
a document to be created containing items from both. You seem to be
talking about XML being useful mainly because it allows us to 'convert'
from company A's schema to company B's, by having a common interface.

> > [snipped my point about sending small parts of a larger schema if
> > the data itself only uses small parts]
> Piecewise schemas are an orthogonal issue to database generation of
> schemas. In most cases it will be simplest to send the whole article
> schema and have it be cached on the other side.

Not sure I agree with you. Imagine sending the headline of an article to
a mobile phone, or a stock price. Or how about if in your article you
quote from my article? All you would need is an XLink to my paragraph,
and the paragraph itself will come with the minimal information you need
to validate it.
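
A small Python sketch of the quoting case, under assumptions of my own:
the article document, the 'articleschema.xml' name and the 'id'
attribute are invented for illustration. The point is only that a
fragment serialized on its own still carries its schema reference in its
namespace declaration.

```python
import xml.etree.ElementTree as ET

NS = "x-schema:articleschema.xml"  # hypothetical schema reference

ARTICLE = """\
<article xmlns="x-schema:articleschema.xml">
    <headline>Schemas on the node</headline>
    <para id="p1">First paragraph.</para>
    <para id="p2">The paragraph being quoted.</para>
</article>
"""


def extract_fragment(document_text, para_id):
    """Serialize one paragraph on its own, namespace declaration included."""
    root = ET.fromstring(document_text)
    para = root.find(f".//{{{NS}}}para[@id='{para_id}']")
    if para is None:
        raise KeyError(para_id)
    # tostring() re-declares the namespace on the fragment's root element.
    return ET.tostring(para, encoding="unicode").strip()


fragment = extract_fragment(ARTICLE, "p2")
print(fragment)

# The receiver parses the fragment alone and can still see which schema applies.
received = ET.fromstring(fragment)
print(received.tag)
```

The receiver gets one paragraph plus a pointer to its schema, not the
whole article and not the whole article schema.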

> But anyhow, from a validation point of view, a schema sent 
> with the data is not any more useful than the schema you could
> devise by analyzing the data! For a schema to be useful for
> allowing interchange, it must be sent *in advance*.

The schema isn't really sent with the data - only a reference to it. If
I send you another document with a pointer to the same schema then
hopefully it will come from your cache (if that doesn't happen at the
moment, then I doubt that it works any better for DTDs). But further, I
could in theory devise a system that allows me to edit a 'virtual'
document made up of pointers to nodes on numerous servers (say, for
example, editing the news items that I referred to above, that are
contained within the total set of news stories). The schema on each node
would tell me what I can and can't add to each node, which is obviously
more information than can be deduced from the data itself.
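
Both halves of that argument can be sketched in a few lines of Python.
Everything here is a toy of my own making: SchemaCache, fake_fetch and
the "element name to allowed children" table stand in for a real schema
fetched over the network.

```python
class SchemaCache:
    """Fetch each schema reference at most once, then serve it from memory."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._schemas = {}
        self.fetch_count = 0

    def get(self, reference):
        if reference not in self._schemas:
            self.fetch_count += 1
            self._schemas[reference] = self._fetch(reference)
        return self._schemas[reference]


def fake_fetch(reference):
    """Hypothetical loader: maps each element name to its allowed child names."""
    toy_schemas = {
        "newsschema.xml": {
            "newsStories": {"story"},
            "story": {"author", "date", "body"},
        },
    }
    return toy_schemas[reference]


def may_add(cache, reference, parent, child):
    """Ask the schema whether 'child' is allowed inside 'parent'."""
    return child in cache.get(reference).get(parent, set())


cache = SchemaCache(fake_fetch)
print(may_add(cache, "newsschema.xml", "story", "author"))
print(may_add(cache, "newsschema.xml", "story", "video"))
print(cache.fetch_count)
```

The second document that points at the same schema costs no extra fetch,
and the answer to "may I add this here?" comes from the schema, not from
whatever data happens to be in the node already.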

Best regards,

Mark Birbeck
Mark.Birbeck at iedigital.net
http://www.iedigital.net/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1