About XML document linkage and Schemas.

Mark Birbeck Mark.Birbeck at iedigital.net
Thu Nov 25 23:53:42 GMT 1999


Hello Didier,

Heads or tails?

I have been concerned with exactly this issue recently, and I have
concluded the opposite to you! I wonder if part of the issue relates to
an interpretation of RDF.

RDF and Dublin Core (DC) have been closely associated; however, they deal
with very different things:

- DC specifies a common set of elements that might apply to
  a wide range of resources: articles, songs, books, email.
- RDF is a framework for making statements about
  resources.

I mention this partly because of your statement:

> Actually, it seems that yes I can embed a RDF
> fragment to do so.

IMO it is not really right to embed RDF within a document. I know people
often embed DC, and that's why I'm drawing the distinction:

	<x:article ID="1">
		<dc:title>Cows: Which way up?</dc:title>
		<dc:Creator>Didier PH Martin</dc:Creator>
		<x:subtitle>
			In-depth examination of whether a cow
			is controlled by its head or tail
		</x:subtitle>
		<x:text>
			<x:p>Cows are interesting beasts</x:p>
		</x:text>
	</x:article>

This seems just about alright to me, but probably about as far as you
should go with meta data. (You'll see later that I wouldn't even make
the author a property of an article.) Anything more - keywords,
categories and so on - should be elsewhere. For example:

On my server:

	<rdf:Description about="http://didierserver/article[@ID=1]">
		<dc:subject>Cows, heads, tails</dc:subject>
	</rdf:Description>

and on the server storing classic works in the English language:

	<rdf:Description about="http://didierserver/article[@ID=1]">
		<dc:subject>Bovine</dc:subject>
		<x:rating>Brilliant</x:rating>
	</rdf:Description>

and on the server storing collected works of leading authors:

	<rdf:Description about="http://Didier Himself/">
		<vcf:vCard><vcf:n><vcf:fn>Didier PH Martin</vcf:fn></vcf:n></vcf:vCard>
		<x:wrote>
			<rdf:Bag>
				<rdf:li resource="http://didierserver/article[@ID=1]" />
				<rdf:li resource="http://didierserver/poem[@ID=7]" />
				<rdf:li resource="http://otherserver/song[@ID=6]" />
			</rdf:Bag>
		</x:wrote>
	</rdf:Description>
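A minimal sketch of why this arrangement works, in Python using only the standard library: statements held in separate documents on separate servers can be gathered by their `about` URI without any of them touching the article itself. The namespace URIs are assumptions (the fragments above declare none), the `x:` prefix is bound to a hypothetical `http://example.org/x#`, and `statements`/`merge` are illustrative helpers, not part of any RDF toolkit.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed namespace URIs; the fragments in the message declare none.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
X = "http://example.org/x#"  # hypothetical namespace for the x: prefix


def statements(xml_text):
    """Yield (subject, property, value) for each simple property of each rdf:Description."""
    root = ET.fromstring(xml_text)
    for desc in root.iter(f"{{{RDF}}}Description"):
        # The 1999 syntax allows an unqualified 'about'; accept rdf:about too.
        subject = desc.get("about") or desc.get(f"{{{RDF}}}about")
        for prop in desc:
            yield subject, prop.tag, (prop.text or "").strip()


def merge(*documents):
    """Gather statements from independent documents, keyed by the resource they describe."""
    merged = defaultdict(list)
    for xml_text in documents:
        for subject, prop, value in statements(xml_text):
            merged[subject].append((prop, value))
    return merged


didier_server = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description about="http://didierserver/article[@ID=1]">
    <dc:subject>Cows, heads, tails</dc:subject>
  </rdf:Description>
</rdf:RDF>"""

classics_server = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}" xmlns:x="{X}">
  <rdf:Description about="http://didierserver/article[@ID=1]">
    <dc:subject>Bovine</dc:subject>
    <x:rating>Brilliant</x:rating>
  </rdf:Description>
</rdf:RDF>"""

meta = merge(didier_server, classics_server)
for prop, value in meta["http://didierserver/article[@ID=1]"]:
    print(prop, value)
```

Neither server needs to know about the other; an indexer that reads both simply ends up with three statements about the same article.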

I have just implemented this very scenario on a server for publications.
I have used XMLNews-Story to describe the articles themselves. That
gives me the text of the article, its title and a few bits and bobs
about people, countries and so on in the article. Then I use RDF to
refer to this article and in one description I put the author, some
keywords and so on. But the interesting thing is that you can then make
further statements about that meta information. For example, to group
all articles on the same topic - say your article and poem about cows,
and my drawing of one - I can do this:

	<rdf:Description about="http://myserver/subjects/cows">
		<x:related>
			<rdf:Bag>
				<rdf:li resource="http://didierserver/article[@ID=1]" />
				<rdf:li resource="http://didierserver/poem[@ID=7]" />
				<rdf:li resource="http://myserver/picture[@ID=20]" />
			</rdf:Bag>
		</x:related>
	</rdf:Description>
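Because the grouping is itself just another RDF statement, software can read it in either direction. A minimal Python sketch, assuming the RDF namespace URI shown and a hypothetical `http://example.org/x#` for `x:`; `bag_members` and `invert` are illustrative names. The inverted index answers "which topics is this poem filed under?" without the poem carrying that information.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # assumed
X = "http://example.org/x#"  # hypothetical namespace for the x: prefix


def bag_members(xml_text, prop):
    """Map each described subject to the resources listed in the rdf:Bag under `prop`."""
    root = ET.fromstring(xml_text)
    groups = {}
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = desc.get("about") or desc.get(f"{{{RDF}}}about")
        members = []
        for value in desc.findall(prop):
            for bag in value.findall(f"{{{RDF}}}Bag"):
                for li in bag.findall(f"{{{RDF}}}li"):
                    members.append(li.get("resource") or li.get(f"{{{RDF}}}resource"))
        groups[subject] = members
    return groups


def invert(groups):
    """For each member resource, list the groups that mention it."""
    index = defaultdict(list)
    for subject, members in groups.items():
        for member in members:
            index[member].append(subject)
    return index


topics = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:x="{X}">
  <rdf:Description about="http://myserver/subjects/cows">
    <x:related>
      <rdf:Bag>
        <rdf:li resource="http://didierserver/article[@ID=1]" />
        <rdf:li resource="http://didierserver/poem[@ID=7]" />
        <rdf:li resource="http://myserver/picture[@ID=20]" />
      </rdf:Bag>
    </x:related>
  </rdf:Description>
</rdf:RDF>"""

groups = bag_members(topics, f"{{{X}}}related")
index = invert(groups)
print(index["http://didierserver/poem[@ID=7]"])  # ['http://myserver/subjects/cows']
```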

In my system the basic unit is no longer the web page, because my web
pages are assembled from RDF statements such as these:

  <rdf:Description ID="x">
    <rdf:Bag>
      <rdf:li
       ID="rdf:_1"
       rdf:resource="/article[@ID=3]"
      />
      <rdf:li
       ID="rdf:_2"
       rdf:resource="/meta/article[@ID=3]"
      />
    </rdf:Bag>
  </rdf:Description>

The resources these statements refer to are pulled in first, and the
complete XML that results is then transformed to HTML. This seems more
logical to me, since the HTML page is only one possible manifestation of
that article.
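The assembly step can be sketched in a few lines of Python. This is not the author's implementation: the in-memory `STORE`, the `page` wrapper element and the document contents are hypothetical, the RDF namespace URI is assumed, and the page description is the one above with its `ID` attributes dropped for brevity. The HTML transformation (e.g. an XSLT stylesheet) is left out.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # assumed

# Hypothetical store standing in for the server's documents, keyed by the
# references used in the page description.
STORE = {
    "/article[@ID=3]": "<article><title>Example article</title></article>",
    "/meta/article[@ID=3]": "<meta><keywords>cows</keywords></meta>",
}

PAGE = f"""<rdf:RDF xmlns:rdf="{RDF}">
  <rdf:Description ID="x">
    <rdf:Bag>
      <rdf:li rdf:resource="/article[@ID=3]" />
      <rdf:li rdf:resource="/meta/article[@ID=3]" />
    </rdf:Bag>
  </rdf:Description>
</rdf:RDF>"""


def assemble(page_xml, store):
    """Pull in every resource the page's rdf:Bag lists, in order, as one XML document."""
    root = ET.fromstring(page_xml)
    page = ET.Element("page")  # hypothetical wrapper element
    for li in root.iter(f"{{{RDF}}}li"):
        page.append(ET.fromstring(store[li.get(f"{{{RDF}}}resource")]))
    return ET.tostring(page, encoding="unicode")


print(assemble(PAGE, STORE))
# <page><article><title>Example article</title></article><meta><keywords>cows</keywords></meta></page>
```

Only this assembled XML is ever turned into HTML, which is why the HTML is just one possible view of the article.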

It raises an interesting problem for indexing and other software,
because it means that the meta information is information about the
article, not the web page. So if you use meta tags like dc:Creator, that
is for the XMLNews-Story article, not the web page, since the web page
was 'created' by a machine on the fly. Currently much of what we do is
oriented towards web pages - but actually they are increasingly becoming
a means of viewing something else. In my case I would want an index
server to index my RDF meta documents, not my HTML output.

So, enough of my system - you asked about yours. How does what I have
said relate to it?

You suggest:

> <myInvoice xmlns="http://www.xml.org/Myinvoice">  <------ and 
> that this URI
> would point to a page containing links about this name space as found
> actually in W3C site for their own name spaces.
> ..... some content here......
> <rdf:RDF xmlns="....."> again same thing as above
> ... all the limited meta data set here.....
> </rdf:RDF>
> ... other content here....
> </MyInvoice>

I would suggest that this is not right. For a start, you have to define
the schema for myInvoice so that it can contain meta information about
itself (unless you hold with this non-validation stuff that's doing the
rounds, in which case you can have a fish inside your invoice if you
want). And once it's contained in the invoice, it's no longer really meta
information. If you want to say something *about* invoices, you
shouldn't have to modify the invoice schema.

This is why I said in my previous email that you can't know all
occurrences of meta information about your resource. I believe the
semantic web requires us to allow our resources to go off and have a
life of their own. Like our children we look on them affectionately and
look after them but we can't say who their friends are. (See how Didier
brings out the metaphor again!) For example, say your invoice is part of
a legal dispute - is that a property of the invoice (in which case you
add another element or possible value for an attribute) or is that a
statement *about* the invoice (in which case you use RDF)?

I would therefore suggest you turn the cow inside out:

	<rdf:RDF>
		<rdf:Description about="#1">
			... meta data ...
		</rdf:Description>
		<myInvoice xmlns="...">
			... invoice ...
		</myInvoice>
	</rdf:RDF>

is a completely open and flexible solution. Whether you then use BizTalk
to transport it depends on the situation you're in.
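A minimal Python sketch of the inside-out arrangement: the invoice is parsed and carried unchanged, and statements about it (here the legal-dispute case mentioned earlier) sit in a sibling `rdf:Description`. The RDF namespace URI is assumed, `http://www.xml.org/Myinvoice` is taken from Didier's example, and the `x:status` property, its namespace and the `envelope` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # assumed
INV = "http://www.xml.org/Myinvoice"  # namespace from Didier's example
X = "http://example.org/x#"  # hypothetical namespace for statements about invoices


def envelope(invoice_xml, subject, statements):
    """Wrap an unmodified invoice, plus statements about it, in one rdf:RDF document."""
    rdf = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF}}}Description", {"about": subject})
    for prop, value in statements:
        ET.SubElement(desc, prop).text = value
    # The invoice is carried as-is; its schema never has to mention the meta data.
    rdf.append(ET.fromstring(invoice_xml))
    return rdf


invoice = f'<myInvoice xmlns="{INV}"><total>100</total></myInvoice>'
doc = envelope(invoice, "#1", [(f"{{{X}}}status", "legal dispute")])
print(ET.tostring(doc, encoding="unicode"))
```

Adding a new kind of statement means adding another property to the description; the invoice schema stays exactly as it was.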

To answer you specifically:

> formally ;-) do we embed the document into the meta data and 
> transform the document into the meta data document as a
> fragment or do we, instead, include meta data as a fragment
> in the document. the tail? the head? which one? :-))

I think we embed the data in a document that includes the meta data, and
then embed *that* in the BizTalk transport mechanism (or SOAP, or
whatever you need for a particular job).

Then we turn the cow inside out and have some barbecued ribs.

Best regards,

Mark

Mark Birbeck
Managing Director
x-port.net Ltd.
220 Bon Marché Centre
241-251 Ferndale Road
London
SW9 8BJ
w: http://www.iedigital.net/
t: +44 (171) 501 9502
e: Mark.Birbeck at iedigital.net

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1