Some questions

Thu Dec 2 01:10:09 GMT 1999

On Wed, 1 Dec 1999, Eve L. Maler wrote:

> At 03:46 PM 12/1/99 -0800, Tim Bray wrote:
> >Because the same data structures and usage patterns keep coming back across
> >wide ranges of metadata applications, even though the world isn't about
> >to agree on common vocabularies.  So there are huge gains to be had from
> >a common data model and transfer syntax. -Tim
> 
> Not that I don't respect RDF's power, but personally, I think the key *is* 
> common vocabularies.  We may have to start small, and they may just be hub
> formats that get mapped to/from a lot, but agreeing on semantics is the 
> pill that has to be swallowed.  Even RDF depends on this, particularly on 
> an open system such as the Web where you can't really control or influence 
> the habits of content creators.  If you want to indicate that you are the 
> author of a certain page, at the very least you have to refer to a widely 
> understood "author" semantic in order for author-criterion searching to be 
> of any use to your audience.  Whether it's an RDF property or a well-known 
> namespace or whatever doesn't seem to matter as much.

I don't disagree with any of this except the last claim; both matter IMHO.

What really matters above all is the use of unique identifiers (in Web 
context, URIs) both for the concepts/objects defined in a vocabulary and
those named in our instance data. There is very little to RDF apart from this
idea, i.e. that simple stilted 3-part statements of the form:

	{peter, likes, mary}
	{peter, age, 7}
	{mary, livesIn, London}
	{peter, faveColor, red}

...are more useful when disambiguated with unique identifiers: which
'peter', which 'London', and so forth.
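
A minimal sketch of that disambiguation step in Python. The
http://example.org/ URIs and the NAMES table are made up for
illustration; only the London URN appears elsewhere in this message.

```python
# Map short, ambiguous names to unique identifiers.
# Every URI under http://example.org/ is hypothetical.
NAMES = {
    "peter": "http://example.org/people/peter",
    "mary": "http://example.org/people/mary",
    "London": "urn:xmeta:cities:canada:London",
    "likes": "http://example.org/vocab/likes",
    "age": "http://example.org/vocab/age",
    "livesIn": "http://example.org/vocab/livesIn",
    "faveColor": "http://example.org/vocab/faveColor",
}


def disambiguate(triple, names=NAMES):
    """Swap each known short name for its URI; anything else is kept as a literal.

    Naive on purpose: a literal that happens to equal a short name
    would also be rewritten.
    """
    return tuple(names.get(term, term) for term in triple)


statements = [
    ("peter", "likes", "mary"),
    ("peter", "age", 7),
    ("mary", "livesIn", "London"),
    ("peter", "faveColor", "red"),
]

qualified = [disambiguate(t) for t in statements]

for triple in qualified:
    print(triple)
```

The statements get longer, but each term now says which 'peter' and
which 'London' is meant.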

We pay the price in verbosity, but when we move to URIs
(e.g. urn:xmeta:cities:canada:London or http://xmlns.com/cities/LondonUK)
for these silly stilted sentences, there's another huge payoff: data
aggregation. Since the RDF information model is just stilted 3-part
sentences mostly built from URIs, we can aggregate two RDF data graphs
by joining nodes that share common identifiers.

If one piece of data tells us:

(I'm switching to an ascii-art labelled graph representation here)

	[mary] --livesIn--> [London]
	[mary] --age--> "9"
	[peter] --livesNextDoorTo--> [mary]

and something else (say the CIA world fact book or X500) 
informs us that...

	[London] --numCommunists--> "10,000"
	[London] --situatedIn--> [Canada]

we can simply[*] join these two graphs on the common node London (or,
rather, on its unambiguous version, i.e. [urn:xmeta:cities:canada:London]).
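
The join above can be sketched in a few lines of Python, assuming each
graph is held as a set of (subject, predicate, object) tuples. The EX
namespace and the helper names are hypothetical; this is an
illustration of the idea, not how any particular RDF toolkit works.

```python
EX = "http://example.org/"  # hypothetical namespace for people and properties
LONDON = "urn:xmeta:cities:canada:London"

# First source: facts about mary and peter.
graph_a = {
    (EX + "mary", EX + "livesIn", LONDON),
    (EX + "mary", EX + "age", "9"),
    (EX + "peter", EX + "livesNextDoorTo", EX + "mary"),
}

# Second source: facts about London, published by someone else.
graph_b = {
    (LONDON, EX + "numCommunists", "10,000"),
    (LONDON, EX + "situatedIn", EX + "Canada"),
}

# Nodes with the same URI are the same node, so the join is a set union.
merged = graph_a | graph_b


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def countries_of_residence(graph, person):
    """Follow livesIn, then situatedIn: a question neither source answers alone."""
    return {
        country
        for city in objects(graph, person, EX + "livesIn")
        for country in objects(graph, city, EX + "situatedIn")
    }


print(countries_of_residence(merged, EX + "mary"))
```

Neither graph_a nor graph_b can say which country mary lives in; the
merged graph can, because both sources used the same identifier for
London.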

Whether this is 'data' or 'metadata' is of no interest to me
whatsoever. Using URIs for Web data is just downright handy. 
We can take heaps of silly 3-part sentences from anywhere (that we
trust...) on the Web, pour them into a common database and get
something mostly intelligible.  

Here's a bald claim:

	Aggregating unanticipated RDF data graphs into a useful common 
	data structure is a feasible task; doing the same with unanticipated
	non-RDF XML data is, in the general case, much harder.

Maybe I'm wrong; perhaps someone has an algorithm for general 
purpose DOM-merging or SAX-stream aggregation that doesn't mangle
data. If anyone has seen such a thing please post the URL...

(BTW I'm making loose use of undefined terms here. By 'unanticipated'
I'm talking about a processor encountering instance data in a previously
unseen vocabulary. By 'aggregation' I mean joining together relevant
facts (or would-be facts) scattered across various XML documents and
document-parts such that applications can make use of the pooled
information.)

Let me emphasise that I'm not focussing on the use of RDF syntax
here; that doesn't matter. The key thing IMHO to support Web 
data aggregation is for interchanged data to have a common URI-based
graph interpretation. We can do that with XSL or (hopefully) 
using annotations in XML Schemata or annotations on good old-fashioned
DTDs. RDF is URIs URIs URIs and not a lot else. I'm willing to be
persuaded that the syntax needs more thought, but the value of using
unique identifiers in Web data interchange seems pretty
uncontroversial...

Dan

[*] I'm glossing over some issues here (eg. relating to
knowledge of cardinality/occurrence constraints to aid data
aggregation apps); aggregation in RDF is still hard to do right, but is
vastly easier than for arbitrary XML content.
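
A rough Python sketch of one such issue, under loud assumptions: the
aggregator is told out of band which properties may take only one
value per subject (the single_valued set below is hypothetical, as are
the http://example.org/ URIs), and it merely reports clashes rather
than resolving them.

```python
from collections import defaultdict

EX = "http://example.org/"  # hypothetical namespace


def conflicts(graph, single_valued):
    """Find (subject, predicate) pairs that carry more than one value
    for a predicate declared single-valued."""
    values = defaultdict(set)
    for s, p, o in graph:
        if p in single_valued:
            values[(s, p)].add(o)
    return {key: vals for key, vals in values.items() if len(vals) > 1}


# Two sources disagree about mary's age.
source_1 = {(EX + "mary", EX + "age", "9")}
source_2 = {
    (EX + "mary", EX + "age", "10"),
    (EX + "mary", EX + "livesIn", "urn:xmeta:cities:canada:London"),
}

merged = source_1 | source_2

# Assumption: 'age' allows at most one value per subject.
clashes = conflicts(merged, single_valued={EX + "age"})
print(clashes)
```

The union itself still succeeds; it is only the extra knowledge about
the 'age' property that lets an application notice the two sources
cannot both be right.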

--
daniel.brickley at bristol.ac.uk
