Some questions
Dan Brickley
Daniel.Brickley at bristol.ac.uk
Thu Dec 2 01:10:09 GMT 1999
On Wed, 1 Dec 1999, Eve L. Maler wrote:
> At 03:46 PM 12/1/99 -0800, Tim Bray wrote:
> >Because the same data structures and usage patterns keep coming back across
> >wide ranges of metadata applications, even though the world isn't about
> >to agree on common vocabularies. So there are huge gains to be had from
> >a common data model and transfer syntax. -Tim
>
> Not that I don't respect RDF's power, but personally, I think the key *is*
> common vocabularies. We may have to start small, and they may just be hub
> formats that get mapped to/from a lot, but agreeing on semantics is the
> pill that has to be swallowed. Even RDF depends on this, particularly on
> an open system such as the Web where you can't really control or influence
> the habits of content creators. If you want to indicate that you are the
> author of a certain page, at the very least you have to refer to a widely
> understood "author" semantic in order for author-criterion searching to be
> of any use to your audience. Whether it's an RDF property or a well-known
> namespace or whatever doesn't seem to matter as much.
I don't disagree with any of this except the last claim; both matter IMHO.
What really matters above all is the use of unique identifiers (in Web
context, URIs) both for the concepts/objects defined in a vocabulary and
those named in our instance data. There is very little to RDF apart from this
idea, ie. that simple stilted 3-part statements of the form:
{peter, likes, mary}
{peter, age, 7}
{mary, livesIn, London}
{peter, faveColor, red}
...are more useful when disambiguated with unique identifiers. Which
'peter', which 'London' and so forth.
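To make the disambiguation point concrete, here is a minimal Python sketch. It treats each statement as a plain 3-tuple. The two London identifiers are the ones used as examples in this message; "tom" and the urn:example:... identifiers are made up purely for illustration.

```python
# Bare names: two sources both say "London", and nothing tells us
# whether they mean the same place.
bare_a = ("mary", "livesIn", "London")
bare_b = ("tom", "livesIn", "London")   # "tom" is a hypothetical extra person

# The same statements with URIs in place of bare names.
LONDON_CA = "urn:xmeta:cities:canada:London"
LONDON_UK = "http://xmlns.com/cities/LondonUK"

# urn:example:... identifiers are illustrative assumptions, not real vocabularies.
uri_a = ("urn:example:people:mary", "urn:example:terms:livesIn", LONDON_CA)
uri_b = ("urn:example:people:tom", "urn:example:terms:livesIn", LONDON_UK)


def same_object(s1, s2):
    """True when two 3-part statements name the same object (third slot)."""
    return s1[2] == s2[2]


print(same_object(bare_a, bare_b))  # bare names collide
print(same_object(uri_a, uri_b))    # URIs keep the two Londons apart
```

With bare names the two statements appear to be about the same city; with URIs they do not, which is the whole point of paying the verbosity price.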
We pay the price in verbosity, but when we move to URIs
(eg. urn:xmeta:cities:canada:London or http://xmlns.com/cities/LondonUK)
for these silly stilted sentences, there's another huge pay off: data
aggregation. Since the RDF information model is just stilted 3-part
sentences mostly built from URIs, we can aggregate two RDF data graphs
by joining nodes that share common identifiers.
(I'm switching to an ASCII-art labelled graph representation here.)
If one piece of data tells us:
[mary] --livesIn--> [London]
[mary] --age--> "9"
[peter] --livesNextDoorTo--> [mary]
and something else (say the CIA world fact book or X500)
informs us that...
[London] --numCommunists--> "10,000"
[London] --situatedIn--> [Canada]
we can simply[*] join these two graphs on the common node London (or,
rather, the unambiguous version, ie. [urn:xmeta:cities:canada:London]).
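The join described above can be sketched in a few lines of Python, assuming each graph is held as a set of (subject, predicate, object) tuples. This is only an illustration, not how any particular RDF toolkit does it. The urn:example:... identifiers and the helper names are hypothetical, and predicates are left as short strings for readability where real data would use URIs for them too.

```python
LONDON = "urn:xmeta:cities:canada:London"
MARY = "urn:example:people:mary"            # hypothetical identifier
PETER = "urn:example:people:peter"          # hypothetical identifier
CANADA = "urn:example:countries:Canada"     # hypothetical identifier

# First source: facts about people.
graph_a = {
    (MARY, "livesIn", LONDON),
    (MARY, "age", "9"),
    (PETER, "livesNextDoorTo", MARY),
}

# Second source: facts about places.
graph_b = {
    (LONDON, "numCommunists", "10,000"),
    (LONDON, "situatedIn", CANADA),
}


def merge(*graphs):
    """Aggregate graphs by set union; nodes with equal URIs join automatically."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def countries_of_residence(graph, person):
    """Follow livesIn, then situatedIn: a query that spans both sources."""
    return {
        country
        for city in objects(graph, person, "livesIn")
        for country in objects(graph, city, "situatedIn")
    }


merged = merge(graph_a, graph_b)
print(countries_of_residence(merged, MARY))
```

Neither source alone can say which country mary lives in; the merged graph can, because both sources used the same identifier for London. Note that this sketch ignores the cardinality and trust issues mentioned elsewhere in this message.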
Whether this is 'data' or 'metadata' is of no interest to me
whatsoever. Using URIs for Web data is just downright handy.
We can take heaps of silly 3-part sentences from anywhere (that we
trust...) on the Web, pour them into a common database and get
something mostly intelligible.
Here's a bald claim:
Aggregating unanticipated RDF data graphs into a useful common
data structure is a feasible task; doing the same with unanticipated
non-RDF XML data is, in the general case, much harder.
Maybe I'm wrong; perhaps someone has an algorithm for general
purpose DOM-merging or SAX-stream aggregation that doesn't mangle
data. If anyone has seen such a thing please post the URL...
(BTW I'm making loose use of undefined terms here. By 'unanticipated'
I'm talking about a processor encountering instance data in a previously
unseen vocabulary. By 'aggregation' I mean joining together relevant
facts (or would-be facts) scattered across various XML documents and
document-parts such that applications can make use of the pooled
information.)
Let me emphasise that I'm not focussing on the use of RDF syntax
here; that doesn't matter. The key thing IMHO to support Web
data aggregation is for interchanged data to have a common URI-based
graph interpretation. We can do that with XSL or (hopefully)
using annotations in XML Schemata or annotations on good old fashioned
DTDs. RDF is URIs URIs URIs and not a lot else. I'm willing to be
persuaded that the syntax needs more thought, but the value of using
unique identifiers in Web data interchange seems pretty
uncontroversial...
Dan
[*] I'm glossing over some issues here (eg. relating to
knowledge of cardinality/occurrence constraints to aid data
aggregation apps); aggregation in RDF is still hard to do right, but is
vastly easier than for arbitrary XML content.
--
daniel.brickley at bristol.ac.uk
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1