XSchema Question 1: RDF

Tue Jun 2 16:44:43 BST 1998

> From:   Tim Bray

> RDF is painfully simple, conceptually.  And Lisa is correct in saying that
> the syntax is (IMHO unnecessarily) kinda ugly; I think there are good
> reasons to expect improvement.

My impression of RDF is that it has a superficially appealing model, but
that its current syntax is so bad it will cause just as many problems as it
solves.  I have been told that the syntax was designed the way it is because
that is what the metadata community, whoever they are, demand: the
experience of the markup community is regarded as of secondary interest, I
guess because RDF is regarded as so novel that mankind has never embarked on
anything like it. But I am being too bitchy.

In particular, RDF has the big problem that it is a system of serialization,
not of markup. The difference is this: in a markup language, the data comes
first, and the trick is how to reveal the structures which are of interest:
the organizing principle for the schema is the natural structure of the
data. In a serialization language, the data exists pre-chunked and
pre-labelled: so having a nice manipulatable schema system (e.g. relational
tables) becomes the organizing principle of the schema. So, for example, the
<RDF:RDF> element groups things together: it has no real purpose, and
implements the built-in assumption that data is grouped together. But anyone
who has marked up text knows that interesting data is plonked all over the
place: often duplicated and often partial or wrong.

Hence in SGML the emphasis on external markup of relationships: HyTime, XML
extended pointers, and so on.  The RDF syntax is IMHO completely skewed to
the problem of "how can we make something that will not upset HTML?": it is
starting off crippled.

Syntax is not a trivial issue. RDF should use processing instructions or
just some fixed attributes. I cannot see why they insist on using elements:
they confuse and disguise the non-RDF element structure.

If people are interested in semantic markup, they would be better looking at
Topic Navigation Maps for the immediate term. You can find it at
	http://www.ornl.gov/sgml/wg8/document/1950.htm

TNM can be analysed in terms of the RDF categories: it would make an
excellent external syntax for RDF.

> But it is easy to tell if something can easily be made into RDF.  Here's
> the test: if what you are building can be expressed as a bunch of 3-tuples
>
> (object, propertyname, propertyvalue)
>
> then it's RDF-able.  Otherwise it's not.
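
A rough sketch of that test in Python may make it concrete. It is only an illustration: the record, the identifier "doc1" and the property names are invented here, and the tuples are plain Python values, not RDF syntax. A flat record reduces to 3-tuples mechanically, with a repeated property simply producing more than one tuple.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One (object, propertyname, propertyvalue) statement."""
    obj: str
    property_name: str
    property_value: str


def to_triples(obj_id, record):
    """Flatten a flat record (a dict) into a list of 3-tuples.

    A list-valued property yields one tuple per value.
    """
    triples = []
    for name, value in record.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            triples.append(Triple(obj_id, name, str(v)))
    return triples


# Hypothetical record, invented for illustration only.
record = {
    "title": "Example Document",
    "creator": ["A. Author", "B. Author"],
}

triples = to_triples("doc1", record)
for t in triples:
    print(t)
```

Note that the sketch only handles data that is already chunked and labelled; it says nothing about how the record was recovered from a document in the first place.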

I thought Lisa's comment about it being early yet was interesting. In fact,
ontology and metaphysics form one of the great Western traditions, from the
Greeks until now. And AI has been studied for almost 30 years: based on the
history of the study of knowledge representation, there is no reason to
expect any great speed.

Artificial Intelligence diverged into three fields: adaptive systems,
rule-based systems, and knowledge representation. Adaptive systems have
flourished unseen (the "adaptive equaliser" in your modem is a "neural net",
genetic algorithms are used in stock markets), but rule-based systems and
knowledge representation AI work foundered. (I see rule-based systems have
made a little resurgence recently in the guise of "data mining".) The reason
they foundered was that it was so difficult to capture enough knowledge.

I think RDF is an attempt to create a massive world-wide knowledge base, so
that old AI hacks will have something to do with their time. The trouble is
that even if we do have information modeled in RDF, there is every chance
that unless the categories it expresses are consistent and appropriate to
the AI task intended, the AI will be fed skewed or incomplete information. A
lot of problems are very domain-specific: having an incomplete knowledge
base means that searching on that base can only be done with a degree of
tentativeness. (In Topic Navigation Maps, ISO put in a "Weighting" attribute
to express this kind of fuzziness.)

It is probably the only possible strategy: enrich the data and hope that
somewhere along the line enough interesting information is marked up to be
useful.  Of course semantic markup would also be useful for
specific in-house AI systems and scholarly work.

Anyway, my gist is that semantic markup itself is not useful unless the
"semantic universe" used for that markup is appropriate to your task. And
even then, unless the markup is rigorous and applied to all the data
consistently, it may not give the results it promises.  Artificial
Intelligence boffins have been working for years and have come up with lots
of nice side-benefits, but have not delivered on their direct objectives.  AI
people working on this (I used to work for TI supporting AI systems in their
dying days, so I think I have seen the promise and the difficulties) have
consistently failed to deliver: I think Apple had quite a long running
project on this. The onus should be on them to provide complete solutions
which have clearly addressed the technical problems (in this case,
incompatibility of RDF with simple schema languages) rather than on our
goodwill.

This is why I said RDF has a "superficially appealing model". To say that
every relationship can be reduced to those tuples is quite a different thing
to saying that direct representation of those tuples in markup is desirable.
Behind RDF's syntax is the need not to make HTML break and the desire to be
able to stream process data and stick in whatever markup at whatever point
it is needed: this is why they need to create their own in-line schema
declaration syntax rather than use headers or PIs. Why should XML put up
with these sad constraints?

There is also a third problem with the use of RDF to create a global
knowledge network: there are legitimate questions about data
transmission speed and access. Good AI searching sucks up as much processing
power as the user will bear: add on to this the transmission delays of
networks and I cannot see the usefulness of such a network. Perhaps for
in-house data. And I suppose you would get domain-specific search engines
pre-fetching and pre-indexing data, like the HTML search engines do now. But
it does mean that there needs to be an awful lot of infrastructure: Peter
and Lesley's Virtual Hyperglossary is one big piece.

> I think the only thing in DTD's that are not trivially RDF-able are
> content models.  They *are* RDF-able, but you have to use some of the
> "Seq" machinery, which I find awkward.  In fact *every* attempt so far
> (the old DSD stuff, XML-Data, etc) to express content models in XML has
> come up verbose and unreadable compared to good ol' 8879 DTD notation.
> I think there's a better way, and want to see what xml-dev can come up
> with. -Tim

I think the trick may be to define many more special purpose schema
languages rather than a single one. For example, a relational database
schema language. Presumably there are proprietary and standard candidates.

Vendors and users would undoubtedly appreciate it if they could continue to
use their familiar schema notation and tools in XML without change. I would
be far happier to allow existing schema languages than either to reinvent
the standard declarations or attempt any grandiose universal schema systems.

If we took this view, then the best approach for XSchema might be to

1) find the major candidate schema languages (markup declarations being the
first)
2) create specific XML versions of each of them, allowing for safe
proprietary extensions (i.e., extensions that do not create any interchange
problems with tools which do not use the extensions). And then
3) note how the schema can be analysed according to RDF's categories. This
is what Dan B. has suggested:

  "It shouldn't be unfeasibly hard to represent XSchema ideas in terms of
  assertions framed as RDF triples..."

  (This representation can be done formally inside the DTD for XSchema too,
  using architectural forms. I think Elliot K suggested that ages ago.)
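
To give a feel for what step 3 involves, here is a minimal sketch in Python of one way a content model might be restated as 3-tuples. Everything in it is an assumption made for illustration: the "memo" declaration, the property names "contentModel" and "type", the "_1", "_2" position labels and the "_:" node identifier are invented here and are not taken from the RDF or XSchema drafts. The point is only that an ordered content model needs an intermediate sequence node, which is where the verbosity Tim complains about comes from.

```python
def content_model_to_triples(element, children):
    """Restate an ordered content model, e.g. <!ELEMENT memo (to, from, body)>,
    as (object, propertyname, propertyvalue) tuples.

    Order cannot be carried by a bare set of tuples, so an intermediate
    "sequence" node is introduced and each child is attached to it under
    a numbered property.
    """
    seq_id = f"_:{element}-content"  # hypothetical identifier for the sequence node
    triples = [
        (element, "contentModel", seq_id),
        (seq_id, "type", "Seq"),
    ]
    for position, child in enumerate(children, start=1):
        triples.append((seq_id, f"_{position}", child))
    return triples


# Hypothetical declaration: <!ELEMENT memo (to, from, body)>
memo = content_model_to_triples("memo", ["to", "from", "body"])
for t in memo:
    print(t)
```

One line of markup declaration becomes five tuples here, before occurrence indicators or alternation are even considered.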

Rick Jelliffe
