Why not RDF for XSchema

Mon Jun 1 23:46:55 BST 1998

OK, here we go:  a crash course in RDF -- what I learned while
participating in the RDF Schema WG
(DISCLAIMER IN ADVANCE -- no secrets divulged here)

But first, an introduction:

Simon said :-)

> I'm still conducting my own full-scale investigation of RDF.  I'm very
> concerned about several issues:
> 
> 1) I don't want this discussion limited to the relatively small number of
> people who have figured out RDF.

First of all I need to say that I'm very much in favor of RDF and what
it proposes to eventually accomplish. 

(And I know I'm gonna catch some flak for some of my opinions below,
but I'm just trying to save the rest of you time -- no one has the time
to bone up on RDF real quick, not even over the weekend.  I'm not
trying to be critical so much as trying to explain my experience and
understanding of it, and to be corrected and enlightened about whatever I
may be confused about.  Thanks in advance, Tim and Andrew.)

So here we go:

> 
> 2) I don't want XSchema to need to make drastic changes if RDF moves suddenly
> - it isn't complete. (From what little I've heard, this doesn't sound likely.)
> 

I number my points from here...

1) Actually, it is highly likely...almost guaranteed IMHO (you're gonna
see me saying that a lot in this mailing...)

RDF will keep moving until the syntax solidifies -- and don't hold your
breath for that to happen anytime soon.  Really, I think it's a good
thing to wait and let RDF evolve as we figure out what we need it for.
It sure beats pumping something out in a timely fashion that ultimately
is neither extensible nor useful.  (And organizing bookmarks -- I don't
care if they are the smartest bookmarks you ever did see -- doesn't count
in my book as an implementation of an extensible, semantically-enabled
data content model, complete with meaning and understanding...an ideal,
granted, but we must reach for the stars, yes?)

And nothing against either WG --
others have made comments about how slowly RDF is progressing, but
I doubt the people making these comments realize the huge task that
the RDF syntax and schema groups have before them:  they have been
presented with the ambitious task of integrating numerous existing
semantic content models into a single unified conceptual syntax for
expressing semantics in a way that is ultimately language- and
implementation-independent.

Another confusing item, particularly when using RDF for writing schemas,
is that the various existing KR communities use many of the SAME WORDS
when they mean ENTIRELY different things, which is poetically ironic
considering we are trying to define a model for conveying meaning in the
first place.

For starters, there are something like 8 or more meanings just for the
word "schema".  Then we've got "inheritance".  And what about classes
and types?  Which word do we use, when all we are trying to do is pick
one?  The people who had a solid understanding of the properties
involved say "oh, type, class, what does it matter?  We know what we are
talking about."  And for some people, that is certainly true.  But life
is just so much easier if you pick one so there is no ambiguity.

Ambiguity will inevitably manifest -- all we can do is systematically
eliminate it as soon as it is recognized, yes?  (Especially
syntactic ambiguity, which can be immediately remedied :-)  To go on to
the next layer of semantic understanding without fully comprehending the
meaning of the "scope" (right word?) of the concepts underneath -- it's
like going back to pour the foundation of a house, or getting around to
grounding your electricity, after you're all moved in.  (Insert metaphor
for lacking a core-level structure here.)

Anyway, taking all of the above into consideration (even the stuff Tim
is going to correct me about ;-), using RDF for anything other than an
example of an example of an RDF Schema of an XSchema would be downright
irrational.

I'm not going to dwell on the syntactic instability except to say
that the syntax is unstable -- which, like I said before, is a positive
and correct path of action at this ever-so-formative stage of the
semantic conceptual game.

2) In many ways, it is almost a backwards concept to consider using RDF
to define XML Schemas.  RDF is unique compared to other XML
implementations because, for one thing, it is *not* an XML
implementation per se, although it CAN be implemented in XML (and in my
opinion, should be and will be if it is ever going to be useful).

So to talk about defining an XML-based application something-or-other
using RDF doesn't mean a whole lot, because RDF itself is a
conceptual model, without a specifically-defined syntax OR model at this
point.  (I realize that this flexibility was designed to be one of RDF's
"features", but at this early stage of its core development it simply
complicates just about every aspect of actually implementing it.)

EBNF notation is what's used in the (syntax) spec.  UML has been
tried for the graphical representation of RDF's data
structures, but it got messy quickly...

RDF Schemas were going to use XML-compliant syntax in the early days of
the WG, but we soon fell into a trap attempting to "wing it"
(translation:  we made Guha and Andrew do all the work :-)

One could say that XML itself is unstable, considering the (admittedly
few...but still...) parts of it that have yet to be defined, or
are defined in an experimental manner (such as namespaces).
Nevertheless, it's the best we've got:  RDF syntax will be most useful
when it is strictly defined in XML-compliant syntax -- especially if we
wish for our RDF and XML implementations to complement each other without
restricting the expressiveness or interoperability of the other -- on my
own site or anyone else's.  IMHO.

3)

 ...and this is a personal beef of mine

The current RDF WD spends almost half, if not more, of its "ink" defining
numerous ways for authors to abbreviate their RDF syntax.  Not only are
the several varieties of abbreviation equally confusing (especially to
those who are trying to learn RDF for the first time),
but there doesn't seem to be anything to gain in doing so,
especially if we are in agreement that, ultimately (in a perfectly
structured, GUI-interfaced world :-), authoring tools will be generating
the RDF after the functionality is determined.  So we're not going to
save any time or effort using abbreviated syntax....and we're going to
screw up the interoperability of the data used in our RDF applications
if everybody is abbreviating all over the place and those abbreviations
are not clearly specified and accessible somewhere we can find them
when it's time to let our data mingle -- thus annihilating ANY benefit
to abbreviating our RDF syntax.

Not to mention the obvious counterproductive nature of abbreviating a
syntax that IS NOT FINISHED BEING COMPLETELY DEFINED :-)     

I understand how, at first glance, RDF seems to be SCREAMING to be
abbreviated due to its often redundant and seemingly unnecessarily
verbose syntax.  At first glance (she said cautiously), RDF doesn't seem
to be really doing anything with that verbosity.
Its syntax insists on being immediately complex, without providing a
solid structure from which we can extend -- and it doesn't always map
very well.  (Algorithmically is what I think I mean:  someone should be
ABLE to go from an RDF schema to a DTD.  Not that they would want to,
but should someone want to, doing so should be a syntactical exercise,
and not a painful one...a systematic one!)

Frankly, if my implementations aren't completely interoperable with the
data of the rest of the free world, they are of no use to me -- another
potential casualty of using lazy or inconsistent abbreviations.
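
To make the interoperability worry concrete, here is a rough Python
sketch of what every consumer has to do once a long form and a short
form are both legal:  normalize them to the same statements.  The two
namespace URIs, the document URL, and the "about"/"Creator" names are
placeholders chosen for illustration, and the two forms are only
loosely modelled on the draft's abbreviations -- they are not taken
from the WD itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, used only for this illustration.
RDF_NS = "http://example.org/hypothetical-rdf#"
DC_NS = "http://example.org/hypothetical-dc#"

# Long form: the property is written as a child element.
FULL = f"""
<r:Description xmlns:r="{RDF_NS}" xmlns:d="{DC_NS}"
               r:about="http://example.org/doc">
  <d:Creator>Jane Doe</d:Creator>
</r:Description>
"""

# Short form: the same property is folded into an attribute.
ABBREVIATED = f"""
<r:Description xmlns:r="{RDF_NS}" xmlns:d="{DC_NS}"
               r:about="http://example.org/doc" d:Creator="Jane Doe"/>
"""


def statements(xml_text):
    """Return the set of (subject, property, value) statements in one description."""
    node = ET.fromstring(xml_text.strip())
    subject = node.get(f"{{{RDF_NS}}}about")
    found = set()
    # Long form: each child element is a property, its text is the value.
    for child in node:
        found.add((subject, child.tag, (child.text or "").strip()))
    # Short form: any attribute outside the RDF namespace is a property.
    for name, value in node.attrib.items():
        if not name.startswith(f"{{{RDF_NS}}}"):
            found.add((subject, name, value))
    return found


# Both spellings should carry exactly the same statement.
print(statements(FULL) == statements(ABBREVIATED))
```

Every additional abbreviation is another branch like the two loops
above that every parser has to get right, which is where I expect the
interoperability trouble to come from.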

4) The above-mentioned syntax ambiguity makes it hard
to construct the structure of RDF's conceptual model -- and since the
nature of RDF's design is precisely to provide a means for constructing
a conceptual data model from which meaning can be derived (if my
understanding is correct), it quickly becomes a sort of chicken-and-egg
thing -- but using only a wing and a yolk (a wing and a prayer :-)

This is especially so for those (like myself perhaps) currently
unskilled in the field of semantics and knowledge representation.  (It's
kinda like a Rosetta stone -- a puzzle -- except that RDF has yet to
exist completely in one piece, so you can burn a lot of time looking for
pieces that you may NEVER find, and often you realize you didn't even
need to find what you were looking for...)

5) Datatypes are rather "funky" (maybe "tricky" or even "complex" is
better).  Right now RDF has its own set of RDF-centric primitive
datatypes (why the set used for XML -- as defined in XML Data -- couldn't
have simply been adopted outright, I'll never know).  RDF Schema too, in
theory, is scheduled to have ITS own RDF Schema-specific datatypes as
well as a set of primitive datatypes (which are currently not provided,
last I checked).

This issue is another seemingly fundamental "core" requirement that the
powers-that-be writing the spec somehow felt we could get back to
later, while continuing to move forward despite the structural
ambiguity.

I figured that as long as there is a mechanism to define
whatever datatypes you want, of whatever kind, using whatever structure
to define them -- as long as they can be referenced and parsed from
within an RDF application -- what does it matter? (And I still feel this
way.)

But others -- people who know more than me about structuring semantics
and sets of conditional rules that will most likely be accessed via one
or more inferencing engines, whose expressive syntax and semantic,
conceptual structure WILL most likely be language- and application- and
implementation- and maybe even domain-specific, and potentially fussy
and inflexible -- these veterans tell me that such things can sometimes
be handled more effectively using a set of built-in complex
sorta-relational/conceptual datatypes.

I still say give me a way to define these externally and we're in
business.  But really, when it comes to this stuff, to call my experience
with defining semantic conceptual complex datatype structures (defined
externally or otherwise) "limited" would be quite the understatement --
calling it "experience" at all borders on an embellishment!  But I do
understand that building "built-in" datatypes
into an ambiguous conceptual model would seem to make the uses for those
datatypes equally ambiguous (i.e. equally useless).

6) Another issue is that RDF and RDF Schema, at this point in time
(unless things have changed recently), have unique namespace prefixes --
even when addressing identical conceptual properties that the two
share.  (I suspect that what I consider a "flaw" here is perhaps required
in some way -- making this item merely a case of my schema naivete
rearing its ugly head.)

But at one point I researched the alternatives, and it seemed that
separating the two would only make the already remote possibility of
interoperability with other applications more remote still, and that the
potential for ambiguous, redundant resources could also become an issue
somewhere down the road.

7) Another kvetch:  RDF will sometimes use more than one colon in its
namespaces.  I have seen inconsistencies across the examples
provided by both specs, and between the examples and the text
within the specs themselves, as to when and under what
conditions this lack of compliance might be required -- another peeve of
mine, due to my hopeless dependence on the accuracy of such
specifications.
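
For what it's worth, the check I keep wishing the spec examples had
been run through is tiny.  Here is a minimal Python sketch, assuming
the rule that a qualified name carries at most one colon (prefix, then
local part).  The function name is my own invention, and the sketch
only counts colons; it does not validate the name characters
themselves.

```python
def split_qualified_name(name):
    """Split 'prefix:local' into (prefix, local), or (None, name) if unprefixed.

    Raises ValueError when the name has more than one colon or an empty
    part, which is the "doubly-coloned" case complained about above.
    """
    parts = name.split(":")
    if len(parts) == 1:
        return (None, parts[0])
    if len(parts) == 2 and all(parts):
        return (parts[0], parts[1])
    raise ValueError(f"not a valid qualified name: {name!r}")


print(split_qualified_name("rdf:Description"))
try:
    split_qualified_name("rdf:foo:bar")
except ValueError as err:
    print("rejected:", err)
```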

8) RDF Schemas, at this time, do not contain a means of referencing
metadata registries other than manually using URIs/namespaces (uuid
won't help much with universal access).  It seems to me that this would
be one of those features that would be very worthwhile to take the time
to "build in",
and also a requirement for authoring schemas that can be useful.

> 3) I don't want readers of the XSchema spec to have to bounce between it, the
> RDF spec, and the XML spec to figure out what's going on.  Rigorous definitely
> made it into the goals, but readable and clear remains an important target.

9) RE:  The having to go back and forth between specs issue:

Boy!  You really hit the nail on the head with that one.

In order to deal with RDF in any kind of cohesive fashion -- which I
figured I'd better be doing if I were to be of any use to the schema
effort -- I found myself immersed in the following topics/resources:

*Dublin Core (see digression...)
<digression>(VERY helpful -- perhaps they have always had the right idea
in terms of starting simple, agreeing on some shared meanings,
and going from there step by step...however painful.  At least they
succeeded in clearly defining the semantic concepts, however
limited/simplistic, so that when something is expressed its meaning is
clearly understood, referenceable, translatable, and,
eventually....extensible.)</digression>

*Warwick Meeting of something or other
*A lot of great metadata theory from that big conference in 96
*AI background stuff
*Content Algebra automata theory classics
*RDF syntax
*RDF schema
*PICS Labels (according to the charter, RDF Schemas must be able to
	express PICS labels -- what a pain that is; they currently do not
	do so)
*MCF (the origin of many of RDF's ambiguities, in my opinion -- although
	it had some very progressive ideas, in its day)
*MCF Tutorial (immensely helpful)
*Aristotle's categories (not kidding -- thanks andrew)
*XML Data
*XML 1.0
*Daniel Dardailler's neato transformation paper from last fall
*Niklaus Wirth's stepwise refinement paper
*MIME spec, URI spec, HTTP spec (IETF)
**Semantic query ontology stuff (it is NOT a tangent dammit!)
**Object-oriented programming books (not just Java, but
	C++ and component-based OO stuff in general -- another example of
	one of the KR worlds requiring integration into RDF's conceptual
	model)
**Database theory and structure came into play somewhere in here
	(another "community")
**Distributed Computing Basics (mostly machine-level considerations --
	but I can see some incredible semantic possibilities utilizing DC.
	Not if I can't parse my bogusly-doubly-coloned resource identifier,
	though, or find my ambiguously-defined device identifier, or my
	OS-specific uuid:, or my dynamically-generated style sheet and/or my
	desired datatype reference, or SMIL streaming media source, or my
	architectural form....etc. etc. yadda yadda...God, I hope this all
	makes sense.  There are always ways to name things in what I am
	starting to define as an ANDROGYNOUS manner -- not homogeneous, or
	even heterogeneous, but androgynous -- initially bland but
	ultimately dependable...)

**Pattern theory and stuff written by that architect guy whose name
	escapes me
**Organic-based information systems -- (like SGML...)

And by then I realized I had come full circle.
And still didn't know what schema was :-)

10) There are accessibility issues with RDF if it is not expressed in
XML (and maybe even then...) that I don't think either RDF WG has been
able to begin to address. (see WAI Accessibility spec)

11) Let me try to say that another, perhaps more structured way :-)

At this point, RDF is half-baked, OK?  And we're not even sure if we have
all of our ingredients yet, and all these different KR communities
already have their own cookbooks that work just fine for them but don't
seem to translate into more general kinds of syntactical expressions
without having to compromise on the semantic meaning that can be derived
from such expressions.  In a consistent syntax it would make sense that
this would happen, but when it happens in RDF, it does not do so in
any kind of a structured way.  It's harder to find holes in a wall that's
already opaque, yes?  Ultimately, this is a good thing, because defining
RDF at the molecular level will ultimately enable the creation of
complex derived semantic structures that will allow us to convey
bona fide deep and rich and associative and contextual CONCEPTUAL MEANING
at an (ideally gracefully-degradable...) machine-readable, "smart",
seamless and automatic level.

Which would make sense except the problems that you run into do not
occur in a uniform way.  Not good for a language designed to be language
and application and implementation-independent with the goal of 
enabling the interoperability of data between domain-specific semantic
content models.

RDF really has been handed some complex and intense semantic
models, which it has been asked to somehow integrate into a unified
conceptual language
capable of satisfying the needs of several KR communities (database
programmers, Dublin Core, AI, etc...) without being too verbose (a goal
that I fear is often disregarded ;-)

The long and short of it is that, when I was in the RDF Schema WG and
admittedly struggling with the conceptual model of a "schema" in general,
I thought it was my own fault for not understanding schema.  But once
I stopped trying to learn schema in RDF and wrote schema using, say, XML
Data, or even a DTD...which IS a type of schema...the concept of
defining a data content model was not complex at all....which has led me
to believe that (DISCLAIMER:  IMHO...drumroll please...)

There's something casually, unquestionably ambiguous and inconsistent
about RDF's structural model.  Why do so few understand RDF?  Because at
this point in the game, one can only conceive of the KINDS of things we
WILL be able to do with it -- not much can be done now, until the syntax
is finished and its conceptual (syntactical -- not semantic, mind you)
model is completely defined.

How can we even begin to construct a semantic model without a consistent
and complete framework for the syntax?  We can't.   

At one point (I don't think I am divulging any top secret WG info here
;-) Guha offered to translate any existing schemas into RDF for us because
none of the other WG members could figure it out either -- and we're
talking after months of trying.  I think this is significant.

How come almost nobody understood it?  How come so many of you stated
that you didn't understand it, when you are able to "grasp" so many
other complex and abstract ideas?  Even tried-and-true Dublin Core and
database and UML guys (as opposed to "dummies" like me :-) weren't
"getting it" -- excepting maybe Guha, Andrew, and Ora :-)
Why is that?

Several reasons actually:

1) RDF (and particularly RDF Schemas, where you use RDF to actually
attempt to "do something" besides simplistically describing the contents
of your documents for search engines, a la the META tag or CDF) is
supposed to provide a core-level semantic architecture that
sits right on top of our core XML layer -- enabling our domains'
different semantic "knowledge representation" models to interact and
"learn" from each other, yes?  As it stands now, one RDF implementation
has the potential to not interoperate properly with another -- even if
both "live" in the same domain and were written using the same syntax.

Simon St.Laurent wrote:
> 
> How will XSchema use/relate to RDF?
> 
> There are several options I can see:
> 
> 1) XSchema could be designed, top-to-bottom, as an RDF application.

This is a really BAD idea, for the reasons enumerated above.

> This
> wouldn't necessarily rule out defining XSchema in its own terms with an
> XSchema document or providing a DTD, but it would require that participants
> have significant working knowledge of RDF. On the bright side, I suspect the
> W3C would look more kindly on an RDF implementation, if we can make it work.
> 

I know everyone is very protective of RDF...and they should be, since it
is very much still gestating in the womb...(I too, look forward to
watching it grow up....:-)

> 2) XSchema could use RDF for the descriptive information contained in XSchema
> documents, but use its own model for defining elements and attributes.
> 

Using any part of RDF's ambiguity to define even part of XSchema isn't
really an option...not a useful one anyway.

> 3) XSchema could ignore RDF altogether.  This wouldn't rule out the use of RDF
> to provide metadata information about documents, but would leave RDF out of
> the structural information included in XSchema documents.
> 

Gets my vote!  RDF should be able to implement whatever we decide on for
XSchema, in addition to its own RDF schema-specific functionality.

> 4) XSchema could allow the use of RDF as one of many ways to extend the schema
> information provided.
> 

This is a given, given 3, yes?  We don't want to restrict any form of
extension, do we?

> 
> Making RDF useful for this project, if we choose to use it, is going to
> require the creation of a much fuller set of tutorials in particular.  The
> current set (listed below) is notable for its density and its lack of concrete
> examples.  I'm willing to work on this project (I may even be able to get paid
> for some of it) if we decide that RDF is important.  I'd also like to hear
> about other resources already in existence.
> 

I've been working on an RDF Schema Tutorial that kinda got back-burnered
when I left the group (webMethods had to bring in the big guns:  Joe Lapp
;-).  To be honest, the time expenditure was starving me out -- plus I
couldn't write about W3C-based issues with a clear conscience while being
a participating WG member (I'm a by-the-book gal :-).  I may make an
exception to do some accessibility work....but I'm digressing again....

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)