Short Essay: Squeezing RDF into a Java Object Model
David Megginson
david at megginson.com
Mon May 3 17:21:49 BST 1999
The more I work with RDF, the more I find it fascinating in the
abstract but annoying in the concrete.
The biggest problem is that RDF claims an extremely simple data model,
  statement: subject, predicate, object
but that model does not even come close to describing the information
that actually appears in an RDF statement. Let's start with the most
naive mapping into a Java object model:
public interface RDFStatement
{
  public abstract String getSubject ();
  public abstract String getPredicate ();
  public abstract String getObject ();
}
This will work fine for something like the following:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://www.purl.org/dc#">
<rdf:Description about="http://www.megginson.com/">
<dc:Title>Megginson Technologies</dc:Title>
</rdf:Description>
</rdf:RDF>
statement.getSubject() => "http://www.megginson.com/"
statement.getPredicate() => "http://www.purl.org/dc#Title"
statement.getObject() => "Megginson Technologies"
However, it falls apart quickly when the value of the property is a
resource:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://www.purl.org/dc#">
<rdf:Description about="http://www.megginson.com/">
<dc:Creator rdf:resource="http://home.sprynet.com/sprynet/dmeggins/"/>
</rdf:Description>
</rdf:RDF>
statement.getSubject() => "http://www.megginson.com/"
statement.getPredicate() => "http://www.purl.org/dc#Creator"
statement.getObject() => "http://home.sprynet.com/sprynet/dmeggins/"
In the first case, the object is a literal, and in the second case,
the object is a resource; however, the naive interface does not make
this distinction available. The only solution is to add a new
property to the Java interface:
public interface RDFStatement
{
  public abstract String getSubject ();
  public abstract String getPredicate ();
  public abstract String getObject ();
  public abstract boolean objectIsResource ();
}
Now, for the first example, we have
statement.getSubject() => "http://www.megginson.com/"
statement.getPredicate() => "http://www.purl.org/dc#Title"
statement.getObject() => "Megginson Technologies"
statement.objectIsResource() => false
and for the second example, we have
statement.getSubject() => "http://www.megginson.com/"
statement.getPredicate() => "http://www.purl.org/dc#Creator"
statement.getObject() => "http://home.sprynet.com/sprynet/dmeggins/"
statement.objectIsResource() => true
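As a minimal sketch of how this version of the interface might be backed by a concrete class, consider the following. The names StatementSketch, SimpleStatement, and describe are hypothetical and do not come from any RDF library; the only point is that the literal/resource flag has to travel with every statement.

```java
public class StatementSketch {
    /** The essay's interface, including the literal/resource flag. */
    public interface RDFStatement {
        String getSubject();
        String getPredicate();
        String getObject();
        boolean objectIsResource();
    }

    /** Hypothetical immutable implementation (not from any RDF library). */
    public static final class SimpleStatement implements RDFStatement {
        private final String subject;
        private final String predicate;
        private final String object;
        private final boolean objectIsResource;

        public SimpleStatement(String subject, String predicate,
                               String object, boolean objectIsResource) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.objectIsResource = objectIsResource;
        }

        public String getSubject() { return subject; }
        public String getPredicate() { return predicate; }
        public String getObject() { return object; }
        public boolean objectIsResource() { return objectIsResource; }
    }

    /** Hypothetical helper: angle brackets for resources, quotes for literals. */
    public static String describe(RDFStatement st) {
        String object = st.objectIsResource()
            ? "<" + st.getObject() + ">"
            : "\"" + st.getObject() + "\"";
        return "<" + st.getSubject() + "> <" + st.getPredicate() + "> " + object;
    }

    public static void main(String[] args) {
        System.out.println(describe(new SimpleStatement(
            "http://www.megginson.com/", "http://www.purl.org/dc#Title",
            "Megginson Technologies", false)));
        System.out.println(describe(new SimpleStatement(
            "http://www.megginson.com/", "http://www.purl.org/dc#Creator",
            "http://home.sprynet.com/sprynet/dmeggins/", true)));
    }
}
```

Without the flag, describe would have no way to decide between the two renderings, since both objects are plain strings.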
Unfortunately, we're not nearly through yet. The next nasty bit comes
from the aboutEachPrefix attribute. For example, here's a modified
version of the first example:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://www.purl.org/dc#">
<rdf:Description aboutEachPrefix="http://www.megginson.com/">
<dc:Title>Megginson Technologies</dc:Title>
</rdf:Description>
</rdf:RDF>
Now, this description no longer applies just to
http://www.megginson.com/, but to *all* resources whose URIs begin
with http://www.megginson.com/ (a constantly-changing set, and, in the
case of CGIs or Servlets, potentially infinite). As a result, the
following information is no longer sufficient:
statement.getSubject() => "http://www.megginson.com/"
statement.getPredicate() => "http://www.purl.org/dc#Title"
statement.getObject() => "Megginson Technologies"
statement.objectIsResource() => false
We need to modify the interface once again:
public interface RDFStatement
{
  public abstract String getSubject ();
  public abstract String getPredicate ();
  public abstract String getObject ();
  public abstract boolean subjectIsPrefix ();
  public abstract boolean objectIsResource ();
}
statement.getSubject() => "http://www.megginson.com/"
statement.getPredicate() => "http://www.purl.org/dc#Title"
statement.getObject() => "Megginson Technologies"
statement.subjectIsPrefix() => true
statement.objectIsResource() => false
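Because the set of matching resources cannot be enumerated, software consuming such a statement presumably has to test each candidate URI against the prefix when asked. A minimal sketch, assuming a hypothetical helper named appliesTo that takes the two subject-related values from the interface:

```java
public class PrefixSketch {
    /**
     * Hypothetical helper: decides whether a statement with the given
     * subject applies to a particular resource URI. A prefix subject
     * matches every URI that starts with it; a plain subject must match
     * exactly.
     */
    public static boolean appliesTo(String subject, boolean subjectIsPrefix,
                                    String resourceUri) {
        return subjectIsPrefix
            ? resourceUri.startsWith(subject)
            : resourceUri.equals(subject);
    }

    public static void main(String[] args) {
        String subject = "http://www.megginson.com/";
        String page = "http://www.megginson.com/index.html";
        System.out.println(appliesTo(subject, true, page));
        System.out.println(appliesTo(subject, false, page));
    }
}
```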
But wait -- there's more. The RDF spec states that the 'xml:lang'
attribute does not modify the data model, but rather, is a property of
the (underspecified) literal. Consider the following (RDF purists
would prefer to use an rdf:Alt, but let's keep things simple):
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://www.purl.org/dc#">
<rdf:Description aboutEachPrefix="http://www.megginson.com/">
<dc:Subject xml:lang="en">markup</dc:Subject>
<dc:Subject xml:lang="fr">balisage</dc:Subject>
</rdf:Description>
</rdf:RDF>
statement.getSubject() => "http://www.megginson.com/"
statement.getPredicate() => "http://www.purl.org/dc#Subject"
statement.getObject() => "markup"
statement.subjectIsPrefix() => true
statement.objectIsResource() => false
statement.getSubject() => "http://www.megginson.com/"
statement.getPredicate() => "http://www.purl.org/dc#Subject"
statement.getObject() => "balisage"
statement.subjectIsPrefix() => true
statement.objectIsResource() => false
The language distinction is missing from our model, so we have to add
yet another property to the Java interface:
public interface RDFStatement
{
  public abstract String getSubject ();
  public abstract String getPredicate ();
  public abstract String getObject ();
  public abstract boolean subjectIsPrefix ();
  public abstract boolean objectIsResource ();
  public abstract String getObjectLang ();
}
statement.getSubject() => "http://www.megginson.com/"
statement.getPredicate() => "http://www.purl.org/dc#Subject"
statement.getObject() => "markup"
statement.subjectIsPrefix() => true
statement.objectIsResource() => false
statement.getObjectLang() => "en"
statement.getSubject() => "http://www.megginson.com/"
statement.getPredicate() => "http://www.purl.org/dc#Subject"
statement.getObject() => "balisage"
statement.subjectIsPrefix() => true
statement.objectIsResource() => false
statement.getObjectLang() => "fr"
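With the language exposed, a consumer can finally choose between the two values. The following is only a sketch: LangLiteral and pick are hypothetical names, and the case-insensitive comparison is an assumption based on language tags generally being treated as case-insensitive.

```java
import java.util.Arrays;
import java.util.List;

public class LangSketch {
    /** Hypothetical holder for a literal value and its xml:lang tag. */
    public static final class LangLiteral {
        public final String value;
        public final String lang; // null when no xml:lang was given

        public LangLiteral(String value, String lang) {
            this.value = value;
            this.lang = lang;
        }
    }

    /**
     * Returns the first literal whose language tag matches the preferred
     * one (ignoring case), or null if none matches.
     */
    public static String pick(List<LangLiteral> candidates, String preferredLang) {
        for (LangLiteral candidate : candidates) {
            if (preferredLang.equalsIgnoreCase(candidate.lang)) {
                return candidate.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<LangLiteral> subjects = Arrays.asList(
            new LangLiteral("markup", "en"),
            new LangLiteral("balisage", "fr"));
        System.out.println(pick(subjects, "fr"));
    }
}
```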
We're still not done. Take a look at the following:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:megg="http://www.megginson.com/ns#">
<rdf:Description aboutEachPrefix="http://www.megginson.com/">
<megg:poem rdf:parseType="Literal">
<poem>
<line>Roses are red,</line>
<line>Violets are blue</line>
<line>Sugar is sweet,</line>
<line>And I love you.</line>
</poem>
</megg:poem>
</rdf:Description>
</rdf:RDF>
Since the <megg:poem> element sets the 'rdf:parseType' attribute to
"Literal", the contents of the element will not be interpreted as RDF
markup. As a result, the value of this statement is a literal string:
statement.getObject() => "
<poem>
<line>Roses are red,</line>
<line>Violets are blue</line>
<line>Sugar is sweet,</line>
<line>And I love you.</line>
</poem>
"
statement.objectIsResource() => false
If I were to round-trip this back to XML, however, how would I know
that it was meant to be XML markup? My software might just as easily
generate the following:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:megg="http://www.megginson.com/ns#">
<rdf:Description aboutEachPrefix="http://www.megginson.com/">
<megg:poem rdf:parseType="Literal">
&lt;poem&gt;
&lt;line&gt;Roses are red,&lt;/line&gt;
&lt;line&gt;Violets are blue&lt;/line&gt;
&lt;line&gt;Sugar is sweet,&lt;/line&gt;
&lt;line&gt;And I love you.&lt;/line&gt;
&lt;/poem&gt;
</megg:poem>
</rdf:Description>
</rdf:RDF>
This probably isn't what I want. As a result, I have to add more
information to my Java interface to note whether the literal value is
meant to be read as XML markup:
public interface RDFStatement
{
  public abstract String getSubject ();
  public abstract String getPredicate ();
  public abstract String getObject ();
  public abstract boolean subjectIsPrefix ();
  public abstract boolean objectIsResource ();
  public abstract boolean objectIsXML ();
  public abstract String getObjectLang ();
}
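To illustrate why the extra flag matters when writing statements back out, here is a rough sketch of a serializer for the object of a statement. The names RoundTripSketch, escape, and serializeObject are hypothetical, and the escaping covers only the three characters needed in element content.

```java
public class RoundTripSketch {
    /** Escapes the characters that would otherwise be read as markup. */
    public static String escape(String text) {
        // The ampersand must be replaced first so that the entities
        // produced by the later replacements are not escaped again.
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    /**
     * Hypothetical helper: writes a literal object as a property element.
     * XML literals are emitted verbatim with rdf:parseType="Literal";
     * plain strings are escaped.
     */
    public static String serializeObject(String elementName, String object,
                                         boolean objectIsXML) {
        if (objectIsXML) {
            return "<" + elementName + " rdf:parseType=\"Literal\">"
                + object + "</" + elementName + ">";
        }
        return "<" + elementName + ">" + escape(object)
            + "</" + elementName + ">";
    }

    public static void main(String[] args) {
        String line = "<line>Roses are red,</line>";
        System.out.println(serializeObject("megg:poem", line, true));
        System.out.println(serializeObject("megg:poem", line, false));
    }
}
```

The same string produces two different documents depending on the flag, which is exactly the information the simpler interface loses.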
At this point, it might make sense to split this out into different
interfaces:
public interface RDFComponent
{
  public abstract String getValue ();
}
public interface RDFSubject extends RDFComponent
{
  public abstract boolean isPrefix ();
}
public interface RDFPredicate extends RDFComponent
{
}
public interface RDFObject extends RDFComponent
{
  public abstract boolean isResource ();
  public abstract boolean isXML ();
  public abstract String getLang ();
}
public interface RDFStatement
{
  public abstract RDFSubject getSubject ();
  public abstract RDFPredicate getPredicate ();
  public abstract RDFObject getObject ();
}
Obviously, there's a much more complex model underlying RDF than the
spec lets on, and that model affects not only the ease or difficulty
of implementing an object model, but also the difficulty of many
standard operations, such as queries against a collection of RDF
statements and storage in a relational database.
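As one illustration of the query problem, here is a sketch of an in-memory store that answers "which statements are about this URI?". The names QuerySketch, Stmt, and about are hypothetical. Exact subjects can sit behind a hash lookup, but under this design every prefix subject has to be checked against each queried URI.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class QuerySketch {
    /** Hypothetical flattened statement, reduced to what the query needs. */
    public static final class Stmt {
        public final String subject;
        public final String predicate;
        public final String object;
        public final boolean subjectIsPrefix;

        public Stmt(String subject, String predicate, String object,
                    boolean subjectIsPrefix) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.subjectIsPrefix = subjectIsPrefix;
        }
    }

    // Statements with exact subjects, indexed by subject URI.
    private final Map<String, List<Stmt>> exact = new HashMap<>();
    // Statements with prefix subjects, which cannot be indexed the same way.
    private final List<Stmt> prefixed = new ArrayList<>();

    public void add(Stmt s) {
        if (s.subjectIsPrefix) {
            prefixed.add(s);
        } else {
            exact.computeIfAbsent(s.subject, k -> new ArrayList<>()).add(s);
        }
    }

    /** Returns every stored statement that applies to the given URI. */
    public List<Stmt> about(String uri) {
        List<Stmt> result =
            new ArrayList<>(exact.getOrDefault(uri, Collections.emptyList()));
        for (Stmt s : prefixed) {
            if (uri.startsWith(s.subject)) {
                result.add(s);
            }
        }
        return result;
    }
}
```

A relational version would presumably face the same split: an equality match on the subject column for ordinary statements, plus some form of pattern match for the prefix ones.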
I'd love to hear from others on this list who've worked with RDF.
It's full of some very good ideas, but I'm afraid that the underlying
(and hidden) conceptual complexity might stunt any serious
implementation.
All the best,
David
--
David Megginson david at megginson.com
http://www.megginson.com/
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1