XSL and the semantic web

Mon Jun 21 06:34:42 BST 1999

Marcelo Cantos wrote:
> 
> Then the following two transformations:
> 
>   <employee status="active">
>     <name>Joe</name>
>     <phone>555-12345</phone>
>   </employee>
> 
>   <H3>Joe</H3>
>   <P>Phone: 555-12345</P>
> 
> Are of a fundamentally different character.  It is not simply a case
> of having more or less information.  In the second example, even the
> structure of the information you are entitled to (and this from the
> owner's viewpoint) has been lost, and gratuitously so.

Gratuitously in what sense? You need to format the thing, right?
Therefore you need to map to formatting constructs.

> FO's leave you with what might as well be a GIF rendition of the
> information you are after 

That's a serious exaggeration. Can you text-index a GIF? Can you do a
"find word" in a GIF? Can you convert a GIF to RTF, load it into Word for
Windows and start typing?

David is completely right that these things live on a spectrum. GIFs are
far, far down the end of the spectrum beyond FOs.

> A qualified FO:
> 
>   <DIV CLASS="employee" status="active">
>     <H3 CLASS="name">Joe</H3>
>     <P>Phone: <SPAN CLASS="phone">555-12345</SPAN></P>
>   </DIV>
> 
> would certainly go some way towards easing the strain though I don't
> know if typical FO models (XSL in particular) allow this much
> flexibility.

I think that there is an important but subtle point that keeps getting
lost. The term "employee" is absolutely useless unless I know about
it *in advance*. Unless I am expecting to get thousands of documents about
"employees" I can't set up the stylesheets, queries, etc. to make this
information useful.

An "H3" is more useful to a browser than an "EMPLOYEE" because the former
is *known in advance*. In all of this hand waving about the semantic web,
people seem to think that once you put the semantics out everything just
falls into place. Getting the semantics out is the EASY PART.
Rationalizing them is the hard part.
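
To make the "known in advance" point concrete, here is a minimal sketch in
Python. The KNOWN_RENDERERS table and the render() function are hypothetical
names invented for this illustration, not part of any browser or XSL
processor. The consumer can do something specific only with element types it
agreed on beforehand; for anything else it can do no better than expose the
raw text.

```python
import xml.etree.ElementTree as ET

# Element types the consumer knows in advance (hypothetical table).
KNOWN_RENDERERS = {
    "H3": lambda el: "== " + (el.text or "") + " ==",
    "P": lambda el: "".join(el.itertext()),
}

def render(el):
    """Render an element if its type is known; otherwise just expose its text."""
    handler = KNOWN_RENDERERS.get(el.tag)
    if handler is not None:
        return handler(el)
    # Unknown type: the name carries no usable meaning, so fall back to text.
    return " ".join(t.strip() for t in el.itertext() if t.strip())

html_like = ET.fromstring("<H3>Joe</H3>")
custom = ET.fromstring(
    '<employee status="active"><name>Joe</name><phone>555-12345</phone></employee>'
)

print(render(html_like))
print(render(custom))
```

The "employee" element only becomes useful once someone adds an entry for it
to the table, which is exactly the advance agreement in question.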

If Lexis-Nexis publishes its terabytes of data in a proprietary document
type, it might as well be Greek. HTML is more useful because I can at
least display it. Guessing at the structure of a document type from
element type names is as dangerous as guessing based on text content like
colons and font sizes. If you want the semantic web to be robust, you need
people to WANT to publish semantic data in *standardized document types*.
Even if we could force them to publish in semantic but non-standard
document types we would be no farther ahead!

Trees: XSL being used to destroy semantic information.

Forest: The hard work of building robust information systems that will
even *allow* us to share semantics meaningfully.

> I have to take issue, however, with the characterisation of the
> transformations as points on a spectrum.  There is a very well defined
> distinction between transformation and formatting within the XSL
> model, hence the move to split it into two separate standards.

Actually, the two processes in XSL would be better termed "transformation"
and "layout." Both steps do *formatting*. Choosing which text becomes the
footer text is certainly formatting but it is done by the transformation
part of the language.
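
A rough sketch of that split, in Python rather than XSL. The transform() and
layout() functions, the dictionary keys, and the page_height parameter are
all assumptions made for illustration. The first step decides which text
plays the footer role; the second only decides where that role lands on the
page.

```python
def transform(doc):
    """Transformation step: decide which source text plays which role."""
    return {
        "body": f"{doc['name']}  Phone: {doc['phone']}",
        "footer": f"Status: {doc['status']}",  # a formatting decision made here
    }

def layout(roles, page_height=5):
    """Layout step: place the already-chosen roles on a fixed-height page."""
    lines = [roles["body"]]
    lines += [""] * (page_height - 2)  # pad so the footer lands on the last line
    lines.append(roles["footer"])
    return lines

page = layout(transform({"name": "Joe", "phone": "555-12345", "status": "active"}))
for line in page:
    print(line)
```

Both functions make formatting decisions; only the second one does layout.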

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

[Woody Allen on Hollywood in "Annie Hall"]
Annie: "It's so clean down here."
Woody: "That's because they don't throw their garbage away. They make 
        it into television shows."
