<html>

<title>Using XSL for Structural Validation

</title>

<body>

<h1>Using XSL as a Validation Language

</h1>

<p><a href="mailto:ricko@gate.sinica.edu.tw">Rick Jelliffe</a><br />

Academia Sinica<br />

Taipei, Taiwan</p>

<p>1999-01-24</p>

<h2>Abstract</h2>

<p>XSL can be used as a validation language: an XSL stylesheet can serve as a validation specification.

Because XSL uses a tree-pattern-matching approach,
it validates documents against fundamentally
different criteria from those of content models.

This paper gives some examples.

</p>

<p>

XSL can be used on structured documents which
do not use markup declarations. And XSL
used in concert with XML markup declarations
seems a very nice and straightforward approach:
two small languages, each good at different
things.

 </p>

<p>What is missing? The current XSL lacks some features
which would be desirable in a user-friendly
system, in particular a way to report the current
line and entity. Regular expression pattern matching on
strings would also be very useful.
(The main thing missing from this note is a
definite way to create the message "<i>This file is
valid</i>"; validity is shown by an empty list of
validity errors.)

</p>

<h2>Definitions</h2>

<p>A <em>validator</em> is software which examines

a structured document (e.g., an XML or HTML document,

a WebCGM document) and reports on the conformance

of that document's structures against some patterns.

</p>

<p>A <em>validation specification</em> is a set of these
patterns expressed in some formal way, in particular
for use by a program.
In object-oriented software engineering terms (see B. Meyer),
a validation specification gives the pre- and post-conditions
we want to assert about a structured document's structures;
it is useful to make such assertions, because they
clarify a programmer's tasks and the capabilities
and nature of the data. They can also have a valuable role
in contractual conformance.
In markup terms (see TEI),
a validation specification (such as a DTD) gives a theory about a
document's structure.

</p>

<p>A validator can be specified with a general-purpose
language or with a specific validation language.
A validation language embodies a theory about
which kinds of patterns are common, useful, important,
interesting, expected by users,
easy to implement, or not readily
validated by other validators or validation
languages.
Theories about which patterns are common, useful, etc.
are in turn judgements based on particular technologies
and usage domains.

</p>

<p>Just as with programming languages, the syntax
and operation of a validation language are controversial.

So a validation language also embodies a theory about

which syntactic and paradigmatic features are

common, useful, important,

interesting, expected by users, 

easy to implement, or which are not available

in other validation languages.

</p>

<p>A <em>schema</em> is a collection of

rules about a document's structures. A schema definition

language is not a validation language, but may contain

a validation language. A schema definition

language may also allow any of the following:

<ul>

<li>information about data storage, encoding, transmission and

notation;

</li>

<li>human readable documentation;

</li>

<li>information to allow the automatic construction of

input front-ends;

</li>

<li>information about the meaning of elements, and various

linkages to other schemas.

</li>

</ul>

</p>     

<p>An important distinction between a schema language and 

a validation language is that a schema language will specify,

for example,

"<i>this element is a date</i>", while a validation language

will concentrate on more lexical/structural issues: 

"<i>this element should conform to the regular expression</i>

<tt>/nnnn-nn-nn/</tt>". 

</p>                 
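<p>The lexical half of that distinction is easy to check mechanically. As a minimal sketch (in Python, since XSL has no regular expressions), assuming each "n" in <tt>/nnnn-nn-nn/</tt> stands for one decimal digit, the hypothetical function <tt>conforms_to_date_pattern</tt> accepts any string of that shape, whether or not it names a real date:</p>

```python
import re

# Hypothetical lexical pattern for /nnnn-nn-nn/, assuming each "n"
# stands for one ASCII decimal digit.
DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def conforms_to_date_pattern(text: str) -> bool:
    """Return True if the whole string has the shape nnnn-nn-nn.

    This is a purely lexical check: "1999-13-45" passes, because
    deciding whether a value is a real date is a schema-level question.
    """
    return DATE_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for value in ["1999-01-24", "1999-13-45", "24 Jan 1999"]:
        print(value, conforms_to_date_pattern(value))
```

<p>The second value shows the layering: it is lexically valid, and only a schema that knows "this element is a date" could reject it.</p>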

<p>Examples of validation languages are:

<ul>

<li>W3C <em>XML markup declarations</em>;</li>

<li>ISO <em>SGML markup declarations</em>, which are a superset of XML markup

declarations;</li>

<li>ISO Architectural Forms, which allow a document to be

validated against multiple parallel content models, keyed not

only against element type names, but also against attribute values;</li>

<li>ISO Lexical Type Definitions, which allow element or attribute

values to be validated against a POSIX regular expression;</li> 

<li>DDML (formerly XSchema), a subset of the XML markup declarations
expressed using XML instance syntax.</li>

</ul>

</p>

<h2>Limitations of Markup Declarations</h2>

<p>The XML markup declarations 

(in particular, the content models)

have many desirable properties

as a validation language:

<ul>

<li>terse;

</li>

<li>declarative;

</li>

<li>simple, and modest in its aims;</li>

<li>fragment-friendly, since the interpretation of content models

does not depend on the document context;

</li>

<li>familiar, since their operation is familiar to people
exposed to BNF or formal grammars;

</li>

<li>standard, through the ISO heritage;</li>

<li>widely implemented;</li>

<li>understood--the nature and deficiencies of

content models have been well explored for more

than a decade on many projects.</li>

</ul>

</p>

<p>However, there are situations which the markup
declarations do not address, and for which some other system
would be useful:
<ul>
<li>the markup declarations are not available
as structured documents in their own right
(in the absence of DOM nodes to represent them);

</li>

<li>this in turn prevents hypertext linking,

structured annotations, and extending the

validation language to become a full schema

definition language;

</li>

<li>various kinds of partial validation,
where only targeted structures are checked;

</li>

<li>extended validation, where more than the

immediate context is checked--for example

to check that 

<ul><li>

if a certain attribute is specified with 

a particular value, some other attribute

has also been specified; or</li>

<li>that a certain element type is not used
when its parent's parent (or some other ancestor) is
some other element type (e.g., to exclude
an rdf:RDF element from any subelement of
an rdf:RDF element).

</li>

</ul>

</li>

</ul> 

</p>

<p>The XML markup declarations are part of XML.

In my view, there is scope for the development of

a validation language which complements XML

markup declarations rather than reinventing them.

(No disrespect, criticism or lack of enthusiasm for any 

schema definition language or validation language

is intended by this comment.)

</p>

<h2>XSL Match Patterns</h2>

<p>Such a language already exists: XSL.

XSL match-patterns represent a very different view of

a document's structure than XML content models.

XSL match-patterns therefore can be used to complement

and enhance XML content models, as well as any

other content-model-based validation language.

</p>

<p>Doing this enables us to see validation as

merely another kind of document transformation. 

In this case,

the input document is transformed into a document 

which marks up structures in the original which are

not valid.

</p>
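<p>As a rough sketch of validation as transformation, the following Python uses the standard library's <tt>xml.etree.ElementTree</tt> in place of an XSL processor. The rule format and the <tt>validate</tt> function are hypothetical: each rule is a test plus a message, and the output "document" is simply the list of messages, so an empty list means no rule found a problem.</p>

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Tuple

# A rule pairs a test on one element with the message to emit when
# the test finds a problem (a "bad pattern" in this paper's terms).
Rule = Tuple[Callable[[ET.Element], bool], str]


def validate(root: ET.Element, rules: List[Rule]) -> List[str]:
    """Transform a document into a list of error messages.

    Every element is visited in document order and every rule is tried
    against it; an empty result means that no rule found a problem.
    """
    report = []
    for element in root.iter():
        for is_bad, message in rules:
            if is_bad(element):
                report.append(f"{element.tag}: {message}")
    return report


if __name__ == "__main__":
    html = ET.Element("HTML")
    body = ET.SubElement(html, "BODY")
    ET.SubElement(body, "IMG", SRC="logo.gif")
    ET.SubElement(body, "IMG", SRC="map.gif", ALT="Site map")

    # Hypothetical house-style rule: every IMG needs an ALT attribute.
    rules = [(lambda e: e.tag == "IMG" and "ALT" not in e.attrib,
              "IMG has no ALT attribute")]
    print(validate(html, rules))  # ['IMG: IMG has no ALT attribute']
```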

<p>(Note, a kind of validation can also be provided

by treating validation as a kind of formatting:

for example, a CSS stylesheet could be provided

which highlights in red any element which 

is not valid. The CSS pattern-matching rules 

may be complex enough to create a useful validator

based on this idea in some circumstances.)

</p>

<p>This use of a transformation language for validation
is hardly novel. Indeed, one reason why SGML
systems constructed on top of transformation languages
(e.g. OmniMark, Perl) have a good rate of success

is that system developers can (and do) build extended

validation systems readily. Such validators help the

programmers discover structural

patterns: useful or pathological. 

They can also allow looser

and simpler content models in the markup declarations,

resulting in better layering of validation.

</p> 

<p>The advantages of using XSL as a validation language
are:

<ul>

<li>terse--the match patterns are very terse, like 

XML content models;

</li>

<li>declarative;

</li>

<li>simple, and modest in its aims;</li>

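<li>fragment-friendly, since a stylesheet can be applied to a document fragment
as readily as to a whole document (although patterns which test ancestors
can only see the ancestors present in the fragment);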

</li>

<li>familiar, since their operation 

will be familiar to people using XSL for

transformation or formatting purposes;

</li>                      

<li>widely implemented--James Clark and IBM already have 

XSL tools available;</li>

<li>understood--the nature and deficiencies of

tree-based patterns have been well explored for more

than a decade on many projects in languages such as

OmniMark.</li>

</ul>

</p>           

<h2>Template for the Validator</h2>

<p>The following stub can be used to construct a validator.</p>

<pre> 

&lt;?xml version="1.0"?&gt;

&lt;!-- Template for XSL Validator --&gt;

&lt;xsl:stylesheet 

    xmlns:xsl="http://www.w3.org/TR/WD-xsl" 

    xmlns="http://www.w3.org/TR/REC-html40" 

    result-ns=""

    xmlns:rdf="http://w3.org/TR/1999/PR-rdf-syntax-19990105#"

&gt;<font color="red">&lt;!-- add any other namespace declarations above --&gt;</font>

  &lt;!-- Root template - start processing here --&gt;

  &lt;xsl:template match="/"&gt;

    &lt;HTML&gt;

      &lt;HEAD&gt;

        &lt;META http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/&gt;

        &lt;META http-equiv="Expires" content="0"/&gt;

        &lt;TITLE&gt;Results of Validation (using XSL)&lt;/TITLE&gt;

      &lt;/HEAD&gt;

      &lt;BODY&gt;

        &lt;H1&gt;Results of Validation (using XSL)&lt;/H1&gt;

        &lt;UL&gt;

          &lt;xsl:apply-templates/&gt;    

        &lt;/UL&gt;

      &lt;/BODY&gt;

    &lt;/HTML&gt;

  &lt;/xsl:template&gt;                        

  &lt;xsl:macro name="element_warning_message" &gt;

    The invalid element is found at tree location &lt;xsl:number  level="multi" count="*" format="1." /&gt;

    &lt;xsl:if test='.[@ID]'&gt;

    The element's ID is &lt;xsl:value-of select="@ID" /&gt;.

    &lt;/xsl:if&gt;

    &lt;xsl:if test='..[@ID]'&gt; The element's parent's ID is &lt;xsl:value-of select="../@ID" /&gt;.

    &lt;/xsl:if&gt;

  &lt;/xsl:macro &gt; 

  &lt;xsl:macro name="attribute_warning_message" &gt;

    The element with the invalid attribute is found at tree location &lt;xsl:number  level="multi" count="*" format="1." /&gt;

    &lt;xsl:if test='.[@ID]'&gt;

    The element's ID is &lt;xsl:value-of select="@ID" /&gt;.

    &lt;/xsl:if&gt;

    &lt;xsl:if test='..[@ID]'&gt; The element's parent's ID is &lt;xsl:value-of select="../@ID" /&gt;.

    &lt;/xsl:if&gt;

  &lt;/xsl:macro &gt; 

<font color="red">  &lt;!-- Good patterns. Put your instructions here. --&gt;</font>

<font color="red">  &lt;!-- Bad patterns. Put your instructions here. --&gt;</font>

  &lt;!-- Do not change after here. This handles defaulting. --&gt;  

  &lt;xsl:template match="text()" priority="-1"&gt;

    &lt;!-- strip characters --&gt;

  &lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;

</pre>           
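<p>The <tt>xsl:number level="multi" count="*" format="1."</tt> instruction in the macros reports a node's location as a dotted list of numbers. A Python sketch of the same idea, assuming each number is an element's 1-based position among its parent's element children (the exact output of the XSL betas may differ); <tt>tree_location</tt> is a hypothetical helper:</p>

```python
import xml.etree.ElementTree as ET
from typing import Dict


def tree_location(root: ET.Element, target: ET.Element) -> str:
    """Return a dotted location such as "1.2.2." for target.

    Approximates xsl:number level="multi" count="*" format="1.",
    assuming each number is the element's 1-based position among
    its parent's element children (the document element is "1").
    """
    # ElementTree elements do not know their parents, so build a map.
    parent_of: Dict[ET.Element, ET.Element] = {
        child: parent for parent in root.iter() for child in parent
    }
    numbers = []
    node = target
    while node is not root:
        parent = parent_of[node]
        # Position among siblings, counting elements only.
        numbers.append(list(parent).index(node) + 1)
        node = parent
    numbers.append(1)  # the document element is the only top-level element
    return ".".join(str(n) for n in reversed(numbers)) + "."


if __name__ == "__main__":
    html = ET.Element("HTML")
    ET.SubElement(html, "HEAD")
    body = ET.SubElement(html, "BODY")
    ET.SubElement(body, "P")
    blink = ET.SubElement(body, "BLINK")
    print(tree_location(html, blink))  # 1.2.2.
```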

<p>Accept good patterns using the following template:

<pre>   

   &lt;xsl:template match="<font color="red">pattern</font>" priority="2" &gt;

    &lt;xsl:apply-templates/&gt; 

   &lt;/xsl:template&gt;

</pre>

<p>Validate against bad patterns using the following template:

<pre>   

   &lt;xsl:template match="<font color="red">pattern</font>"&gt;

    &lt;LI&gt;

        <font color="red">&lt;!--put message here--&gt;</font>

        &lt;xsl:invoke macro="<font color="red">node</font>_warning_message" /&gt;

    &lt;/LI&gt;

    &lt;xsl:apply-templates/&gt; 

   &lt;/xsl:template&gt;

</pre>               

<p>You can use these in two ways.</p>

<p>The positive way is to make "good patterns"
which cover every context in which your element type (if that is what you
are validating) is allowed to appear. Then you add a simple pattern which
catches any occurrence of the element as the "bad pattern".
</p>

<p>The negative way is to make "bad patterns" which find element
types in contexts you specifically want to deem invalid. The
"good patterns" can then express any exceptions to this: you can use them
to create a stop list of specific cases which break
a more general rule about "bad patterns". Use the priority
attribute to give the "good patterns" a higher priority than the
"bad patterns", so that a node matching both is handled by the "good pattern".

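<p>A Python sketch of how priorities resolve the two kinds of pattern, loosely modelled on XSL template conflict resolution: among the patterns that match a node, the one with the highest priority wins, and only a winning "bad pattern" produces a message. The <tt>run</tt> function, the template tuples, and the LI rules are illustrative assumptions, shown here in the positive style.</p>

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Tuple

# A template is (priority, test, message). The test sees the element
# and its ancestors (outermost first). A message of None marks a
# "good pattern"; a string marks a "bad pattern".
Test = Callable[[ET.Element, List[ET.Element]], bool]
Template = Tuple[float, Test, Optional[str]]


def run(root: ET.Element, templates: List[Template]) -> List[str]:
    report: List[str] = []

    def visit(element: ET.Element, ancestors: List[ET.Element]) -> None:
        matching = [t for t in templates if t[1](element, ancestors)]
        if matching:
            # The highest priority wins; on a tie, max() keeps the
            # earliest template in the list.
            _, _, message = max(matching, key=lambda t: t[0])
            if message is not None:
                report.append(f"{element.tag}: {message}")
        for child in element:
            visit(child, ancestors + [element])

    visit(root, [])
    return report


if __name__ == "__main__":
    # Positive style: a "good pattern" (priority 2) lists the allowed
    # contexts of LI; a plain "bad pattern" (priority 0) catches the rest.
    templates: List[Template] = [
        (2, lambda e, anc: e.tag == "LI" and bool(anc)
            and anc[-1].tag in ("UL", "OL"), None),
        (0, lambda e, anc: e.tag == "LI", "LI outside UL or OL"),
    ]
    body = ET.Element("BODY")
    ET.SubElement(ET.SubElement(body, "UL"), "LI")
    ET.SubElement(body, "LI")
    print(run(body, templates))  # ['LI: LI outside UL or OL']
```

<p>The negative style uses the same machinery: the broad rule becomes the low-priority "bad pattern", and each exception is a higher-priority "good pattern".</p>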
<h2>Examples</h2>

<p>These examples were developed with the LotusXSL beta.
There may be slightly different syntaxes required for the
other XSL betas (i.e., James Clark's and Microsoft's).
The examples each validate something which an
XML markup declaration cannot directly specify.</p>

<h3>1: Unwanted Element</h3>

<p>This example imposes additional requirements compared

to the HTML DTD. It acts a little like an SGML global 

exclusion, in that the content model of the markup declarations

may allow the blink element,

but this validation layer exposes the invalidity.

</p>

<pre>

   <font color="red">&lt;!-- Put this in the "bad patterns" section in the template --&gt;</font>

   &lt;xsl:template match="BLINK"&gt;

    &lt;LI&gt;        

     Element "BLINK" has been used. This is against our house style.

    &lt;xsl:invoke macro="element_warning_message" /&gt;

    &lt;/LI&gt;

    &lt;xsl:apply-templates/&gt; 

   &lt;/xsl:template&gt;

</pre>               

<p>If a BLINK is found, a warning is generated, giving the element's location in the tree
and the ID attributes of the element and of its parent (if they have them).

</p>
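<p>For comparison, the same check sketched in Python with <tt>xml.etree.ElementTree</tt>; <tt>find_blink</tt> is a hypothetical function, and its message text follows the template above. Because ElementTree elements do not point to their parents, a parent map is built first so the parent's ID can be reported as the macro does.</p>

```python
import xml.etree.ElementTree as ET
from typing import List


def find_blink(root: ET.Element) -> List[str]:
    """Report every BLINK element, adding its ID and its parent's ID
    when present, in the spirit of the element_warning_message macro."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    messages = []
    for element in root.iter("BLINK"):
        text = 'Element "BLINK" has been used. This is against our house style.'
        if "ID" in element.attrib:
            text += f" The element's ID is {element.get('ID')}."
        parent = parent_of.get(element)
        if parent is not None and "ID" in parent.attrib:
            text += f" The element's parent's ID is {parent.get('ID')}."
        messages.append(text)
    return messages


if __name__ == "__main__":
    body = ET.Element("BODY", ID="main")
    ET.SubElement(body, "P")
    ET.SubElement(body, "BLINK", ID="b1")
    for message in find_blink(body):
        print(message)
```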

<h3>2: Element Context</h3>

<p>This example checks that an rdf:RDF element

never appears as a descendant of another rdf:RDF

element.

</p>

<pre>

   <font color="red">&lt;!-- Put this in the "bad patterns" section in the template --&gt;</font>

   &lt;xsl:template match="rdf:RDF[ancestor(rdf:RDF)]"&gt;

    &lt;LI&gt;        

    The element "rdf:RDF" has been found inside another element "rdf:RDF".

    &lt;xsl:invoke macro="element_warning_message" /&gt;

    &lt;/LI&gt;

    &lt;xsl:apply-templates/&gt; 

   &lt;/xsl:template&gt;

</pre>
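<p>A Python sketch of the <tt>rdf:RDF[ancestor(rdf:RDF)]</tt> pattern, assuming ElementTree's <tt>{namespace}local-name</tt> form for qualified names and reusing the RDF namespace URI declared in the validator template; <tt>nested_rdf</tt> is hypothetical. Rather than searching upward from each element, it carries an "inside rdf:RDF" flag down the tree.</p>

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace URI as declared in the validator template.
RDF_NS = "http://w3.org/TR/1999/PR-rdf-syntax-19990105#"
RDF_TAG = "{" + RDF_NS + "}RDF"


def nested_rdf(root: ET.Element) -> List[ET.Element]:
    """Return every rdf:RDF element that has an rdf:RDF ancestor,
    the same condition as the pattern rdf:RDF[ancestor(rdf:RDF)]."""
    found = []

    def visit(element: ET.Element, inside_rdf: bool) -> None:
        is_rdf = element.tag == RDF_TAG
        if is_rdf and inside_rdf:
            found.append(element)
        # Descendants are inside rdf:RDF if any ancestor-or-self is.
        for child in element:
            visit(child, inside_rdf or is_rdf)

    visit(root, False)
    return found


if __name__ == "__main__":
    outer = ET.Element(RDF_TAG)
    description = ET.SubElement(outer, "{" + RDF_NS + "}Description")
    ET.SubElement(description, RDF_TAG)  # nested: should be reported
    print(len(nested_rdf(outer)))  # 1
```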

<h3>3: Attribute Context</h3>

<p>This example checks that, whenever the
"unit" attribute has the value "other", the
"other-unit" attribute has also been
specified, with a non-empty value.

</p>

<pre>     

   <font color="red">&lt;!-- Put this in the "bad patterns" section in the template --&gt;</font>

   &lt;xsl:template match='fig[(@unit="other") and (@other-unit="")]' priority="2" &gt;
    &lt;LI&gt;
    The element "fig" has attribute "unit" specified as "other".
    But the attribute "other-unit" has a zero length.
    &lt;xsl:invoke macro="attribute_warning_message" /&gt;
    &lt;/LI&gt;
    &lt;xsl:apply-templates/&gt;
   &lt;/xsl:template&gt;

   &lt;xsl:template match='fig[(@unit="other") and (not(@other-unit))]'&gt;
    &lt;LI&gt;
    The element "fig" has attribute "unit" specified as "other".
    But the attribute "other-unit" has not been specified.
    &lt;xsl:invoke macro="attribute_warning_message" /&gt;
    &lt;/LI&gt;
    &lt;xsl:apply-templates/&gt;
   &lt;/xsl:template&gt;

</pre>

<p>Checking attributes requires answering two questions.
First, has the attribute been specified in the document?
Second, even if it is specified, does it have a zero-length
value?

</p>
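<p>In ElementTree terms the two questions map onto a missing attribute (<tt>get</tt> returns <tt>None</tt>) versus a present but empty string. A hypothetical Python sketch of the <tt>fig</tt> rule, not a translation of any particular XSL processor's behaviour:</p>

```python
import xml.etree.ElementTree as ET
from typing import List


def check_other_unit(root: ET.Element) -> List[str]:
    """Apply the fig/unit/other-unit rule, answering both questions:
    is "other-unit" specified at all, and if so, is it empty?"""
    messages = []
    for fig in root.iter("fig"):
        if fig.get("unit") != "other":
            continue  # the rule only applies when unit="other"
        other_unit = fig.get("other-unit")  # None if not specified
        if other_unit is None:
            messages.append('fig: "other-unit" has not been specified.')
        elif other_unit == "":
            messages.append('fig: "other-unit" has a zero length.')
    return messages


if __name__ == "__main__":
    doc = ET.Element("doc")
    ET.SubElement(doc, "fig", {"unit": "other"})
    ET.SubElement(doc, "fig", {"unit": "other", "other-unit": ""})
    ET.SubElement(doc, "fig", {"unit": "other", "other-unit": "furlong"})
    ET.SubElement(doc, "fig", {"unit": "cm"})
    for message in check_other_unit(doc):
        print(message)
```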

<hr />

<p>Copyright (C) 1999 Rick Jelliffe.

Please feel free to publish this in any way you like,

but try to update it to the most recent version,

and keep my name on it.

</p>

</body>

</html>