Round 2: How an XML instance document references an XML Schema

Henry S. Thompson ht at cogsci.ed.ac.uk
Thu Jan 6 13:49:08 GMT 2000


John Aldridge <john.aldridge at informatix.co.uk> writes:

> At 23:18 05/01/00 +0000, ht at cogsci.ed.ac.uk (Henry S. Thompson) wrote:
> >John Aldridge <john.aldridge at informatix.co.uk> writes:
> 
> >> I'd hoped to find a statement such as "a general-purpose schema-aware
> >> processor must provide some catalogue facility which allows the
> >> specification of a location from which to fetch the schema corresponding to
> >> an NS URI.  Only in the absence of such a catalogue entry may the processor
> >> attempt to dereference the URI given by the schemaLocation attribute".
> >
> >As I've tried to convey in other messages in this and related threads, 
> >the XML Schema design is VERY concerned with precisely the issue you
> >raise above, namely, schema validation should not be a hostage to
> >connectivity and/or URL stability.  Our approach was, however, NOT to
> >design YACM (Yet Another Catalog Mechanism), but allow for ANY
> >alternative schema location mechanism which people come up with.  I
> >hope a careful reading of chapter 4 of the PWD [1] will clarify this
> >for you.
> 
> I did carefully read Chapter 4, honest, but still struggled to understand
> the way the flexibility it includes should be used.  Note that I did not
> suggest above that the document should include a specific catalogue design;
> just that I'd hoped it would mandate the existence of _some_ catalogue.
> 
> >For myself, I envisage schema validators working in a similar way
> >to XT, James Clark's XSLT implementation: you will be able to invoke a
> >schema validator with explicit specification of the schema(s) you wish
> >applied,
> 
> By which you mean (I think) "explicit specification of _how to locate_ the
> schema(s) you wish applied,".  Presumably you are not intended to be able
> to request that elements be validated against a schema with a
> targetNamespace which does not match the namespace from which the elements
> to be validated are drawn?

Both points correct:  how to _locate_, and targetNamespaces must
always match (except in the case where there is none, but that's
another can of worms).
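Here is a minimal Python sketch of the matching rule just stated. It assumes the schema document and the instance element have both already been parsed with the standard ElementTree module, and the function names are hypothetical. It covers only the straightforward case. As a simplification, a schema with no targetNamespace is treated as applying only to unqualified elements.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

def split_qname(tag: str) -> Tuple[Optional[str], str]:
    """Split an ElementTree tag '{uri}local' into (uri, local); uri is None if unqualified."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag

def schema_applies(schema_root: ET.Element, element: ET.Element) -> bool:
    """True if the schema's targetNamespace is the element's namespace.

    Simplification (assumed): a schema without a targetNamespace is taken
    to cover only unqualified elements.
    """
    target_ns = schema_root.get("targetNamespace")  # None when absent
    element_ns, _ = split_qname(element.tag)
    return target_ns == element_ns

schema = ET.fromstring(
    '<schema xmlns="http://www.w3.org/1999/XMLSchema"'
    ' targetNamespace="http://www.informatix.co.uk/Stuff"/>')
instance = ET.fromstring('<stuff xmlns="http://www.informatix.co.uk/Stuff"/>')
print(schema_applies(schema, instance))  # True
```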

> >         or you can leave it to the validator (Not an option XT
> >provides).  The XML Schema PWD allows for one, the other, or both, but
> >observes that only the schemaLocation approach gives interoperability
> >(at the price of fragility).
> 
> OK, that's very helpful.  So, when writing an XML file, I should start it:
> 
> <?xml version="1.0"?>
> <stuff
>    xmlns="http://www.informatix.co.uk/Stuff"
>    xmlns:xsi="http://www.w3.org/1999/XMLSchema/instance"
>    xsi:schemaLocation="http://www.informatix.co.uk/Stuff
>       http://www.informatix.co.uk/Stuff/Stuff.xsd"
> >
>   <!-- ... -->
> </stuff>
> 
> And then say to the customers for this data: 
> 
>    You must process this data either
> 
>    (a) in an environment with reliable access to
>        http://www.informatix.co.uk/Stuff/Stuff.xsd (in which case you
>        may use any "general-purpose schema-aware" XML processor), or,
> 
>    (b) you are constrained to use only those XML processors which
>        allow you to specify that the schema for the namespace
>        http://www.informatix.co.uk/Stuff is to be found in some other
>        location accessible to you.

Yes.
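In the example above, the xsi:schemaLocation value is a whitespace-separated list of (namespace URI, location) pairs. The Python sketch below uses the standard ElementTree module to extract those pairs. The function name is hypothetical. The instance namespace is the 1999 working-draft URI used in the example; the 2001 Recommendation uses http://www.w3.org/2001/XMLSchema-instance instead.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Instance namespace as used in the example above (1999 working draft).
XSI_NS = "http://www.w3.org/1999/XMLSchema/instance"

def schema_location_hints(root: ET.Element, xsi_ns: str = XSI_NS) -> Dict[str, str]:
    """Map each namespace URI in xsi:schemaLocation to its location hint."""
    value = root.get("{%s}schemaLocation" % xsi_ns)
    if value is None:
        return {}
    tokens = value.split()  # splits on any whitespace, including line breaks
    if len(tokens) % 2:
        raise ValueError("xsi:schemaLocation needs namespace/location pairs")
    return dict(zip(tokens[0::2], tokens[1::2]))

doc = ET.fromstring('''<stuff
   xmlns="http://www.informatix.co.uk/Stuff"
   xmlns:xsi="http://www.w3.org/1999/XMLSchema/instance"
   xsi:schemaLocation="http://www.informatix.co.uk/Stuff
      http://www.informatix.co.uk/Stuff/Stuff.xsd"/>''')
print(schema_location_hints(doc))
# {'http://www.informatix.co.uk/Stuff': 'http://www.informatix.co.uk/Stuff/Stuff.xsd'}
```

Under option (a), a processor would simply fetch each location in this mapping. Option (b) amounts to the processor letting the user override an entry locally.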

> In the context of the obligation "...unless directed otherwise
> general-purpose schema-aware processors must attempt to dereference each
> schema URI...", the existence of a catalogue or other mechanism for
> locating a schema counts as "directed otherwise".

Well, not the existence alone, but the existence plus some indication, 
from user or application choice, to use what exists.
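The following is one possible way to express "directed otherwise" in code. It is a hypothetical resolver that consults a local catalogue only when the user or application has opted in. Otherwise it dereferences the schemaLocation hint, and as a last resort it dereferences the namespace URI itself. The catalogue, the fetch callback, and this ordering of sources are all assumptions for illustration; the PWD does not prescribe them.

```python
from typing import Callable, Mapping, Optional

class SchemaNotFound(Exception):
    """No source yielded a schema for the namespace."""

def resolve_schema(namespace: str,
                   hints: Mapping[str, str],
                   fetch: Callable[[str], Optional[str]],
                   catalogue: Optional[Mapping[str, str]] = None,
                   use_catalogue: bool = False) -> str:
    """Return schema text for `namespace`.

    hints: namespace -> location, taken from xsi:schemaLocation.
    fetch: returns the document at a location, or None if inaccessible.
    catalogue: hypothetical local namespace -> schema-text table.
    use_catalogue: True only when the user or application asked for it.
    """
    # The catalogue counts only if it exists AND its use was chosen.
    if use_catalogue and catalogue is not None and namespace in catalogue:
        return catalogue[namespace]
    # Default behaviour: dereference the schemaLocation hint.
    location = hints.get(namespace)
    if location is not None:
        schema = fetch(location)
        if schema is not None:
            return schema
    # Assumed last resort: treat the namespace URI itself as a location.
    schema = fetch(namespace)
    if schema is not None:
        return schema
    raise SchemaNotFound(namespace)

stuff_ns = "http://www.informatix.co.uk/Stuff"
hints = {stuff_ns: "http://www.informatix.co.uk/Stuff/Stuff.xsd"}
offline = {}.get  # an environment where no location is reachable
local = {stuff_ns: "<schema .../>"}
print(resolve_schema(stuff_ns, hints, offline, local, use_catalogue=True))  # <schema .../>
```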

> I guess I'm just suspicious that, in the absence of specific requirements,
> processors will not bother to implement any such alternative mechanism.
> After all, the language quoted in the previous paragraph is very similar to
> that describing DTD links:  "An XML processor ... may use the public
> identifier to try to generate an alternative URI.  If the processor is
> unable to do so, it must use the URI specified in the system literal".

You can't make people provide interoperable solutions, only encourage
them to do so, you're right.

> . . .
>
> I guess I was really confused about the relation between schemas and
> namespaces.
> 
> I understand your answer to mean that by using a name from a namespace, and
> then using a schema-aware processor, you are automatically claiming that
> the element conforms to the schema for that namespace.
> 
> There is no such thing, to a schema-aware processor, as a namespace without
> an associated schema.

That's close, but there are undoubtedly some grey areas.  In the
simplest case: a schema-validator is validating the content of some
element with a schema for its namespace and encounters an element name
from a different namespace.   What happens?  If neither schemaLocation 
nor built-in information nor namespace-URI-based search yield a
schema, there is a problem.  Let's look a little harder at how this
could happen.

1) The instance looks like this

  <a:root xmlns:a='uri:a' xmlns:b='uri:b'>
   <a:a ...>...</a:a>
   <b:b ...>...</b:b>
  </a:root>

  The content model the validator is working with, within a schema for
  the uri:a namespace, looks like this:

  <element ref='a' . . ./>
  <element ref='o:b'/> 

  Now this latter reference is not allowed unless there's an <import>
  statement for it.  But that <import> may not contain a
  'schemaLocation' attribute, or the URI specified there may not be
  accessible, etc.  At that point an error should be raised.

2) The instance is the same, but the relevant content model looks like 
   this:

  <element ref='a' . . ./>
  <any namespace='##other'/>

  This, and related cases, are the grey area mentioned above.  The WG
  has not yet decided exactly what the detailed schema-validation story
  is with respect to validation within material which in the first instance is
  allowed by a wildcard particle in a content model.
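The two cases can be sketched in Python as follows. All names are hypothetical, and the schema is reduced to plain data: a target namespace, a list of referenced element names, and the <import> statements. check_references raises an error as described in case 1. It is simplified so that an <import> without a usable schemaLocation is always an error, although a real processor might still find the schema by other means. wildcard_allows covers only the namespace test for case 2, following the rules later fixed in the XML Schema 1.0 Recommendation. It does not model how the content matched by the wildcard is then validated, which is the open question above.

```python
from typing import Callable, Dict, List, Optional, Tuple

class SchemaError(Exception):
    """A content model cannot be resolved against the available schemas."""

# Case 1: every referenced element outside the target namespace needs an
# <import> whose schemaLocation can actually be fetched.
def check_references(target_ns: str,
                     refs: List[Tuple[str, str]],
                     imports: Dict[str, Optional[str]],
                     fetch: Callable[[str], Optional[str]]) -> None:
    for ns, local in refs:
        if ns == target_ns:
            continue  # declared in this schema
        if ns not in imports:
            raise SchemaError(f"{{{ns}}}{local}: no <import> for {ns}")
        location = imports[ns]
        if location is None:
            raise SchemaError(f"{{{ns}}}{local}: <import> has no schemaLocation")
        if fetch(location) is None:
            raise SchemaError(f"{{{ns}}}{local}: {location} is not accessible")

# Case 2: the namespace test of an <any namespace='...'> wildcard.
# None stands for "no namespace" (unqualified names).
def wildcard_allows(namespace_attr: str,
                    target_ns: Optional[str],
                    element_ns: Optional[str]) -> bool:
    tokens = namespace_attr.split()
    if tokens == ["##any"]:
        return True
    if tokens == ["##other"]:
        # Any qualified name except one in the schema's own target namespace.
        return element_ns is not None and element_ns != target_ns
    allowed = set()
    for token in tokens:
        if token == "##targetNamespace":
            allowed.add(target_ns)
        elif token == "##local":
            allowed.add(None)
        else:
            allowed.add(token)
    return element_ns in allowed

# The instance above: <a:a> and <b:b> inside <a:root>.
available = {"http://example.org/b.xsd": "<schema .../>"}  # hypothetical store
check_references("uri:a", [("uri:a", "a"), ("uri:b", "b")],
                 {"uri:b": "http://example.org/b.xsd"}, available.get)
print(wildcard_allows("##other", "uri:a", "uri:b"))  # True
print(wildcard_allows("##other", "uri:a", "uri:a"))  # False
```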

> Thanks for your help, both here and on other topics to which I've not
> contributed but have followed with interest.

You're welcome:  you, and the rest of xml-dev, are our launch
customers. . . :-)

ht
-- 
  Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
     2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
	    Fax: (44) 131 650-4587, e-mail: ht at cogsci.ed.ac.uk
		     URL: http://www.ltg.ed.ac.uk/~ht/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1