XSchema Spec, Sections 2.0 and 2.1 (Draft 1)

John Cowan cowan at locke.ccil.org
Mon Jun 8 23:26:36 BST 1998


Paul Prescod wrote:

> If the entity is NOT resolved by the time that the XSchema processor sees
> it, then I know of no way technically for the XSchema processor to tell
> the XML parser to re-parse the document with a particular value for the
> entity. SAX does not provide such a communication path. None of the other
> parsers I have used do either. I hope they never do.

The XML-specific part of the DOM, however, does allow you to determine
what parts of the document were incorporated by reference to what
entities.

This is not the same as *changing* the entity replacement after the
fact, to be sure.

> So if XSchema processors are to be able to direct parsers to replace
> entities, we will have to rewrite all of the parsers to allow that! We
> will also have to make common the nasty habit of not processing entities
> immediately.

Correct.

However, the appearance of entity declarations (not
necessarily definitions) in the XSchema allows an XSchema validator
to determine what particular entities were used, and whether they
were replaced in accordance with the XSchema or not.

The fact that the SAX parser interface doesn't provide the necessary
information is neither here nor there.  SAX intentionally does not
provide full information about XML document instances.
 
> cpy.xsc
> <ENTITY NAME="COPYRIGHT">...</ENTITY>
> 
> Now I've got a document that depends on it:
> 
> <?XSC schema="cpy.xsc"?>
> <FOO>
> &copyright;
> </FOO>
> 
> That document is simply not well-formed in XML terms. The XSchema has no
> opportunity to intervene, either technically via SAX, or from a linguistic
> point of view according to the XML spec. The document is not XML.

Nobody denies this, certainly not I.
 
> The only way I know to get around it is to compile cpy.xsc to cpy.dtd and
> change the processing instruction to a DOCTYPE. But then you've thrown
> away extensibility and usability.

I don't think this is the *only* thing to do with XSchemas, but I
do think that it is *one* way to use them.  Equivalently, one could
compile an XSchema and a document instance purporting to conform to
it into an XML document instance with only an internal DTD subset, and
validate the result.  The resulting document instance would be considered
disposable, of course.  
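A minimal Python sketch of that compile-then-validate step, assuming (hypothetically) that the XSchema's entity declarations have already been read into a dict of name to replacement text. The helper name compile_instance and the sample replacement text are made up for illustration. The sketch only writes ENTITY declarations into an internal subset and does no escaping, so it refuses values containing markup-significant characters. It also shows the point quoted above: the bare instance is not well-formed until the declarations are supplied.

```python
import xml.etree.ElementTree as ET

def compile_instance(root_name, entities, body):
    """Hypothetical helper: wrap a document body in a DOCTYPE whose
    internal subset declares the given internal general entities."""
    decls = []
    for name, value in entities.items():
        # This sketch does not escape markup, so reject values that would
        # need escaping inside a quoted entity value literal.
        if any(ch in value for ch in '&%<"'):
            raise ValueError(f"cannot safely declare entity {name!r}")
        decls.append(f'<!ENTITY {name} "{value}">')
    subset = "\n  ".join(decls)
    return f"<!DOCTYPE {root_name} [\n  {subset}\n]>\n{body}"

body = "<FOO>&copyright;</FOO>"

# On its own the body is not well-formed: the entity is undeclared.
try:
    ET.fromstring(body)
except ET.ParseError as err:
    print("not well-formed:", err)

# The disposable compiled instance parses, and the reference is expanded.
compiled = compile_instance("FOO", {"copyright": "Copyright 1998 ACME"}, body)
print(ET.fromstring(compiled).text)
```

The compiled text is thrown away after parsing; nothing is written back to the original instance.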
 
> I don't see how XSchema will be any different if it requires close
> communication between the parsers and the XSchema processor on entity
> expansion.

It wouldn't.

> I can think of a few ways to handle entity sets like HTML's. One would be
> to use separate declarations. Another would be to put both the entity
> declaration and the schema declaration in an external entity that you
> include in your doctype declaration.
> 
> My primary philosophic complaint is that I see schemas not as providing
> resources or definition to a document, but as describing a class of
> documents. I might accept that a schema could say that &copy; is allowed
> in documents of type HTML, but as soon as you turn that around and say
> "&copy; is available to HTML documents and it means &U...;" I get
> uncomfortable because we are back in the situation of having the language
> do two things and back into our round-trip communication situation.

As I keep saying....

XSchema processors should not expand entity references themselves,
but allow that to be done by the XML processor.

XSchema processors that know what entity references should exist in
a document instance can validate that those references do exist, and
that they refer to the proper thing.

One approach, given that general references must be synchronous, would
be to give the internal-entity declaration a content model of ANY,
and let its content be the replacement itself, rather than #PCDATA
which represents that replacement.  Then validation would consist of
checking that expanded entity references in the DOM of the document
are EQUAL (in a Lisp-y sense) to entity definitions in the DOM
of the XSchema.
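A rough Python sketch of that EQUAL check, using xml.etree.ElementTree: two pieces of content are equal when their text runs match and their child elements are recursively equal, tails included. The ENTITY element with mixed content and the FRAG wrapper are hypothetical. ElementTree does not keep entity boundaries, so the sketch assumes some other layer (for example, a DOM that preserves entity reference nodes) has already isolated each expanded replacement.

```python
import xml.etree.ElementTree as ET

def nodes_equal(a, b):
    """Lisp-style EQUAL on elements: same name, same attributes, equal content."""
    return a.tag == b.tag and a.attrib == b.attrib and content_equal(a, b)

def content_equal(a, b):
    """Compare mixed content only, ignoring the two outer elements' own names."""
    if (a.text or "") != (b.text or ""):
        return False
    if len(a) != len(b):
        return False
    # Each child must be EQUAL, and so must the text that follows it.
    return all(
        nodes_equal(x, y) and (x.tail or "") == (y.tail or "")
        for x, y in zip(a, b)
    )

# Hypothetical XSchema entity declaration whose content is the replacement itself.
decl = ET.fromstring('<ENTITY NAME="copyright">Copyright <B>1998</B> ACME</ENTITY>')

# Hypothetical expansions of &copyright; as found in two document instances.
good = ET.fromstring("<FRAG>Copyright <B>1998</B> ACME</FRAG>")
bad = ET.fromstring("<FRAG>Copyright <I>1998</I> ACME</FRAG>")

print(content_equal(decl, good))
print(content_equal(decl, bad))
```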

> Resources (namespaces, characters, entities, etc.) are one thing.
> Constraints are another. I'd like to keep them separate.

And I would like to keep them together.  A matter of taste.
 
> I would have
> expected each XSchema to be applied individually as if the others did not
> exist. It's like: "I've got this document... I'll test if it conforms to
> HTML 2.0. It does! I'll test against HTML 3.0. It does! I'll test against
> HTML 4.0. It doesn't." In other words, schema testing should be
> "side-effect-free."

That is what I meant.

> If we go with this model, then the entities required would have to be
> declared in all schemas.

Only if validation requires that every entity in the document instance
appear in the XSchema.  I don't think that's a requirement, given that
XSchema processors aren't responsible for entity reference expansion.
IOW, a document may refer to entities that aren't defined in any
XSchema (though they must, of course, still be declared in the document's DTD).

> (I don't like that because it requires
> communication between the people who design the schemas. So if I want to
> check a document (part-wise) against both MATHML and HTML, I would somehow
> have to make sure that the two schemas had the same entities.)

More weakly, that they do not define the same entity in conflicting
ways.
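Stated as code, the weaker requirement is only that two entity maps agree wherever their names overlap. A Python sketch, with each schema reduced (hypothetically) to a dict of name to replacement text; the entity sets below are illustrative, not the real HTML or MathML ones.

```python
def conflicting_entities(schema_a, schema_b):
    """Names declared by both schemas with different replacement text."""
    shared = schema_a.keys() & schema_b.keys()
    return sorted(name for name in shared if schema_a[name] != schema_b[name])

# Illustrative entity maps only.
html_like = {"copy": "\u00A9", "alpha": "\u03B1"}
mathml_like = {"alpha": "\u03B1", "infin": "\u221E"}

print(conflicting_entities(html_like, mathml_like))   # []
```

Entities that appear in only one schema never count as conflicts, so neither schema's designers need to know about the other's private entities.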

> My point was that people misunderstand DTDs and there are many, many
> subtle problems caused by the fact that documents and DTDs are so closely
> tied. I don't want to repeat that in XSchema. I was basically concerned
> about people trying to have DOCTYPEs pointing to XSchemas.


> > #5: Agreed.  The purpose of including an entity in an XSchema is
> > *not* to enable macro-expansion.
> 
> I don't understand this. What else do (text) entities do?

Entities don't "do" anything.  One thing to do with references to
entities is to expand them to their referents.  Another thing to do
is to find out whether the content of the entity is what you expect
it to be.

Here's an example.  Let's say that some XSchema named "foo"
declares that the "Auml" internal entity has the value "&#xC4;".
A document purporting to conform to this XSchema might define "Auml"
as "&#xC5;", either in its internal subset or in its external subset.
Such a document would *fail* validation against "foo".  Without
entity declarations in "foo", that test would not be possible.
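A minimal Python sketch of that test, using the standard library's expat binding to collect the internal general entities a document declares. Assumptions: the XSchema "foo" has been reduced to a dict of name to replacement text, only the internal subset is examined (expat does not read the external subset here), and entities the XSchema does not mention are ignored rather than rejected. The function names are hypothetical.

```python
import xml.parsers.expat

def declared_internal_entities(xml_text):
    """Map each internal general entity declared in the internal subset to its replacement text."""
    found = {}

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        # value is None for external entities; skip those and parameter entities.
        if not is_parameter_entity and value is not None:
            found[name] = value

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_text, True)
    return found

def entity_mismatches(schema_entities, xml_text):
    """Names the document defines differently from the XSchema (hypothetical check)."""
    declared = declared_internal_entities(xml_text)
    return sorted(
        name for name, value in declared.items()
        if name in schema_entities and schema_entities[name] != value
    )

foo = {"Auml": "\u00C4"}  # the XSchema "foo", reduced to a dict

doc = '<!DOCTYPE doc [<!ENTITY Auml "&#xC5;">]><doc>&Auml;</doc>'
print(entity_mismatches(foo, doc))  # non-empty: fails validation against "foo"
```

Note that expat has already replaced the character reference in the declaration, so the comparison is between actual characters, not between the spellings "&#xC4;" and "&#xC5;".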

-- 
John Cowan	http://www.ccil.org/~cowan		cowan at ccil.org
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



