Namespaces and XML validation

Sat Aug 8 13:31:38 BST 1998

Thanks Charles,
	I found this a useful and optimistic review. I'd like to take it as a
starting point for a discussion about where we might go next. I'll add some
comments of my own about 'XML validation' to try to tie in some historical
perspective.

	[I think the discussion on XML-DEV in the last 6 days has been extremely
valuable. There are clearly people to whom the new proposal has come as a
shock and it will take time to readjust. In all our discussions we must be
extremely aware that a lot of personal efforts and ideas have gone into XML
and that we should avoid upsetting people (usually unintentionally). It is
clear that XML-DEVers are making a real effort to be constructive, even
those who wished for different approaches. Keep it up!]

At 23:46 07/08/98 -0700, Charles Frankston wrote:
[...]
>
>John --
>
>I think you and some others on this list are being insufficiently
>imaginative in tackling the problem of figuring out how to validate
>documents that use the namespace spec.  I believe it is possible to do
>fragment validation against DTDs, if that is so desired.  However this

I agree with this belief (although I haven't worked it out in detail). I
think there is a critical mass of people who feel the same way (and I
actually think that JohnC is among them!). 

>certainly cannot be accomplished without some modification of the mechanics
>of DTD validation, as Tim outlined.  I.e. pre-processing the DTD and the
>instance to convert all names to URI, name pairs, and what some have called
>fragment validation using as the root of each fragment any element that
>introduces a tag from a new namespace.  A new namespace for this purpose is
>defined as one from a different namespace than its parent.
>
>I do not believe the new namespace proposal with local scoping makes it any
>harder to do than the old PI based namespace proposal.  It may make it

Again, my assessment of the discussion is that no one has adopted the
slogan "PI workable: attributes unworkable". A few people clearly feel that
namespaces as such do not (yet) solve their problem, and a greater number
are concerned that there are hidden dragons which haven't been confronted.

One such dragon is the concern about retrofitting namespaces to: XPointer,
XLL, XSL, DOM, etc. I take it as axiomatic - and I'd suggest XML-DEVers
also do - that those responsible for these activities are actively working
on how to make them namespace-compliant. This may, of course, throw up
technical problems but I don't think we need to worry about convincing
people of its necessity.

This raises the question of what constitutes XML 1.0, 1.1, etc. It is
absolutely clear that namespaces have not broken XML 1.0. XML parsers
(including SAX-aware ones)
are not broken by any decisions on namespaces. Those who had a
non-namespace XML solution/product will find it still works. Those who used
colonised names with their own application semantics will find they still
work. This will include those who had colonised DTDs used for validation
and other things.

Note that we do not - at present - have any final REC that specifies how
any XML element or attribute should be interpreted (other than XML1.0's
xml:space and xml:lang, and the WD for XLL - perhaps there are others in XSL.)
This means that if an application receives a namespace-aware document it
can simply ignore the attributes it doesn't understand. The problem of how
it becomes namespace-aware is what we are addressing at present, but we
should remember that we are only just starting - 19980802 has not broken
anything of permanence.

>harder to think about it.  The fact that the namespace prefix may actually
>be declared physically in the document after the DOCTYPE doesn't matter.  At
>the time when the instance is to be compared to see if it matches the
>declaration in the DTD, the prefix to URI mapping is available.  Can you do
>this without modifying your validation code?  Certainly not.

This is an important point - I have always assumed that the final
implementations of the full power of namespaces will require substantial
software development. Most of us have also assumed it will have to involve
the conventional DTD at some stage. The DTD might be preprocessed, or
perhaps multiple DTDs might be required for a single document instance.
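
As a rough illustration of the name-resolution step Charles describes
(converting every prefixed name in the instance to a URI, name pair), here
is a minimal sketch. It assumes the attribute-based declaration syntax
(xmlns:prefix="uri") and relies on Python's xml.etree.ElementTree, which
reports each element name as '{uri}local'. The document, prefixes and
example.org URIs are invented for illustration; this is not a validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: the prefixes and URIs are made up for this sketch.
DOC = """\
<doc:report xmlns:doc="http://example.org/doc">
  <doc:title>Aspirin</doc:title>
  <cml:molecule xmlns:cml="http://example.org/cml">
    <cml:atom/>
  </cml:molecule>
</doc:report>"""


def split_name(tag):
    """Split ElementTree's '{uri}local' form into a (uri, local) pair."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag  # element is in no namespace


def universal_names(xml_text):
    """Return the (uri, local) pair of every element, in document order."""
    root = ET.fromstring(xml_text)
    return [split_name(el.tag) for el in root.iter()]


for pair in universal_names(DOC):
    print(pair)
```

Note that the prefix has disappeared by the time the names are compared:
only the URI and the local name survive, whichever element the declaration
happened to be written on.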
>
>However, the approach I've outlined above still has a severe problem, which
>I don't think is so easily solved.  In order to do this form of validation,
>what gets put in the DTD is the prefix, and not the URI.  That is a fatal
>flaw, because it makes it impossible to re-use a DTD for more than one
>document unless all documents that use that DTD use the same prefix for the
>same URI.  That elevates the prefix to the same status as the URI --
>something one must take care to keep globally unique.  The prefix is not
>syntactically suited to this task.

Agreed.
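
A small sketch of why the prefix is the wrong thing to compare. The two
hypothetical documents below bind different prefixes to the same
(invented) URI: their names differ as written, but are identical once
resolved to URI, name pairs. It again assumes xmlns:prefix="uri"
declarations and uses ElementTree's '{uri}local' naming.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents: same namespace URI, different prefixes.
DOC_A = '<a:CHAPTER xmlns:a="http://example.org/book"><a:SECT/></a:CHAPTER>'
DOC_B = '<bk:CHAPTER xmlns:bk="http://example.org/book"><bk:SECT/></bk:CHAPTER>'


def raw_root_name(xml_text):
    """Root element name exactly as written, prefix included."""
    return xml_text[1:].split()[0]


def resolved_names(xml_text):
    """Element names after prefix resolution, in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


# As written, the names differ, so a prefix-based DTD fits only one of them.
print(raw_root_name(DOC_A), raw_root_name(DOC_B))

# After resolution the two documents use exactly the same names.
print(resolved_names(DOC_A) == resolved_names(DOC_B))
```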

>
>For that reason, and because of other well known deficiencies in DTDs, I
>think the issue of validation and namespaces is better dealt with in the
>context of a whole new schema language.  The XML-Data submission clearly
>showed how this could work.  The new namespace proposal does no violence to
>XML-Data's use of namespaces.  I would therefore rather spend my time
>working on the new schema language, as the XML WG will shortly be doing,
>than patching DTDs.  
>
>(The XSchema work that's been going on in XML-Dev is hopefully equally
>adaptable -- the volume of mail is simply too high so I have not been
>following it closely.  The key is the use of XML syntax, which enables one
>to use namespace declarations in the schema in a manner that mimics the
>instance.)

I think this is the key point. The semantics of dealing with namespaces
cannot be managed by conventional DTDs. We shall certainly need schemas. It
may be possible to manage these 'below the surface' (e.g. convert a public
DTD to a schema, transform this to be namespace-aware and then retransform
to a DTD for syntactic validation).

It could have been useful to manage prefixes by a PE like:
<!ELEMENT %foo;:CHAPTER (%bar;:SECTION | %foo;:SECT)*>
but this isn't legal. However, if we transform to a schema we can use
(general) entities to do this (e.g. &foo;:CHAPTER) and we can retransform.
(As someone whose discipline is based on Fourier transformation I find this
a natural approach). So we shall need software, but I don't think it's
horrendous.
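
To make the general-entity idea concrete, here is a hedged sketch. The
schema vocabulary (schema, elementType, child) is entirely hypothetical
and is not XSchema or XML-Data syntax; the point is only that in a schema
written in XML syntax the declared name sits in an attribute value, where
a general entity reference such as &foo; is legal. The prefixes are
supplied when the schema is instantiated and the parser expands them.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema in XML syntax; element and attribute names are
# invented for this sketch. {foo} and {bar} are filled in by str.format.
SCHEMA_TEMPLATE = """\
<!DOCTYPE schema [
  <!ENTITY foo "{foo}">
  <!ENTITY bar "{bar}">
]>
<schema>
  <elementType name="&foo;:CHAPTER">
    <child name="&bar;:SECTION"/>
    <child name="&foo;:SECT"/>
  </elementType>
</schema>"""


def declared_names(foo, bar):
    """Parse the schema with the given prefixes; return the expanded names."""
    text = SCHEMA_TEMPLATE.format(foo=foo, bar=bar)
    root = ET.fromstring(text)  # internal entities are expanded on parsing
    decl = root.find("elementType")
    return decl.get("name"), [c.get("name") for c in decl.findall("child")]


print(declared_names("book", "doc"))
print(declared_names("b", "d"))  # same schema, different prefixes
```

The same schema text can therefore be re-used with whatever prefixes a
particular instance happens to use, which is what the illegal PE form
above was trying to achieve.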
>
I'll now give my perception of possible approaches based on my (rather
inadequate) knowledge of SGML. I tackled this namespace problem 3 years ago
when starting to develop CML and I found I needed namespaces (I used
constructs like CML.MOL - I even had a language called XML, e.g.
XML.ARRAY). I asked on comp.text.sgml how to do this and got the following
responses:
	- there is no mechanism for combining DTDs
	- use SUBDOC (essentially compartmentalised information components in a
document)
	- use HyTime (or some other sort of architectural forms)

The problem was that SUBDOC was not supported by any free software (I was
quoted a special introductory price of $10K for a system that would do
SUBDOC). HyTime was also unavailable for free (and it was also effectively
impossible to find out anything about it.) So my conclusion was that in
1996 (XML 0.0):

	 multiple SGML DTDs could be combined but only within large organisations
who could pay for elaborate tools. Interoperability with third parties was
not supported.
[A lemma was that large academic projects like TEI might also manage this.]

There is therefore no 'golden age' of multi-DTD working that the current
namespace proposal breaks. The key aspects of the current proposal are:
	- software must be widely (freely) available
	- recipients of multi-namespace documents must be able to use generic tools 
	- it should be possible to combine chunks of information (information
components) in a smallish variety of simple ways. The more complex the
suggestions, the harder they will be to implement.
	- global validation for a multinamespace document is likely to be
difficult if authors are allowed flexibility in the way these are combined.
Validation for a very rigid document type should not be difficult.

There seem to be the following ways forward:

(A) SUBDOC-like (islands of validity). I'm not familiar with SUBDOC but I
guess this requires software that can identify a subcomponent (probably a
subtree) in a document and start up a validating parser for that component
alone. 
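
A minimal sketch of what such 'islands of validity' software might do,
under stated assumptions: a new island starts at any element whose
namespace differs from its parent's, and each island is checked on its
own. The ALLOWED table is a hypothetical stand-in for real content models
(it only records which children are permitted, not their order or
number), and the document and URIs are invented. This is not how SUBDOC
itself works.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-namespace instance.
DOC = """\
<doc:report xmlns:doc="http://example.org/doc">
  <doc:title>Aspirin</doc:title>
  <cml:molecule xmlns:cml="http://example.org/cml">
    <cml:atom/>
  </cml:molecule>
</doc:report>"""

# Hypothetical content models: parent name -> permitted same-namespace children.
ALLOWED = {
    "{http://example.org/doc}report": {"{http://example.org/doc}title"},
    "{http://example.org/doc}title": set(),
    "{http://example.org/cml}molecule": {"{http://example.org/cml}atom"},
    "{http://example.org/cml}atom": set(),
}


def namespace_of(el):
    """Namespace URI of an element, or None if it has no namespace."""
    return el.tag[1:].split("}", 1)[0] if el.tag.startswith("{") else None


def fragment_roots(root):
    """The document root plus every element in a different namespace from its parent."""
    roots = [root]
    for parent in root.iter():
        for child in parent:
            if namespace_of(child) != namespace_of(parent):
                roots.append(child)
    return roots


def fragment_is_valid(frag_root):
    """Check one island, skipping foreign subtrees (they are checked separately)."""
    ns = namespace_of(frag_root)
    stack = [frag_root]
    while stack:
        el = stack.pop()
        for child in el:
            if namespace_of(child) != ns:
                continue  # start of another island
            if child.tag not in ALLOWED.get(el.tag, set()):
                return False
            stack.append(child)
    return True


for frag in fragment_roots(ET.fromstring(DOC)):
    print(frag.tag, fragment_is_valid(frag))
```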

(B) Architectural forms. AFAIK the new namespace proposal neither supports
nor invalidates AFs. I assume (and hope) that their adherents can build
prototype systems that will work with namespaced documents.

(C) XLink (Tim Bray's - and my - preferred solution). The information
components are kept in separate files and are transcluded rather than
included in documents. Parsing, including validation, is a separate
activity for each file. There is clearly a requirement for software to
manage the identification and reassembly of these subtrees. We shall also
need experience in how documents should be authored. I don't always like
the idea that a document with 100 molecules has to have 100 separate files
using XLink (though it provides excellent normalisation). I'd feel happier
if I could be sure that the linking mechanism always found the right file.

(D) Schemas. I am delighted by the progress that the XSC group has made
over the last few days - they are clearly optimistic that namespaces can be
fitted to XSchema. That's great. If nothing else, this should be a superb
basis for future schema work - either expanding XSchema or
XML-Data/W3C-based.

A word of caution. I thought that XSchema would take a day or two. It's
taken 2-3 months. That's with a high rate of activity. So we should accept
that namespace APIs/schemas etc will take a month or so to build.

I know it's a slack time of year, but my thinking has been catalysed by the
XML DevCon in Montreal in 12 days' time. It would be nice to get a feeling
as to what activities people are thinking of for implementing namespaces.
So far we have had a few suggestions, particularly from James Clark, John
Cowan, David Megginson and the XSchema group. If DavidM reads this, it might
be useful if he can suggest how SAX might react to namespaces...

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg