Layers, again (was Re: fixing (just) namespaces and validation)

Wed Sep 8 21:54:31 BST 1999

Simon St.Laurent writes:

 > Again, you're assuming that everyone wants to treat Namespaces as a
 > layer that happens _after_ XML parsing (incl validation) is
 > complete.  I don't want to move from XML 1.0 to Namespaces - I want
 > to address Namespaces within the context of XML 1.0 validation.

In other words, you want to deal exclusively with the Namespaces
layer: I think that's a good idea for most applications, and it's
certainly necessary for XSL and RDF; it just happens that there's not
a standard schema for validating that layer yet (because it's so
new).  Personally, I would not mind deprecating the XML 1.0 layer so
that we can remove it in a few years.

Some applications, however, *do* care about the XML 1.0 layer
(i.e. they care about what specific prefix you use) -- again, XHTML
for legacy browsers is a very good example -- and DTDs are well suited
for that work.  See further, below.
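
To make the two layers concrete, here is a minimal sketch using the
expat binding in the Python standard library.  The two documents and
the namespace URI are invented for illustration; the helper name
element_names is hypothetical.  The same documents are read once
without namespace processing (roughly the XML 1.0 view) and once with
it (the Namespaces view).

```python
import xml.parsers.expat

def element_names(xml_text, namespace_aware):
    """Return the element names expat reports, in document order."""
    names = []
    if namespace_aware:
        # Namespaces view: expat reports "URI localname".
        parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    else:
        # XML 1.0 view: expat reports the raw name, prefix included.
        parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(xml_text, True)
    return names

# Hypothetical documents: same namespace URI, different prefixes.
DOC_A = '<h:page xmlns:h="http://example.org/ns"><h:body/></h:page>'
DOC_B = '<x:page xmlns:x="http://example.org/ns"><x:body/></x:page>'

xml10_a = element_names(DOC_A, namespace_aware=False)
xml10_b = element_names(DOC_B, namespace_aware=False)
ns_a = element_names(DOC_A, namespace_aware=True)
ns_b = element_names(DOC_B, namespace_aware=True)

print(xml10_a == xml10_b)  # the prefixes differ at the XML 1.0 layer
print(ns_a == ns_b)        # the names agree at the Namespaces layer
```

An application that cares about the specific prefix is working with
the first pair of results; one that cares only about the namespace is
working with the second.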

My point is simply that it's not fair to complain that DTDs don't work
with the Namespaces layer, any more than it's fair to complain that
your screwdriver handle gets dented when you hit nails with it.

 > See above.  The question is not whether I have to finish processing
 > the entire document with the parser before I begin namespace
 > processing - it's whether I can integrate namespace processing with
 > XML 1.0 processing. Where does the layer belong?  I don't find
 > Namespaces compelling as a 'logical layer' of their own - rather, I
 > see their existence in a separate document as a historical
 > accident.

It is an accident, partly, but that's the way that technology
develops.  We have to deal with the fact that there is much XML
software that doesn't know about Namespaces, and it will likely be
deployed for a long time.  That means that some applications will care
about the XML 1.0 level for the foreseeable future.  As a result,
there are de facto two separate layers.

Ideally, the number of applications that care will diminish over
time.

 [snip]

 > No, you don't have the choice of applying _validation_ at whatever
 > level you like - currently, you have the choice of applying it or
 > not applying it during the parser. You're applying the word
 > validation in a much more general sense here, ignoring the fact
 > that DTD-based validation, which is capable of addressing a
 > significant range of problems _today_, is locked in a box with the
 > rest of XML 1.0 processing.

Validation is a big problem in software design, and even in the SGML
world DTD validation covered at best a tiny subset of it.  It might be 
useful to distinguish the following terms to avoid confusion:

validation: the act of ensuring that data conform to a set of known
   rules.

XML validation: validation where the data are all or part of an XML 
   document.

DTD validation: XML validation using a DTD, as defined in the XML 1.0
   specification.

The first term applies to any layer in any system that exchanges
information; most programmers who deal with validation problems have
never even heard of XML.  An RDF schema (for example) provides
validation of the RDF data model, not of the XML markup (as a proof,
it can be applied after all of the original markup distinctions have
been removed).  A Java interface, to give another example, is a schema
that the compiler uses to validate a Java implementation.

The second, more specialised term applies to any kind of validation
performed on an XML document *as XML* -- it is broad enough to embrace
both the XML 1.0 layer and the Namespaces layer.  The XML schema
effort is aimed at providing a general mechanism for XML validation,
but not at providing DTD validation.

The third term applies to the specific set of rules and constraints in 
the XML 1.0 recommendation, where the target is an XML document and
the rules are expressed in a DTD.  DTD validation applies only to the
XML 1.0 layer.

Most validation in software systems is done by custom code; in some
cases, there are higher level constructs (like DTDs or BNF) that can
help.
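
As a rough illustration of XML validation done by custom code at the
Namespaces layer, the sketch below uses ElementTree from the Python
standard library.  The memo vocabulary, its namespace URI, and the
function name validate_memo are all assumptions made up for this
example; the rules are deliberately trivial.

```python
import xml.etree.ElementTree as ET

NS = "http://example.org/memo"  # hypothetical namespace URI

def validate_memo(xml_text):
    """Return a list of rule violations; an empty list means valid."""
    root = ET.fromstring(xml_text)
    # ElementTree reports names as "{URI}localname", so the prefix
    # chosen by the document author never enters the comparison.
    if root.tag != f"{{{NS}}}memo":
        return ["root element must be 'memo' in the memo namespace"]
    errors = []
    children = [child.tag for child in root]
    if children != [f"{{{NS}}}to", f"{{{NS}}}body"]:
        errors.append("memo must contain exactly 'to' then 'body'")
    return errors

# The same content written with a prefix and with a default namespace.
PREFIXED = ('<m:memo xmlns:m="http://example.org/memo">'
            '<m:to>Simon</m:to><m:body>Hello</m:body></m:memo>')
DEFAULTED = ('<memo xmlns="http://example.org/memo">'
             '<to>Simon</to><body>Hello</body></memo>')
# No namespace declaration at all, so the names are in no namespace.
UNQUALIFIED = '<memo><to>Simon</to><body>Hello</body></memo>'

for doc in (PREFIXED, DEFAULTED, UNQUALIFIED):
    print(validate_memo(doc))
```

A DTD could accept or reject these documents only on the basis of the
literal names (m:memo versus memo); the custom check above looks only
at the namespace-qualified names.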

 > I may need all of these, someday, for certain types of projects.  I do not
 > believe that I will need to use DTDs, schemas, and RDF all on the same
 > processing run of a document.

I hope that you won't, but it's not hard to think of situations where
you would.

 > Why?  Because XML 1.0 was written as a monolithic spec and the XML
 > Namespaces rec didn't feel it was worth the time?  This is not a difficult
 > problem to address, solving real needs now.  We don't have _anything_ to do
 > that with now.

It's not a question of being worth the time (God knows how many
person-hours we put into debating and designing the Namespaces REC,
but it must be at least 10 person-hours per word) -- it's a question
of not breaking XML 1.0.  We deliberately haven't fiddled with XML
during the last year and a half -- it's been stable so that the
companies investing hundreds of millions of R&D dollars into XML don't
see their software become obsolete two weeks after (or before)
release.

This approach wins us a lot of confidence in the corporate world, and
it's one of the biggest secrets of XML's success: every document
containing Namespaces can still be processed by an XML 1.0 processor
that knows nothing about Namespaces; processors that do know about
Namespaces can do additional kinds of value-added processing.
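
A small sketch of that compatibility, assuming the SAX driver in the
Python standard library stands in for both kinds of processor: with
its namespaces feature switched off it behaves like a processor that
knows nothing about Namespaces, and with the feature switched on it
reports namespace-qualified names.  The invoice document, its URI, and
the names Recorder and parse are invented for this example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

class Recorder(ContentHandler):
    """Record each start tag together with its attributes."""
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called when namespace processing is off: raw XML 1.0 names.
        self.events.append((name, dict(attrs.items())))

    def startElementNS(self, name, qname, attrs):
        # Called when namespace processing is on: name is (uri, local).
        self.events.append((name, dict(attrs.items())))

def parse(xml_text, namespaces):
    """Parse xml_text and return the recorded start-tag events."""
    handler = Recorder()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, namespaces)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return handler.events

# Hypothetical document that uses a namespace declaration.
DOC = '<inv:invoice xmlns:inv="http://example.org/invoice" id="42"/>'

xml10_events = parse(DOC, namespaces=False)
ns_events = parse(DOC, namespaces=True)
print(xml10_events)
print(ns_events)
```

With the feature off, the declaration is just another attribute named
xmlns:inv and the element is just named inv:invoice, so nothing in the
document is foreign to XML 1.0.  With the feature on, the declaration
is consumed and the element name becomes a (URI, local name) pair.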

 [snip]

 > >All of the layers are way too thick -- that's the joy of a
 > >high-level logical model.  I can break any one of them down into
 > >dozens of smaller layers; in fact, the application layer will
 > >often be much more complicated than the parser layer, simply
 > >because useful applications generally do complicated things.
 > 
 > Then maybe we'd better take a closer look at the layers you propose -
 > piling many thick layers on top of each other doesn't sound like a very
 > good recipe.

I think that you've misunderstood.  Each thick layer is an abstraction
of many thinner layers -- that's the way that high-level models work
(just as each folder in a file system may contain many other folders,
etc.).  Remember that a model is an abstracted explanation of a system
design, not the system itself.

All the best,

David

-- 
David Megginson                 david at megginson.com
           http://www.megginson.com/
