Layers, again (was Re: fixing (just) namespaces and validation)

David Megginson david at megginson.com
Wed Sep 8 19:01:01 BST 1999


Simon St.Laurent writes:

 > At 08:31 AM 9/8/99 -0400, David Megginson wrote:

 > >David Carlisle writes:
 > >
 > > > Yes, agreed, it wasn't really a criticism. The fact remains
 > > > that at the current time the `problem' is that there is no
 > > > standard way of getting from one layer to the other.
 > >
 > >Sure there is -- at least, the Namespaces REC defines pretty clearly
 > >what Namespace declarations and prefixes do.
 > 
 > I think you've read considerably more into the Namespaces REC than
 > I have as far as _when_ that 'namespace doing' takes place.

Oops!  Note that I didn't say "what Namespaces do" -- that's
application-specific.  I referred only to the parts of XML 1.0 syntax
that act as links to the Namespace layer (Namespace declarations and
prefixes).  So, for

  "what Namespace declarations and prefixes do" 

try reading

  "how expanded names are constructed from Namespace declarations and
  prefixes".

In other words, given any XML 1.0 document, the XML Namespaces REC
precisely and unambiguously specifies how to determine the expanded
name of every element and attribute in the document.  That's *all* that
you need to move from the XML 1.0 layer to the Namespaces layer during
processing.
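
To see how little machinery that asks for, here's a minimal sketch
using a SAX parser with Namespace processing switched on (the Python
binding, the handler name, and the two-element document below are my
own illustration, not anything the REC mandates):

  import io
  import xml.sax

  DOC = b"""<doc xmlns:ht="http://www.w3.org/1999/xhtml">
    <ht:p>hello</ht:p>
  </doc>"""

  class ExpandedNames(xml.sax.ContentHandler):
      def startElementNS(self, name, qname, attrs):
          # 'name' is the expanded name: a (namespace-URI,
          # local-name) pair; the prefix has been resolved away.
          print(name)

  parser = xml.sax.make_parser()
  parser.setFeature(xml.sax.handler.feature_namespaces, True)
  parser.setContentHandler(ExpandedNames())
  parser.parse(io.BytesIO(DOC))
  # prints (None, 'doc'), then ('http://www.w3.org/1999/xhtml', 'p')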

 > While it does discuss the appearance of qualified names in DTDs and
 > makes certain comments regarding the non-reliability of attribute
 > defaulting in non-validating parsers, it doesn't go further.  It
 > doesn't specify explicitly that Namespace processing is performed
 > as a layer between the application and the parser, or that all
 > parser operation must be completed before namespace processing
 > begins.

But there's no reason that the parser has to finish processing the XML
1.0 layer before it starts evaluating the Namespace layer -- that's a
matter of physical implementation, and the layers belong to a logical
model (jumping layers is always wrong in logical models, but it often
makes sense in implementations: note how many routers know about HTTP,
even though they're technically dealing with the IP layer).  In the
SAX Namespaces filter for DATAX, for example, the Namespace processing
is done on the fly.
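
The trick is nothing more than a stack of prefix bindings, pushed on
each start-tag and popped on each end-tag.  A rough sketch of such a
filter in Python (my own, not the actual DATAX code):

  class NamespaceScope:
      # One frame per open element; lookup walks outward, so inner
      # declarations shadow outer ones, exactly as the REC requires.
      def __init__(self):
          self.stack = [{"xml": "http://www.w3.org/XML/1998/namespace"}]

      def push(self, declarations):   # xmlns declarations on a start-tag
          self.stack.append(dict(declarations))

      def pop(self):                  # matching end-tag
          self.stack.pop()

      def expand(self, qname):
          # A real filter would also refuse to apply the default
          # namespace (prefix "") to attribute names.
          prefix, _, local = qname.rpartition(":")
          for frame in reversed(self.stack):
              if prefix in frame:
                  return (frame[prefix], local)
          if not prefix:
              return (None, local)    # no default declaration in scope
          raise ValueError("undeclared prefix: " + prefix)

  scope = NamespaceScope()
  scope.push({"ht": "http://www.w3.org/1999/xhtml"})
  print(scope.expand("ht:p"))  # ('http://www.w3.org/1999/xhtml', 'p')
  scope.pop()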

More seriously, neither XML nor Namespaces currently has a
standardized processing model of any sort, and I'm still trying to
decide whether I think they should.  The data models and APIs that we
have specify only what should be available *after* processing; they
don't say how we get there.

 > The problem in both of these examples is that you treat XML itself
 > as monolithic, and DTD validation as a tool that can only be used
 > at the time of parsing.  As a result, we have multiple levels of
 > checking that have to be redundant if they're done at all.  Check
 > against schemas, DTDs, _and_ RDF? And then throw application rules
 > on top of that? Forget it.

No, you have the choice of applying validation only at the levels that
are important to you: if you're producing XHTML for display in legacy
browsers, then you might need to validate the character layer with
regular expressions to ensure that empty-element tags always have a
space before the closing delimiter (<hr />, not <hr/>).
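
A concrete check of that character layer might look like this in
Python (the pattern is my own and deliberately naive -- it ignores
comments, CDATA sections, and quoted attribute values):

  import re

  # Flag empty-element tags without the space legacy browsers want:
  # matches "<hr/>" but not "<hr />".
  BAD_EMPTY_TAG = re.compile(r"<[^<>]*[^\s<>]/>")

  def check(text):
      return BAD_EMPTY_TAG.findall(text)

  print(check("<p>a<hr/>b<hr />c<br/></p>"))  # ['<hr/>', '<br/>']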

You can apply validation to *any* layer of processing, from the raw
bytes to the final application.  Choosing where to validate is an
architectural decision, not a standards one.  It just happens that
some of the layers do have shared specs that make validation easier
(regular expressions, DTDs, and RDF schemas), while others do not,
yet.

 > These 'layers' are pretty much a guarantee that developers either
 > need to make an investment in large quantities of aspirin - or pick
 > one tool and stick to it.

No, that's wrong -- layered approaches like these have proven
themselves over and over (the Internet is only the most famous
example).  Do you consider it redundant that both TCP and HTTP perform 
different kinds of validity checks?

 > If I thought that schemas would be here soon, or that RDF really
 > was the answer to all of these, I wouldn't be pushing on DTD
 > validation.

But you need all of these and more.  Any higher-level layer will have
its own constraints that cannot (and should not) be expressed in a
generic XML structural schema language: read the TEI spec (for
example) to see how complex these constraints can be.

DTDs do two things very well -- they let you validate the surface
structure of an XML 1.0 document, and they provide production rules to
help with the creation of XML 1.0 documents.  They could be extended
to do lots of other kinds of things, but I hardly see the point (ISO
8859-1 could have been extended to include markup, for example, but it
would have been a bad idea).
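
A toy example of those two jobs side by side (the memo DTD is mine;
the harness uses the lxml library in Python to run the validation):

  import io
  from lxml import etree

  # The content models validate surface structure; the ATTLIST
  # default is a production rule -- a parser reading this DTD at
  # parse time supplies priority="normal" automatically.
  DTD = etree.DTD(io.StringIO("""\
  <!ELEMENT memo (to, body)>
  <!ELEMENT to   (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ATTLIST memo priority CDATA "normal">
  """))

  good = etree.fromstring("<memo><to>DM</to><body>hi</body></memo>")
  bad  = etree.fromstring("<memo><body>hi</body></memo>")
  print(DTD.validate(good))  # True
  print(DTD.validate(bad))   # False -- missing <to>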

 > DTDs do seem to be a good answer - in the short term for many
 > projects, in the long term for a subset of projects - to the need
 > for structural checking.  It doesn't seem that ridiculous to want
 > to 'validate' the results of a transformation (generated via XSL or
 > the DOM) or to want to 'validate' a document against a DTD
 > structure while taking into account namespaces.

Not at all, but you need something other than DTDs to do that, at
least for now.

 > Because XML 1.0 was written so that everything from character
 > checking to entity replacement to attribute defaulting to
 > structural inspections (DTD and otherwise) are all performed by one
 > monolithic 'parser', we haven't been able to describe XML
 > processors with any level of granularity.  When I talk about layers
 > (for instance, in
 > http://www.simonstl.com/articles/layering/layered.htm), it's
 > layering for the sake of breaking things into the smallest usable
 > components, not for the sake of piling on more and more mostly
 > redundant processing.  Your layer 3 is way too thick.

All of the layers are way too thick -- that's the joy of a high-level
logical model.  I can break any one of them down into dozens of
smaller layers; in fact, the application layer will often be much more
complicated than the parser layer, simply because useful applications
generally do complicated things.

 > If we treat validation as a process with its own life, outside of the
 > Rube Goldberg machine known as an XML processor, we might be able
 > to solve a lot of problems that currently look very difficult much
 > more simply.  Namespaces included.

It's wonderful to see that we end up agreeing.  Validation is a much
bigger problem than DTDs -- it's best to think of DTDs as a small
bonus (you can perform some types of structural validation on the XML
layer right now basically for free) rather than a liability, and to
think of the greater validation problem as still unsolved in the
general case.  That's not because the XML Schema committee members are 
stupid or obstructionist, but simply because validation in the general 
case is a *very* hard problem.


All the best,


David

-- 
David Megginson                 david at megginson.com
           http://www.megginson.com/
