Layers, again (was Re: fixing (just) namespaces and validation)

Wed Sep 8 20:08:38 BST 1999

This continues to be interesting, but I'm afraid you're convincing me more
and more that your layered approach is grotesque, inflexible, and liable to
tilt, rather than a good way to spare developers aspirin.

At 01:02 PM 9/8/99 -0400, David Megginson wrote:
>Oops!  Note that I didn't say "what Namespaces do" -- that's
>application-specific.  I referred only to the parts of XML 1.0 syntax
>that act as links to the Namespace layer (Namespace declarations and
>prefixes).  So, for
>
>  "what Namespace declarations and prefixes do" 
>
>try reading
>
>  "how expanded names are constructed from Namespace declarations and
>  prefixes".
>
>In other words, given any XML 1.0 document, the XML Namespaces REC
>precisely and unambiguously specifies how to determine the expanded
>name of every element and attribute in the document.  That's *all* that
>you need to move from the XML 1.0 layer to the Namespaces layer during
>processing.
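
As a rough illustration of the expanded-name construction described in the
quote above, here is a minimal Python sketch. The function names push_scope
and expand are hypothetical and come from neither spec; the sketch assumes the
raw attributes of each element are already available as a dictionary, and it
ignores error cases such as reserved prefixes.

```python
XML_NS = "http://www.w3.org/XML/1998/namespace"  # the "xml" prefix is always bound


def push_scope(parent_scope, attributes):
    """Return the prefix -> URI mapping in force for one element."""
    scope = dict(parent_scope)
    for name, value in attributes.items():
        if name == "xmlns":
            scope[""] = value          # default namespace declaration
        elif name.startswith("xmlns:"):
            scope[name[6:]] = value    # prefixed namespace declaration
    return scope


def expand(qname, scope, is_attribute=False):
    """Turn a qualified name into an (namespace URI, local name) pair."""
    prefix, sep, local = qname.partition(":")
    if not sep:
        # Unprefixed elements take the default namespace, if one is declared;
        # unprefixed attributes are in no namespace at all.
        uri = None if is_attribute else (scope.get("") or None)
        return (uri, qname)
    if prefix not in scope:
        raise ValueError("undeclared namespace prefix: " + prefix)
    return (scope[prefix], local)
```

A caller would build one scope per element, starting from {"xml": XML_NS}, and
pass it down to child elements so that declarations are inherited.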

Again, you're assuming that everyone wants to treat Namespaces as a layer
that happens _after_ XML parsing (incl validation) is complete.  I don't
want to move from XML 1.0 to Namespaces - I want to address Namespaces
within the context of XML 1.0 validation.  There is nothing in either spec
that requires that they be treated as separate layers, by my reading.  On
the other hand, breaking XML 1.0 into intelligible layers and integrating
Namespaces as a layer within that structure seems like a worthy project.

>But there's no reason that the parser has to finish processing the XML
>1.0 layer before it starts evaluating the Namespace layer -- that's a
>matter of physical implementation, and the layers belong to a logical
>model (jumping layers is always wrong in logical models, but it often
>makes sense in implementations: note how many routers know about HTTP,
>even though they're technically dealing with the IP layer).  In the
>SAX Namespaces filter for DATAX, for example, the Namespace processing
>is done on the fly.
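
To make "on the fly" concrete, here is a small Python sketch using the
standard library's xml.sax module with namespace processing switched on. It is
not the DATAX filter mentioned above, only an assumed stand-in: the Collector
class and the expanded_element_names function are names invented for this
example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class Collector(ContentHandler):
    """Record the expanded name of each element as its start tag is parsed."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElementNS(self, name, qname, attrs):
        # With namespace processing on, name is a (uri, localname) pair;
        # uri is None for an element in no namespace.
        self.names.append(name)


def expanded_element_names(xml_text):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)  # resolve names during the parse
    handler = Collector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return handler.names
```

The handler sees expanded names as each start tag arrives, so nothing here
waits for the whole document to be parsed first.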

See above.  The question is not whether I have to finish processing the
entire document with the parser before I begin namespace processing - it's
whether I can integrate namespace processing with XML 1.0 processing. Where
does the layer belong?  I don't find Namespaces compelling as a 'logical
layer' of their own - rather, I see their existence in a separate document
as a historical accident.

>More seriously, neither XML nor Namespaces currently has a
>standardized processing model of any sort, and I'm still trying to
>decide whether I think they should.  The data models and APIs that we
>have specify only what should be available *after* processing; they
>don't say how we get there.

Precisely.  And they don't say what layer goes where, either.  You read the
current situation as opening one set of possibilities, and I see it opening
a very different set of possibilities.

> > The problem in both of these examples is that you treat XML itself
> > as monolithic, and DTD validation as a tool that can only be used
> > at the time of parsing.  As a result, we have multiple levels of
> > checking that have to be redundant if they're done at all.  Check
> > against schemas, DTDs, _and_ RDF? And then throw application rules
> > on top of that? Forget it.
>
>No, you have the choice of applying validation only at the levels that
>are important to you: if you're producing XHTML for display in legacy
>browsers, then you might need to validate the character layer with
>regular expressions to ensure that empty-element tags always have a
>space before the closing delimiter (<hr />, not <hr/>).
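
The character-layer check described in the quote above might look something
like the following Python sketch. The pattern and the function name
find_tight_empty_tags are assumptions made for illustration; the pattern is
deliberately naive and would be confused by a ">" inside an attribute value,
by comments, or by CDATA sections.

```python
import re

# An empty-element tag whose closing "/>" is not preceded by whitespace,
# for example <hr/> rather than <hr />.
TIGHT_EMPTY_TAG = re.compile(r"<[A-Za-z][^<>]*(?<!\s)/>")


def find_tight_empty_tags(text):
    """Return every empty-element tag that lacks a space before '/>'."""
    return [match.group(0) for match in TIGHT_EMPTY_TAG.finditer(text)]
```

An empty result means every empty-element tag in the text has the space that
legacy browsers expect.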

No, you don't have the choice of applying _validation_ at whatever level
you like - currently, you have the choice of applying it or not applying it
during parsing. You're applying the word validation in a much more
general sense here, ignoring the fact that DTD-based validation, which is
capable of addressing a significant range of problems _today_, is locked in
a box with the rest of XML 1.0 processing.

>You can apply validation to *any* layer of processing, from the raw
>bytes to the final application.  Choosing where to validate is an
>architectural decision, not a standards one.  It just happens that
>some of the layers do have shared specs that make validation easier
>(regular expressions, DTDs, and RDF schemas), while others do not,
>yet.

Again, you're using validation to mean anything you want here.  Choosing
where and when to validate is as much a decision about which tools are
available as it is about the logical model, which you like and I find so
illogical.

> > These 'layers' are pretty much a guarantee that developers either
> > need to make an investment in large quantities of aspirin - or pick
> > one tool and stick to it.
>
>No, that's wrong -- layered approaches like these have proven
>themselves over and over (the Internet is only the most famous
>example).  Do you consider it redundant that both TCP and HTTP perform 
>different kinds of validity checks?

TCP and HTTP do well with their different checks.  (And UDP is available
for those who like to cut down redundancy.)  But DTDs perform a subset of
the validity checks of schemas, while RDF schemas provide an overlapping
set.  That sort of redundancy I find merely redundant.  Your layers aren't
performing tasks which are different enough to justify calling them
separate 'logical' tasks.

> > If I thought that schemas would be here soon, or that RDF really
> > was the answer to all of these, I wouldn't be pushing on DTD
> > validation.
>
>But you need all of these and more.  Any higher-level layer will have
>its own constraints that cannot (and should not) be expressed in a
>generic XML structural schema language: read the TEI spec (for
>example) to see how complex these constraints can be.

I may need all of these, someday, for certain types of projects.  I do not
believe that I will need to use DTDs, schemas, and RDF all on the same
processing run of a document.  I may need to be able to convert among them
for different processors, but stacking all of them (DTDs and schemas in
particular) seems foolish at best.  

James Clark's note on the problems of DTDs and their effective uselessness
is much more believable to me than a claim that I'll need to use both forms
in some kind of layered processing.  (Given that the whole thing will fold
at validation if I change a prefix anyway, I can't imagine why I'd bother.)

>DTDs do two things very well -- they let you validate the surface
>structure of an XML 1.0 document, and they provide production rules to
>help with the creation of XML 1.0 documents.  They could be extended
>to do lots of other kinds of things, but I hardly see the point (ISO
>8859-1 could have been extended to include markup, for example, but it
>would have been a bad idea).

I don't think we're talking about an enormous extension à la schemas - we're
talking about integrating validation with qualified names.  It's not an
enormous leap.
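
One way to picture structural checking against qualified names is the Python
sketch below. It is only an assumed illustration, not DTD validation: the
namespace URI, the MODEL table, and the validate function are all invented for
the example, and the check covers which children may appear, not their order
or number. It relies on xml.etree.ElementTree reporting element names in the
form "{uri}local".

```python
import xml.etree.ElementTree as ET

ORDER_NS = "urn:example:order"  # hypothetical namespace URI

# Hypothetical content model keyed on expanded names, not on prefixes:
# each element name maps to the set of child element names it may contain.
MODEL = {
    f"{{{ORDER_NS}}}order": {f"{{{ORDER_NS}}}item"},
    f"{{{ORDER_NS}}}item": set(),
}


def validate(xml_text, model=MODEL):
    """Return a list of error strings; an empty list means the tree fits."""
    errors = []
    root = ET.fromstring(xml_text)
    for parent in root.iter():
        allowed = model.get(parent.tag)
        if allowed is None:
            errors.append(f"undeclared element {parent.tag}")
            continue
        for child in parent:
            if child.tag not in allowed:
                errors.append(f"{child.tag} not allowed in {parent.tag}")
    return errors
```

Because the table is keyed on expanded names, a document that uses the default
namespace and one that uses an "o:" prefix for the same URI get the same
answer, so changing a prefix does not break the check.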

> > DTDs do seem to be a good answer - in the short term for many
> > projects, in the long term for a subset of projects - to the need
> > for structural checking.  It doesn't seem that ridiculous to want
> > to 'validate' the results of a transformation (generated via XSL or
> > the DOM) or to want to 'validate' a document against a DTD
> > structure while taking into account namespaces.
>
>Not at all, but you need something other than DTDs to do that, at
>least for now.

Why?  Because XML 1.0 was written as a monolithic spec and the XML
Namespaces rec didn't feel it was worth the time?  This is not a difficult
problem to address, and addressing it would meet real needs now.  We don't
have _anything_ to do that with today.

> > Because XML 1.0 was written so that everything from character
> > checking to entity replacement to attribute defaulting to
> > structural inspections (DTD and otherwise) are all performed by one
> > monolithic 'parser', we haven't been able to describe XML
> > processors with any level of granularity.  When I talk about layers
> > (for instance, in
> > http://www.simonstl.com/articles/layering/layered.htm), it's
> > layering for the sake of breaking things into the smallest usable
> > components, not for the sake of piling on more and more mostly
> > redundant processing.  Your layer 3 is way too thick.
>
>All of the layers are way too thick -- that's the joy of a high-level
>logical model.  I can break any one of them down into dozens of
>smaller layers; in fact, the application layer will often be much more
>complicated than the parser layer, simply because useful applications
>generally do complicated things.

Then maybe we'd better take a closer look at the layers you propose -
piling many thick layers on top of each other doesn't sound like a very
good recipe.

> > If we treat validation as a process with its own life, outside of the
> > Rube Goldberg machine known as an XML processor, we might be able
> > to solve a lot of problems that currently look very difficult much
> > more simply.  Namespaces included.
>
>It's wonderful to see that we end up agreeing.  Validation is a much
>bigger problem than DTDs -- it's best to think of DTDs as a small
>bonus (you can perform some types of structural validation on the XML
>layer right now basically for free) rather than a liability, and to
>think of the greater validation problem as still unsolved in the
>general case.  That's not because the XML Schema committee members are 
>stupid or obstructionist, but simply because validation in the general 
>case is a *very* hard problem.

Validation in general is a very large problem, and I'm not trying to solve
it all at once.  I'm proposing that we fix some tools we have today so that
they work better with other tools we have today.

Simon St.Laurent
XML: A Primer (2nd Ed - September)
Building XML Applications
Inside XML DTDs: Scientific and Technical
Sharing Bandwidth / Cookies
http://www.simonstl.com
