Well-formed vs. valid

Wed Feb 24 02:29:10 GMT 1999

I agree that validation code is being written into applications, and 
that it is something to avoid. The validation code is just another 
incarnation of the information in a DTD, i.e. both the DTD and the 
code in the application describe what a valid document is.

When the structure of the content changes, two dissimilar descriptions 
must change: the DTD and the application. Neither SAX nor DOM provides 
any means to deal with this problem. SAX delivers a stream of element 
events, and DOM provides a tree to walk in order to access elements.

I would propose a type of XML parser that takes a well-formed or valid 
document, validates it against a DTD (or any other accepted form of 
structure description) of the application's choice, and then issues 
streaming events to the application. Consider it a DOM that does a 
tree match on an application-chosen DTD and then emits SAX calls. The 
application would be guaranteed to receive only valid elements and 
thus would not need its own structural validation code.
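
To make the idea concrete, here is a rough sketch in Python, not a 
real parser. It assumes a toy content model (the SCHEMA dict, which 
only lists allowed child names) standing in for a real DTD, and the 
names check, emit, parse_validated and Recorder are made up for 
illustration. The tree is built and checked first, and only then are 
SAX-style calls replayed to the application's handler.

```python
import xml.etree.ElementTree as ET
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl

# Hypothetical stand-in for a DTD: element name -> names allowed as children.
SCHEMA = {
    "order": {"item"},
    "item": {"name", "qty"},
    "name": set(),
    "qty": set(),
}


class InvalidDocument(Exception):
    """Raised when the tree does not match the application's chosen model."""


def check(elem, schema):
    # Recursive tree match against the content model.
    if elem.tag not in schema:
        raise InvalidDocument(f"undeclared element: {elem.tag}")
    for child in elem:
        if child.tag not in schema[elem.tag]:
            raise InvalidDocument(f"{child.tag} not allowed in {elem.tag}")
        check(child, schema)


def emit(elem, handler):
    # Replay the already-checked tree as SAX-style events.
    handler.startElement(elem.tag, AttributesImpl(dict(elem.attrib)))
    if elem.text and elem.text.strip():
        handler.characters(elem.text)
    for child in elem:
        emit(child, handler)
        if child.tail and child.tail.strip():
            handler.characters(child.tail)
    handler.endElement(elem.tag)


def parse_validated(text, schema, handler):
    root = ET.fromstring(text)   # well-formedness check
    check(root, schema)          # validity check, before any event is sent
    handler.startDocument()
    emit(root, handler)
    handler.endDocument()


class Recorder(ContentHandler):
    """Example application handler: it contains no validation code."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("text", content))
```

Because the check runs before the first event, a handler such as 
Recorder never sees any part of a document that fails the model. The 
cost is that the whole document has to be held in memory first.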

The line between the application and 'XML' is currently drawn at the 
level of elements: the application is hooked onto DOM, SAX, or some 
other XML parser of a file. The structural description in a DTD is not 
used unless the document (not the application) calls for validation. 
This separation is also reflected in modeling on the basis of a file 
rather than a stream.

This 'traditional' architecture (file-based, with a DTD for optionally 
ensuring the file is valid) both limits capabilities and requires 
writing lots of additional application code for verification and 
other purposes.

By allowing a stream rather than file model to be used, good things 
can be accomplished:
1. A site can advertise its available content with a DTD. A DTD not 
only describes valid form, but also the entire world of what a server 
may provide.
2. An application can decide which of a site's available elements it 
needs (via a query or pattern sent to the site), and the site would 
then respond with just the desired content. Extraneous elements could 
be avoided at the application's choice.

Rather than treating a site as a mere file that can be downloaded in 
its entirety, and providing yet another means to query a site for its 
available documents, the site can become an element server which 
advertises its elements and cooperates with the application to 
download only the needed elements.
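
As a toy illustration of an element server, the Python sketch below 
keeps everything in one process. SITE_DOCUMENT, advertised_elements 
and serve are invented names, and listing the element names is only a 
crude stand-in for publishing a DTD. No real protocol or query 
language is implied.

```python
import xml.etree.ElementTree as ET

# Hypothetical content held by the site.
SITE_DOCUMENT = """<catalog>
  <book><title>XML Basics</title><price>20</price><review>Good</review></book>
  <book><title>SAX in Depth</title><price>35</price><review>Dense</review></book>
</catalog>"""


def advertised_elements(xml_text):
    """Element names the site could provide (stand-in for advertising a DTD)."""
    return sorted({el.tag for el in ET.fromstring(xml_text).iter()})


def serve(xml_text, wanted):
    """Yield (name, text) in document order, only for requested elements."""
    for el in ET.fromstring(xml_text).iter():
        if el.tag in wanted:
            yield el.tag, (el.text or "").strip()
```

An application that only cares about titles and prices would call 
serve(SITE_DOCUMENT, {"title", "price"}) and never receive the review 
elements.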

The concept of 'valid' under this model is more of a 'not invalid' - 
if the stream so far is valid, assume it will continue to be. Only 
closing the stream would deliver the various closing elements which 
(hopefully) would result in a complete valid document.
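
A small Python sketch of 'not invalid so far', using the incremental 
feed()/close() interface of the standard xml.sax parser. The ALLOWED 
table is again a made-up stand-in for a DTD, with the key None marking 
what may appear at the document root, and NotInvalidSoFar and 
feed_chunks are illustrative names only. Each element is checked 
against its parent as the parser reports it, so a bad element stops 
the stream without waiting for the end of the document.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical content model: parent name -> allowed child names.
ALLOWED = {
    None: {"order"},
    "order": {"item"},
    "item": {"name", "qty"},
    "name": set(),
    "qty": set(),
}


class NotInvalidSoFar(ContentHandler):
    def __init__(self, allowed):
        super().__init__()
        self.allowed = allowed
        self.stack = []   # currently open elements
        self.seen = []    # elements accepted so far

    def startElement(self, name, attrs):
        parent = self.stack[-1] if self.stack else None
        if name not in self.allowed.get(parent, set()):
            raise xml.sax.SAXException(f"{name} not allowed in {parent}")
        self.stack.append(name)
        self.seen.append(name)

    def endElement(self, name):
        self.stack.pop()


def feed_chunks(chunks, allowed):
    """Feed the document piece by piece; return the accepted element names."""
    handler = NotInvalidSoFar(allowed)
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)   # chunks may split the markup anywhere
    parser.close()           # well-formedness of the whole is known only here
    return handler.seen
```

Note that this only checks which children may appear, not whether 
required ones are missing. That part can only be decided when an 
element closes, which matches the point above about closing the 
stream.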

It's easy enough to fall back onto a 1960s model of communication (the 
file) and punt the validation problems onto the application writers, 
but for widespread acceptance things need to be easy, not difficult.

Another 10 cents worth of thought into the pot,

Marc B McDonald
Principal Software Scientist
Design Intelligence, Inc
www.design-intelligence.com

----------
From:  Tim Bray [SMTP:tbray at textuality.com]
Sent:  Monday, February 22, 1999 11:21 AM
To:  Jeffrey E. Sussna; 'XML-DEV'
Subject:  Re: Well-formed vs. valid

At 10:58 AM 2/22/99 -0800, Jeffrey E. Sussna wrote:
>One thing disturbs me, however. Much talk seems to be made about documents
>or document fragments being useful because they are well-formed. I don't
>want something well-formed, I want something "valid". Whether validity is
>determined by reference to a DTD or to a schema of some other kind, I need
>more than just the lowest-level syntactic conformance to the XML spec. I
>need to be able to determine that the XML in question conforms to the
>syntactic and semantic constraints imposed by my application.

I've never seen an application so simple that its syntactic/semantic
constraints could be expressed in a schema, DTD or any other flavor.
That's why every commercial DBMS-based app has zillions of lines of
data validation code that have to be run before you actually use
incoming data.

Having said that, I think that validation is a good thing and
essential in lots of applications, and will become a better thing
once we have a more modern schema facility.

>Furthermore, I don't want to have to rely on implicit knowledge contained
>within a proprietary parser in order to do so.

In my experience, you *always* have to write some application-specific
validation code. -Tim

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)
