Well-formed vs. valid
Marc.McDonald at Design-Intelligence.com
Wed Feb 24 02:29:10 GMT 1999
I agree that application code for validation is being written, and
that it is something to avoid. The validation code is just another
incarnation of the information in a DTD, i.e. both the DTD and the
code in the application determine whether a document is valid.
When the structure of the content changes, two dissimilar descriptions
must change: the DTD and the application. Neither SAX nor DOM provides
any means to deal with this problem. One delivers a stream of element
events, and the other lets the application walk a tree to access
elements.
I would propose a type of XML parser that takes a well-formed or valid
document, validates it against a DTD (or any other accepted form of
structure description) of the application's choice, and then issues
streaming events to the application. Consider it a DOM that does a
tree match on an application-chosen DTD and then emits SAX calls. The
application would be guaranteed to receive only valid elements and
thus would not need its own structural validation code.
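
As a rough sketch of the idea in Python, the filter below sits between
the SAX parser and the application. ORDER_MODEL, ValidatingFilter,
Collector and parse_valid are all hypothetical names, and the dictionary
of allowed children is only a stand-in for a DTD (a real DTD also
constrains order and cardinality, and Python's bundled SAX parser does
not validate against DTDs itself).

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical stand-in for an application-chosen DTD: each element maps
# to the set of child elements it may contain.
ORDER_MODEL = {
    "order": {"customer", "item"},
    "customer": set(),
    "item": set(),
}


class ValidationError(Exception):
    """Raised when an element does not fit the chosen model."""


class ValidatingFilter(ContentHandler):
    """Checks each element against the model, then forwards the event."""

    def __init__(self, model, root, downstream):
        super().__init__()
        self.model = model
        self.root = root
        self.downstream = downstream
        self.stack = []  # names of the currently open elements

    def startElement(self, name, attrs):
        if name not in self.model:
            raise ValidationError("unknown element: " + name)
        if not self.stack:
            if name != self.root:
                raise ValidationError("unexpected root: " + name)
        elif name not in self.model[self.stack[-1]]:
            raise ValidationError(name + " not allowed in " + self.stack[-1])
        self.stack.append(name)
        self.downstream.startElement(name, attrs)

    def endElement(self, name):
        self.stack.pop()
        self.downstream.endElement(name)

    def characters(self, content):
        self.downstream.characters(content)


class Collector(ContentHandler):
    """Example application handler with no validation code of its own."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append(name)


def parse_valid(xml_text, model, root, app_handler):
    # The application only ever sees events that passed the filter.
    filt = ValidatingFilter(model, root, app_handler)
    xml.sax.parseString(xml_text.encode("utf-8"), filt)
```

Note that events are forwarded as soon as each element passes the check,
so the application may already have received part of a document that
later turns out to be invalid.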
The line between the application and 'XML' is currently drawn at the
level of elements: the application hooks onto DOM, SAX, or some other
XML parser reading a file. The XML structural description in a DTD
is not used unless the document (not the application) calls for
validation. This separation also shows in the choice to model on the
basis of a file rather than a stream.
This 'traditional' architecture (file-based, with a DTD that optionally
ensures the file is valid) both limits what can be done and requires
writing a lot of additional application code for verification and
other purposes.
By allowing a stream model rather than a file model to be used, good
things can be accomplished:
1. A site can advertise its available content with a DTD. A DTD not
only describes valid form, but also the entire world of what a server
may provide.
2. An application can decide which of a site's available elements it
needs and send a query or pattern to the site, which would then
respond with only the desired content. The application's choice keeps
extraneous elements out of the stream.
Rather than treating a site as a mere file to be downloaded in its
entirety, and inventing yet another means of querying a site for its
available documents, the site can become an element server that
advertises its elements and cooperates with the application to
download only the needed elements.
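
To illustrate the selection step, here is a minimal Python sketch of a
SAX filter that forwards only the subtrees of elements the application
asked for. ElementSelector, Recorder and select are hypothetical names,
and the filter runs on the receiving side purely for demonstration; in
the model described above, the same selection would happen on the server
before anything is sent.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ElementSelector(ContentHandler):
    """Forwards events only for wanted elements and their descendants."""

    def __init__(self, wanted, downstream):
        super().__init__()
        self.wanted = set(wanted)
        self.downstream = downstream
        self.depth = 0  # greater than zero while inside a wanted subtree

    def startElement(self, name, attrs):
        if self.depth or name in self.wanted:
            self.depth += 1
            self.downstream.startElement(name, attrs)

    def endElement(self, name):
        if self.depth:
            self.depth -= 1
            self.downstream.endElement(name)

    def characters(self, content):
        if self.depth:
            self.downstream.characters(content)


class Recorder(ContentHandler):
    """Example application handler that records what it receives."""

    def __init__(self):
        super().__init__()
        self.names = []
        self.text = []

    def startElement(self, name, attrs):
        self.names.append(name)

    def characters(self, content):
        self.text.append(content)


def select(xml_text, wanted):
    recorder = Recorder()
    selector = ElementSelector(wanted, recorder)
    xml.sax.parseString(xml_text.encode("utf-8"), selector)
    return recorder
```

For example, select(catalog_text, {"price"}) would hand the application
only the price elements of a (hypothetical) catalog document and skip
everything else.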
The concept of 'valid' under this model is more of a 'not invalid':
if the stream so far is valid, assume it will continue to be. Only
closing the stream would deliver the various closing tags which
(hopefully) would result in a complete valid document.
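
The 'not invalid so far' behaviour can be seen with Python's incremental
SAX interface. This is a sketch only: OpenElementTracker and feed_chunks
are hypothetical names, and the parser checks well-formedness rather
than validity against a DTD. Each chunk is accepted as long as nothing
seen so far is wrong, and only close() reports whether the stream ended
as a complete document.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class OpenElementTracker(ContentHandler):
    """Remembers which elements are still open in the stream."""

    def __init__(self):
        super().__init__()
        self.open = []

    def startElement(self, name, attrs):
        self.open.append(name)

    def endElement(self, name):
        self.open.pop()


def feed_chunks(chunks):
    """Feeds byte chunks one at a time; returns the open elements after each."""
    tracker = OpenElementTracker()
    parser = xml.sax.make_parser()
    parser.setContentHandler(tracker)
    states = []
    for chunk in chunks:
        # Raises xml.sax.SAXParseException as soon as the stream goes wrong.
        parser.feed(chunk)
        states.append(list(tracker.open))
    # Raises xml.sax.SAXParseException if elements are still open.
    parser.close()
    return states
```

A stream that is cut off before its end tags arrive is accepted chunk by
chunk and rejected only when it is closed.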
It's easy enough to fall back onto a 1960s model of communication (the
file) and punt the validation problems onto the application writers,
but for widespread acceptance things need to be easy, not difficult.
Another 10 cents worth of thought into the pot,
Marc B McDonald
Principal Software Scientist
Design Intelligence, Inc
www.design-intelligence.com
----------
From: Tim Bray [SMTP:tbray at textuality.com]
Sent: Monday, February 22, 1999 11:21 AM
To: Jeffrey E. Sussna; 'XML-DEV'
Subject: Re: Well-formed vs. valid
At 10:58 AM 2/22/99 -0800, Jeffrey E. Sussna wrote:
>One thing disturbs me, however. Much talk seems to be made about documents
>or document fragments being useful because they are well-formed. I don't
>want something well-formed, I want something "valid". Whether validity is
>determined by reference to a DTD or to a schema of some other kind, I need
>more than just the lowest-level syntactic conformance to the XML spec. I
>need to be able to determine that the XML in question conforms to the
>syntactic and semantic constraints imposed by my application.
I've never seen an application so simple that its syntactic/semantic
constraints could be expressed in a schema, DTD or any other flavor.
That's why every commercial DBMS-based app has zillions of lines of
data validation code that have to be run before you actually use
incoming data.
Having said that, I think that validation is a good thing and
essential in lots of applications, and will become a better thing
once we have a more modern schema facility.
>Furthermore, I don't want to have to rely on implicit knowledge contained
>within a proprietary parser in order to do so.
In my experience, you *always* have to write some application-specific
validation code. -Tim
xml-dev: A list for W3C XML Developers. To post,
mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on
CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following
message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)