ModSAX: Proposed Core Features (heretical?)

Bill la Forge b.laforge at
Mon Mar 15 19:15:00 GMT 1999

From: Simon St.Laurent <simonstl at>
>Basically, he wanted the ability to check the document structure without
>the internal subset, so he could rely on the validation process to make
>certain that documents conformed to an 'official' DTD, without extra junk
>some twerpy developer put in the internal subset to make his own version
>valid if not official.

But even given that an 'official' DTD was used, there is a question as to
WHICH official DTD was used. I see several problems with relying on
an unaugmented SAX parser for validation of data being input to an application:

1. DTD-driven validation is rarely complete enough--there will always be
    something critical that the application needs to validate. Fortunately,
    SAX supports parse exceptions in all the right places, with full information
    available on where in the document the error occurred.

2. If the application is going to depend on the parser for some of the validation
    (a real boon to the application programmer), then the application needs
    to be informed by the parser as to which DTD or other schema was used.

    Having the document specify this information in a PI or by some other means is
    not sufficient unless that information is somehow compared to the DTD 
    actually used by the parser.

3. As mentioned by Simon, allowing an author to change a DTD makes no
    sense at all in terms of providing a validation service for the application.

4. When filters are placed between the parser and the application, validation is
    best done in the last filter, rather than prior to the transformations performed
    by those filters. Validation by the parser in this case may produce clearer
    error messages, but validation of the transformed data provides the application
    with a greater assurance that its data will be in the expected form.
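
To make points 1 and 4 concrete, here is a minimal sketch of validation done in the last filter of a chain. It uses Python's standard xml.sax module rather than any ModSAX API. The element names (qty, quantity), the "positive integer" rule, and the names RenameFilter, QuantityValidator, Collector and parse_order are all hypothetical, chosen only for illustration. The rule is the kind of thing a DTD cannot express, and the error is reported with the parser's locator so the position in the original document is still available.

```python
import io
import xml.sax
from xml.sax import SAXParseException
from xml.sax.saxutils import XMLFilterBase


class RenameFilter(XMLFilterBase):
    """Transformation step: maps the legacy element name 'qty' to 'quantity'."""

    RENAMES = {"qty": "quantity"}  # hypothetical mapping

    def startElement(self, name, attrs):
        super().startElement(self.RENAMES.get(name, name), attrs)

    def endElement(self, name):
        super().endElement(self.RENAMES.get(name, name))


class QuantityValidator(XMLFilterBase):
    """Last filter: checks an application rule on the transformed events."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self._locator = None
        self._text = None  # collects character data while inside <quantity>

    def setDocumentLocator(self, locator):
        # Keep the locator so errors can point at the source position.
        self._locator = locator
        super().setDocumentLocator(locator)

    def startElement(self, name, attrs):
        if name == "quantity":
            self._text = []
        super().startElement(name, attrs)

    def characters(self, content):
        if self._text is not None:
            self._text.append(content)
        super().characters(content)

    def endElement(self, name):
        if name == "quantity":
            value = "".join(self._text).strip()
            self._text = None
            try:
                valid = int(value) > 0
            except ValueError:
                valid = False
            if not valid:
                raise SAXParseException(
                    f"quantity must be a positive integer, got {value!r}",
                    None, self._locator)
        super().endElement(name)


class Collector(xml.sax.ContentHandler):
    """Stand-in for the application: records the element names it sees."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def startElement(self, name, attrs):
        self.elements.append(name)


def parse_order(data: bytes):
    # parser -> RenameFilter -> QuantityValidator -> application
    chain = QuantityValidator(RenameFilter(xml.sax.make_parser()))
    app = Collector()
    chain.setContentHandler(app)
    chain.parse(io.BytesIO(data))
    return app.elements
```

Because the validator sits after the renaming filter, it checks the data in the form the application will actually receive, while the locator still refers to the untransformed input.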

My belief here is that it is perhaps best to abandon validation by the parser
kernel and instead use filters which support the validation needs of the
application. Errors so detected may stem from a poorly constructed document,
but may also reflect constraints imposed by a particular application. This
of course raises the question of how the response to these two types of
errors should differ. I can understand the desire to make such a distinction,
but I have not yet come to appreciate the need for it.
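
For anyone who does want to draw that line, one possible approach (a sketch only, again using Python's standard xml.sax rather than ModSAX) is to have the validating filter raise its own exception subtype, so the caller can tell a filter-detected constraint violation from an error reported by the parser itself. The names ConstraintViolation, RequireId and classify, and the rule that every item element needs an id attribute, are hypothetical.

```python
import io
import xml.sax
from xml.sax import SAXParseException
from xml.sax.saxutils import XMLFilterBase


class ConstraintViolation(SAXParseException):
    """Hypothetical marker type for errors raised by an application filter."""


class RequireId(XMLFilterBase):
    """Hypothetical rule: every <item> must carry an 'id' attribute."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self._locator = None

    def setDocumentLocator(self, locator):
        self._locator = locator
        super().setDocumentLocator(locator)

    def startElement(self, name, attrs):
        if name == "item" and "id" not in attrs:
            raise ConstraintViolation("item lacks an id", None, self._locator)
        super().startElement(name, attrs)


def classify(data: bytes) -> str:
    chain = RequireId(xml.sax.make_parser())
    chain.setContentHandler(xml.sax.ContentHandler())
    try:
        chain.parse(io.BytesIO(data))
    except ConstraintViolation:
        # Raised by the filter: the document broke an application rule.
        return "application constraint"
    except SAXParseException:
        # Reported by the parser: the document itself is faulty.
        return "document error"
    return "ok"
```

Whether the two branches should then be handled differently is exactly the open question; the sketch only shows that the distinction is cheap to carry if it turns out to be needed.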

