ModSAX: Proposed Core Features (heretical?)

roddey at us.ibm.com
Tue Mar 16 19:06:46 GMT 1999




<Bill's Comment>
My belief here is that it is perhaps best to abandon validation by the
parser kernel and instead use filters which support the validation needs
of the application. Errors so detected may be because of a poorly
constructed document, but may also be due to constraints imposed by a
particular application. This of course raises the question of how the
response to these two different types of errors should differ. I can
understand a desire to make such a distinction, but I have not yet come
to appreciate the need to make such a distinction.

Bill
</Bill's Comment>

That would have some pretty large performance implications. For our new
generation parsers, we can validate the event stream *very* fast as it's
going out of the parser. Doing it after the fact, way upstream, would be
much, much slower. I could imagine that this would be true of other
parsers as well: once the stuff has gone out into the 'real world',
validation becomes much more work because now it has to be done in terms
of text comparisons instead of internal element ids.

I understand that the filter sequence could change the document, but
wouldn't it be just as important to know that it died because the original
document was hosed (and therefore the filters spat out junk)?

Another option, though also frighteningly bad for performance and
requiring compliance by parsers, would be a way to plug filters in
'under' the input into the parser. That way, the filtering would happen
before the parser passed judgement on it for either well-formedness or
validity. Otherwise, even if you validate the end product of the filter
sequence, how do you know it remained well-formed? That check is usually
implicit in the call stack of the parser on the original content. But,
of course, how would the filters operate on the text that has not been
parsed yet? :-)

It's almost like you'd want to parse it lightly to get it to the filters,
let them shake and bake it, then parse it for real and validate it. But
that would be really rough for performance also.
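
Just to illustrate that two-pass scheme, here's a sketch written with the
SAX2 and JAXP APIs as they eventually shipped (they post-date this thread, so
treat it purely as an illustration): do a light, non-validating parse with
the filter chain on top, serialize the filtered events back to text, then
parse that text again with validation turned on. The identity XMLFilterImpl
stands in for a real filter chain, and args[0] is the document path. The
double parse is exactly the performance hit being complained about.

    import java.io.*;
    import javax.xml.parsers.SAXParserFactory;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.sax.SAXSource;
    import javax.xml.transform.stream.StreamResult;
    import org.xml.sax.InputSource;
    import org.xml.sax.XMLReader;
    import org.xml.sax.helpers.XMLFilterImpl;

    public class TwoPassValidate {
        public static void main(String[] args) throws Exception {
            // Pass 1: light parse, no validation, with the filter chain on top.
            SAXParserFactory light = SAXParserFactory.newInstance();
            light.setValidating(false);
            XMLFilterImpl filter = new XMLFilterImpl();   // identity stand-in
            filter.setParent(light.newSAXParser().getXMLReader());

            // Serialize the filtered event stream back to text.
            StringWriter filtered = new StringWriter();
            TransformerFactory.newInstance().newTransformer().transform(
                new SAXSource(filter, new InputSource(new FileReader(args[0]))),
                new StreamResult(filtered));

            // Pass 2: parse the filtered text 'for real' with validation on.
            // (Register an ErrorHandler to actually see the validity errors.)
            SAXParserFactory strict = SAXParserFactory.newInstance();
            strict.setValidating(true);
            XMLReader validator = strict.newSAXParser().getXMLReader();
            validator.parse(new InputSource(new StringReader(filtered.toString())));
        }
    }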

For just the scenario where an application wants the normal wf/valid
checks but needs to add more on top, that's an obvious application of a
validation filter, and hopefully one that wouldn't be a performance pig.
I think that makes a lot of sense. But if the content gets changed along
the way, knowing whether it got hosed in the process seems important.
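
For that case, here's a small sketch of what such a validation filter might
look like, written against the SAX2-style XMLFilter API (roughly where the
ModSAX filter ideas ended up); the element name and the rule itself are made
up. It layers an application constraint on top of whatever wf/validity
checking the parser underneath already did, and reports violations through
the usual SAX error channel.

    import org.xml.sax.Attributes;
    import org.xml.sax.ErrorHandler;
    import org.xml.sax.SAXException;
    import org.xml.sax.SAXParseException;
    import org.xml.sax.helpers.XMLFilterImpl;

    // Hypothetical application rule: every item element must carry a sku
    // attribute, on top of the normal wf/validity checks done underneath.
    public class AppValidationFilter extends XMLFilterImpl {
        public void startElement(String uri, String local, String qName,
                                 Attributes atts) throws SAXException {
            if ("item".equals(qName) && atts.getValue("sku") == null) {
                ErrorHandler eh = getErrorHandler();
                if (eh != null) {
                    eh.error(new SAXParseException(
                        "item element is missing its sku attribute", null));
                }
            }
            super.startElement(uri, local, qName, atts);   // pass event along
        }
    }

    // Usage: make the real parser the filter's parent, register the
    // application's handlers on the filter, and parse through it:
    //   AppValidationFilter f = new AppValidationFilter();
    //   f.setParent(parser);        // parser: an org.xml.sax.XMLReader
    //   f.setContentHandler(app);   // app: the application's handler
    //   f.parse(inputSource);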

Anyway, that ramble seemed incoherent even to me, but I think there was a
point in there somewhere :-)


