Well-formed vs. valid

Fri Feb 26 18:49:40 GMT 1999

>>FYI, our (IBM's) new version 2 architecture parsers do this. We have a
>>pluggable architecture, and one of the plug ins is a validator. The low
>>level scanner uses this to validate content before it sends it out
>>through the internal event APIs. So, if you are wiring together a SAX
>>style parser, you just wire the internal events to the SAX events and
>>you have a validating SAX parser (actually we have that combination
>>already provided for you as a canned parser, but you can do other
>>variations as well.)
>
>Big question: can I plug someone else's SAX parser into your scanner, and
>then have your validation component work on my SAX events?  While it's
>unlikely that I'd want to plug a different SAX parser in, it's quite
>possible that I'd want to work with the SAX events (transforming with XT,
>for instance) before performing validation.
>

You can, it's just less efficient. The validators have to support
're-validation' or 'after the fact' validation, whatever you want to call
it (e.g. revalidating a modified DOM tree). It's just that, internally and
in a DOM that we write specifically for our parser, we can take advantage
of info that significantly speeds up the process. Once the content has
passed through to the outside world (via some general API that cannot pass
on our information), only the element names exist, so the validator has
more work to do to validate it, but it does work.

For an event API, you will have to maintain an 'element stack' in order to
gather up the info required to do the revalidation (a DOM tree already
inherently represents that). It's just a simple push-down stack of the
elements along the current nesting hierarchy, each with the list of its
children seen so far. When you get to the end of an element, call the
validator with the child list, then pop that top element off and go back
to working on the previous one.
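
To make the element stack described above concrete, here is a minimal
sketch in Python using the standard xml.sax module. It is not our parser's
API. The revalidate() function, the ElementStackHandler class and the
MODELS table are all hypothetical names made up for illustration, and the
"content model" is simplified to an exact list of child names rather than
a real DTD content model.

```python
import xml.sax


class ElementStackHandler(xml.sax.ContentHandler):
    """Keeps a stack of open elements, each with its children seen so far."""

    def __init__(self, validate_children):
        super().__init__()
        # Hypothetical callback: validate_children(element_name, child_names)
        self.validate_children = validate_children
        self.stack = []  # entries are (element_name, [child_names])

    def startElement(self, name, attrs):
        if self.stack:
            # Record this element as a child of the currently open element.
            self.stack[-1][1].append(name)
        self.stack.append((name, []))

    def endElement(self, name):
        # End of element: hand its child list to the validator, then pop
        # back to the enclosing element.
        element, children = self.stack.pop()
        self.validate_children(element, children)


def revalidate(xml_bytes, models):
    """Return a list of (element, children) pairs that violate `models`.

    `models` maps an element name to the exact child sequence it allows;
    this is an assumption standing in for real content models.
    """
    errors = []

    def check(element, children):
        if models.get(element) != children:
            errors.append((element, children))

    xml.sax.parseString(xml_bytes, ElementStackHandler(check))
    return errors


# Hypothetical content models for a tiny document type.
MODELS = {"doc": ["title", "para", "para"], "title": [], "para": []}
```

Calling revalidate(b"<doc><title/><para/><para/></doc>", MODELS) reports no
errors, while a document whose children arrive in a different order is
reported when the end of the doc element is reached.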

The low-level scanner (while parsing) maintains a stack like this for
validation, though it only has to maintain numbers, not names. If you use
our internal event API, you can also store numbers for revalidation (as
would a DOM written specifically for our system). As you can imagine, just
doing number comparisons is much faster than doing hashed string
comparisons. Of course, that's not to say that you cannot maintain a string
pool yourself and store only numbers in your stack (for speed), and
then just look up the element name text when it's time to validate.
But that's still not as fast as using our numbers, since they already exist
and the validator knows the element content models in terms of those
numbers.
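
A do-it-yourself string pool of the kind mentioned above might look
roughly like this. Again, this is only a sketch: StringPool, intern() and
name_of() are hypothetical names, not part of our parser, and the numbers
it hands out have nothing to do with the numbers our scanner uses
internally.

```python
class StringPool:
    """Maps each distinct element name to a small integer and back."""

    def __init__(self):
        self._ids = {}    # name -> number
        self._names = []  # number -> name

    def intern(self, name):
        """Return the number for `name`, assigning a new one if needed."""
        ident = self._ids.get(name)
        if ident is None:
            ident = len(self._names)
            self._ids[name] = ident
            self._names.append(name)
        return ident

    def name_of(self, ident):
        """Return the element name text for a previously assigned number."""
        return self._names[ident]


pool = StringPool()

# Store only numbers in the stack entry while events are coming in...
child_ids = [pool.intern(n) for n in ["title", "para", "para"]]

# ...and recover the name text only when it is time to validate.
child_names = [pool.name_of(i) for i in child_ids]
```

The stack then holds small integers, and the one hash lookup per name
happens when the event arrives rather than on every comparison.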

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)