Revelling parser writers (was Rebelling)
peter at ursus.demon.co.uk
Fri Nov 28 01:33:28 GMT 1997
JUMBO now has an interface to 3.5 parsers including Lark and NXP. This
means that the user can parse the same document with different parsers or
can (in principle) use a different parser for the initial document than for
the XML-LINKed ones (though I haven't actually included a 'Change Parsers' button).
It has been 'quite easy'. Authors have generally provided a set of test
routines to be either hacked or subclassed (see Lark for examples.) I think
this is a good model for distribution, as it's a quick way to make minor
changes and get them hooked into your system. It shouldn't take more than
about 2 hours per parser - I can't spare more.
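
To illustrate the idea of putting several parsers behind one interface, here
is a minimal sketch in Python. It is not JUMBO's code and does not use Lark or
NXP; the two back ends are Python standard-library parsers, and the names
PARSERS, parse_with, names_via_etree and names_via_minidom are hypothetical,
chosen only for this example. Each adapter reduces a document to the same
simple result (element names in document order), so the caller can switch
parsers by name.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List


def names_via_etree(text: str) -> List[str]:
    """Adapter 1: element names in document order, using ElementTree."""
    root = ET.fromstring(text)
    return [element.tag for element in root.iter()]


def names_via_minidom(text: str) -> List[str]:
    """Adapter 2: the same result, using minidom."""
    document = xml.dom.minidom.parseString(text)
    names: List[str] = []

    def walk(node) -> None:
        # Record elements as they are met, then descend into children.
        if node.nodeType == node.ELEMENT_NODE:
            names.append(node.tagName)
        for child in node.childNodes:
            walk(child)

    walk(document)
    return names


# Hypothetical registry: one entry per parser back end.
PARSERS: Dict[str, Callable[[str], List[str]]] = {
    "etree": names_via_etree,
    "minidom": names_via_minidom,
}


def parse_with(parser_name: str, text: str) -> List[str]:
    """Parse the same document with whichever back end is named."""
    return PARSERS[parser_name](text)


SAMPLE = "<MOL><ATOM/><BOND/></MOL>"
```

Calling parse_with("etree", SAMPLE) and parse_with("minidom", SAMPLE) should
give the same list, which is the property a 'Change Parsers' button would rely
on. Adding a parser means writing one more adapter and registering it.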
I have not done the MSXML system because I don't know if it has been
WORA'ed yet... have I missed it?
JUMBO may not be a complete test bed as it builds a tree and can then do
things from that. It may lose information (it doesn't store comments at
present). Since it was written before the WG decided on joined-up writing
for XML names, it still uppercases everything and I'm waiting for the white
smoke before I make that change. It *does* store PIs as children of the
immediately preceding non-PCDATA Element. It does not store NOTATIONs as
it has never seen one and doesn't know what to do with one when it gets it.
It is also not very good on things like IMPLIED attribute values since it
may not always have a DTD. If anyone can come up with simple rules for what
a tree should contain, that could be useful. [Not a grove at this stage, as
no one seems to write their parsers to create groves.]
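
As a starting point for discussion, the rules described above can be sketched
as a small tree builder. This is an illustration in Python using the
standard-library expat binding, not JUMBO's implementation. The Node class and
build_tree function are hypothetical names. Two readings are assumed here:
"immediately preceding non-PCDATA Element" is taken to mean the most recently
started element in document order, and a PI that appears before any element is
attached to a document node. Attributes and NOTATIONs are ignored.

```python
import xml.parsers.expat
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str                 # "document", "element", "text" or "pi"
    name: str = ""            # element name or PI target
    value: str = ""           # text content or PI data
    children: List["Node"] = field(default_factory=list)


def build_tree(text: str) -> Node:
    document = Node("document")
    stack = [document]        # currently open elements
    last_element = document   # most recently started element (assumption)

    def start(name, attrs):
        nonlocal last_element
        node = Node("element", name.upper())   # names are uppercased
        stack[-1].children.append(node)
        stack.append(node)
        last_element = node

    def end(name):
        stack.pop()

    def chars(data):
        if data.strip():      # skip whitespace-only text
            stack[-1].children.append(Node("text", value=data))

    def pi(target, data):
        # The PI becomes a child of the preceding element, not of PCDATA.
        last_element.children.append(Node("pi", target, data))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.ProcessingInstructionHandler = pi
    # No CommentHandler is set, so comments are not stored.
    parser.Parse(text, True)
    return document
```

For example, build_tree("<mol><atom/>some text<?jumbo hint?></mol>") attaches
the PI to ATOM rather than to MOL or to the text, which is one consequence of
the rule that may or may not be what people want from a common tree.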
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)