Do I need to use a validating parser?
David Megginson
david at megginson.com
Tue May 4 17:38:02 BST 1999
Joshua E. Smith writes:
> First, an easy one: If a language is defined in XML, you would say that
> language is XML-______. Compliant? Derived? ish? ey? Compatible?
Conformant.
> Now the harder ones...
>
> I'm trying to choose a parser to use in my plugin. I see a choice
> between expat, which is non-validating, and will increase my
> download size by <100K, and SP which is validating, and will
> increase my download size by about 1MB. Gak.
Or you could use Java -- Microstar's non-validating AElfred parser, in
a compressed JAR, will increase your download size by only about 15K
(plus another 5 or 6K if you use the SAX interfaces).
> I suppose that the validation should really be done a priori by the
> content developer using a validating editor, so doing validation in
> my plugin is really unnecessary. Is that true?
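It's your call. Remember that XML validation means only rudimentary
structural validation anyway: a DTD can check element nesting and
attribute declarations, but not data types or anything specific to
your application.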
> Do commercial validating XML editors exist yet?
PSGML for Emacs is DTD-driven and free, but will scare away any user
who'd be too nervous, say, to install and use Linux. WordPerfect 9.0
has a DTD-driven XML editor built in, and XMetaL from SoftQuad is, I
assume, DTD-driven as well. As far as I know, WP 9 and XMetaL are
still in beta, and I don't know about the release dates for products
from ArborText.
There are also some editors that simply use a tree widget to provide
an unformatted view of the document -- I haven't kept track of those,
but they might be useful for some applications.
> I also suppose that my DTD is going to be pretty big by the time
> it's done,
Possibly not -- it depends on how complex your DTD is.
> and downloading it every time someone wants to use my plugin is kind of
> stupid. Right? So that's another reason NOT to use a validating parser.
A non-validating parser may *still* download the DTD (AElfred does,
for example) if you provide a pointer to one.
> I think I'm starting to understand why they went to the trouble of
> distinguishing well-formed from valid.
They're not separate: valid is a subset of well-formed, not an
alternative to it.
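To make the subset relationship concrete, here is a minimal sketch using
Python's standard expat binding (xml.parsers.expat), which is
non-validating. The memo documents and the is_well_formed helper are
invented for illustration; they are not from Joshua's application.

```python
import xml.parsers.expat

def is_well_formed(doc):
    """Return True if a non-validating parse of doc raises no error."""
    try:
        xml.parsers.expat.ParserCreate().Parse(doc, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False

# Well-formed but NOT valid: the DTD requires <to> before <body>.
INVALID_BUT_WELL_FORMED = """<!DOCTYPE memo [
  <!ELEMENT memo (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<memo><body>No recipient here.</body></memo>"""

# Not even well-formed: <body> is never closed.
NOT_WELL_FORMED = "<memo><body>Unclosed element</memo>"

print(is_well_formed(INVALID_BUT_WELL_FORMED))
print(is_well_formed(NOT_WELL_FORMED))
```

The first document gets through because a non-validating parser reads
the content model but does not enforce it; only a validating parser
would reject it. The second is rejected by any conforming XML parser.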
> But in my reading [XML Specification Guide: Graham, Quin, 1999], it
> appears that non-validating parsers are allowed to ignore tons of stuff.
> Is there ANY documentation of what expat actually *does*? (For that
> matter, is there any documentation at all?) I assume it ignores external
> entities, right? That means I can't rely on putting boilerplate (think C
> #include files) into an external parsed general entity if I go with expat,
> right?
Or you can preprocess on the server side to expand entities, insert
defaulted attribute values, etc.
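A rough sketch of that kind of preprocessing, assuming the entities and
attribute defaults are declared in the internal DTD subset so that a
non-validating parser can see them. It uses Python's expat binding; the
flatten function and the scene document are hypothetical names chosen
for this example, and the output drops comments and processing
instructions.

```python
import xml.parsers.expat
from xml.sax.saxutils import escape, quoteattr

def flatten(doc):
    """Re-serialize doc with internal entities expanded and defaults filled in."""
    out = []
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        # attrs already contains defaulted attributes from the internal subset.
        pairs = "".join(" %s=%s" % (k, quoteattr(v)) for k, v in attrs.items())
        out.append("<%s%s>" % (name, pairs))

    parser.StartElementHandler = start
    parser.EndElementHandler = lambda name: out.append("</%s>" % name)
    # Entity replacement text arrives here as ordinary character data.
    parser.CharacterDataHandler = lambda data: out.append(escape(data))
    parser.Parse(doc, True)
    return "".join(out)

# Hypothetical source document with one entity and one attribute default.
SOURCE = """<!DOCTYPE scene [
  <!ENTITY studio "Example Studio">
  <!ATTLIST box units CDATA "meters">
]>
<scene><box size="2"/>&studio;</scene>"""

print(flatten(SOURCE))
```

The flattened result carries no DOCTYPE at all, so the client never has
to fetch or interpret a DTD.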
> If you were using a programming language which is XML-ish, what XML
> features would you be annoyed to see left out (substitution of entities is
> an obvious one, which I've seen 3DML slammed for)?
I find it very hard to imagine coding in a Turing-complete
programming language that is XML-ish -- markup languages are usually
quite clumsy for representing programming languages.
What exactly do you mean, here?
> Of those features, which does expat not do (and therefore I'll have
> to do in my application, or extend expat to do -- three cheers for
> open source!)?
Expat does not validate the document or expand external text entities
(including the external DTD subset, if any).
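Leaving external entities alone does not mean the application cannot
resolve them itself. The sketch below uses the expat binding in current
Python (xml.parsers.expat), which may differ from the C library as it
stood when this was written. The "footer.xml" system identifier, the
FILES table standing in for real file or network access, and the
element_names helper are all assumptions made for the example.

```python
import xml.parsers.expat

# Hypothetical document: "footer.xml" is an invented system identifier.
DOC = """<!DOCTYPE page [
  <!ENTITY footer SYSTEM "footer.xml">
]>
<page>&footer;</page>"""

# Stand-in for the file system or network; an assumption for this sketch.
FILES = {"footer.xml": "<note>Shared boilerplate</note>"}

def element_names(doc, files=None):
    """Return element names seen, resolving external entities if files is given."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)

    def on_external(context, base, system_id, public_id):
        # The child parser inherits the handlers already set on its parent.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(files[system_id], True)
        return 1  # nonzero tells expat the entity was handled

    if files is not None:
        parser.ExternalEntityRefHandler = on_external
    parser.Parse(doc, True)
    return names

print(element_names(DOC))         # no handler: the reference is skipped
print(element_names(DOC, FILES))  # handler installed: the entity is parsed
```

Without the handler the parser reports only the page element; with it,
the contents of the external entity show up as if they had been written
inline, which is the #include-style behaviour Joshua was asking about.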
> I'd rather not have any DLLs hanging around with my plugin -- do
> any of you have experience linking xmltok and xmlparse statically
> under Win32? Any surprises, or tricks I need to know about?
The easy solution is to reformat the hard drive and install Linux, but
then you wouldn't be able to play Flight Simulator any more (you could
play DOOM and Quake, though) -- personally, I keep a small Windows
partition for games when I'm not working.
All the best,
David
--
David Megginson david at megginson.com
http://www.megginson.com/
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)