program to restore end tags?

Christopher R. Maden crism at exemplary.net
Thu Nov 25 08:41:53 GMT 1999


[Sean Chen]
>I really don't want to parse SGML fully, as it's not needed for an
>SGML->XML converter.  I just want to assume concrete syntax, leave the
>entities alone, and the more esoteric SGML documents I might encounter
>I'll fob off to SX or something more industrial strength.

I think you'll have a sort of halting problem, unless you're dealing with a
known set of documents.

For a start, it's really hard to add end-tags without parsing the DTD, so
you're most of the way to a full parser right there.  But you won't know if
the concrete syntax is different unless you parse the SGML declaration; you
won't know what shortrefs may be in effect unless you parse the DTD; you
won't know what could be inside an entity (since SGML doesn't have XML's
guarantee that every entity contains an integral number of elements); you
won't know the values of any marked section keyword parameter entities
unless you've parsed the DTD; etc.  In other words, I think you'll need a
full SGML parser to know if you need a full SGML parser.
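
To make the end-tag point concrete, here is a minimal sketch in Python.
It is only an illustration, not how SX or any real parser works.  The
ALLOWED_CHILDREN table and the restore_end_tags function are hypothetical
names; the table is a hand-written stand-in for the content models a DTD
would supply, and the sketch assumes the reference concrete syntax, no
attributes, no omitted start tags, no shortrefs and no entities.

```python
# Hypothetical stand-in for DTD content models: which child elements
# each element may contain.  A real DTD says far more than this.
ALLOWED_CHILDREN = {
    "ul": {"li"},
    "li": {"p", "ul"},
    "p": set(),
}


def restore_end_tags(tokens, allowed=ALLOWED_CHILDREN):
    """Insert omitted end tags into a list of simple tag/text tokens.

    tokens are strings such as "<li>", "</li>", or plain text.
    """
    out, stack = [], []
    for tok in tokens:
        if tok.startswith("</"):
            name = tok[2:-1]
            if name not in stack:
                raise ValueError(f"unexpected end tag {tok}")
            # Close everything still open inside the named element.
            while stack[-1] != name:
                out.append(f"</{stack.pop()}>")
            stack.pop()
            out.append(tok)
        elif tok.startswith("<"):
            name = tok[1:-1]
            # Close open elements that cannot contain the new element.
            while stack and name not in allowed[stack[-1]]:
                out.append(f"</{stack.pop()}>")
            stack.append(name)
            out.append(tok)
        else:
            out.append(tok)
    # Close whatever is still open at end of input.
    while stack:
        out.append(f"</{stack.pop()}>")
    return out


tokens = ["<ul>", "<li>", "<p>", "one", "<li>", "<p>", "two", "</ul>"]
print("".join(restore_end_tags(tokens)))
```

Every decision about where an end tag goes comes from the table.  If the
table allowed "li" inside "p", the second <li> would nest inside the first
paragraph instead of closing it, so the same input would produce a
different document.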

On the other hand, if you're dealing with a set of documents whose complete
genealogy you know, then you can work with something smaller.  But SX works
and it's reasonably fast, so unless you're in an embedded processor
environment or something, I don't really see a need to re-invent the wheel.

-Chris

--
Christopher R. Maden, Solutions Architect
Exemplary Technologies
One Embarcadero Center, Ste. 2405
San Francisco, CA 94111



xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)




