Simple approaches to XML implementation
Peter Murray-Rust
Peter at ursus.demon.co.uk
Sat Mar 1 11:16:17 GMT 1997
[from PeterMR]
>
> Thanks Ingo,
> This is very useful, because it shows that a great deal can be done quite
> simply.
>
> In message <199703010216.DAA00533 at florix.rz.tu-clausthal.de> Ingo Macherius writes:
> [...]
> > I have written a Perl5 module which models a very simple forest-like structure,
> > that holds Perl5 objects. The objects are created by reading nsgmls' ESIS
>
> I believe that ESIS has potentially a useful role in producing XML documents
> from SGML documents - this was certainly my own strategy until recently.
> ESIS is the normalised output from a parser (especially sgmls or nsgmls from
> James Clark, both freely available). It's trivial to transform
> ESIS into XML, but not losslessly the other way round, since XML is richer.
>
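Because ESIS is a line-oriented stream, the ESIS-to-XML direction really is a short loop. Here is a minimal sketch, assuming the usual nsgmls command characters (`(` start tag, `)` end tag, `-` data, `A` attribute for the next start tag, `?` processing instruction, `C` conforming) and handling only a few data escapes. `esis_to_xml` and `decode_esis_data` are hypothetical names. Element names are copied unchanged, although nsgmls typically reports them folded to upper case.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def decode_esis_data(s):
    # Handles only a subset of escapes: backslash-n for a record end and
    # backslash + three octal digits. Any other escaped character (a doubled
    # backslash, the SDATA bracket "\|") is passed through without its backslash.
    out, i = [], 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            octal = s[i + 1:i + 4]
            if s[i + 1] == "n":
                out.append("\n")
                i += 2
            elif re.fullmatch(r"[0-7]{3}", octal):
                out.append(chr(int(octal, 8)))
                i += 4
            else:
                out.append(s[i + 1])
                i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


def esis_to_xml(lines):
    out, pending_attrs = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        command, rest = line[0], line[1:]
        if command == "A":
            # "Aname TYPE value" applies to the next start tag.
            name, _, typed_value = rest.partition(" ")
            atype, _, value = typed_value.partition(" ")
            if atype != "IMPLIED":  # an implied attribute has no value to write
                pending_attrs.append((name, decode_esis_data(value)))
        elif command == "(":
            attrs = "".join(f" {n}={quoteattr(v)}" for n, v in pending_attrs)
            out.append(f"<{rest}{attrs}>")
            pending_attrs = []
        elif command == ")":
            out.append(f"</{rest}>")
        elif command == "-":
            out.append(escape(decode_esis_data(rest)))
        elif command == "?":
            out.append(f"<?{rest}?>")
        # "C" (document conformed) and any other commands are ignored here.
    return "".join(out)


esis = ["AID TOKEN c1", "(CHAPTER", "(TITLE", "-Molecules & atoms", ")TITLE", ")CHAPTER", "C"]
print(esis_to_xml(esis))  # <CHAPTER ID="c1"><TITLE>Molecules &amp; atoms</TITLE></CHAPTER>
```

Only the attribute list needs state, because ESIS sends attributes before the start tag they belong to. Everything else maps one line to one piece of markup.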
> ESIS doesn't retain everything from the original document(s), and I've been
> asking the experts what gets lost. My rough summary is that XML->ESIS
> loses:
> - comments. This matters if you want to edit the document or have
>   it read by humans; however, comments should not be used
>   by machines, simply passed through.
> - entities. If your document includes entity references such as &chapter1;
>   these may be expanded and replaced by their contents, so
>   some of the structure may be less clear.
> - conditional markup. If you use INCLUDE and/or IGNORE marked sections, the
>   IGNORE'd sections won't come through and the INCLUDE'd
>   ones won't be marked as such.
> [I think that processing instructions come through OK? And that you can
> determine whether an attribute value was defaulted or not?]
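You can see the first two losses by generating an ESIS-like stream from XML yourself. The sketch below uses Python's expat binding, not nsgmls, so treat it as an approximation. `xml_to_esis` is a hypothetical name. Every attribute is labelled `CDATA` because no DTD types are consulted. No comment handler is installed, so comments vanish, and expat expands the internal entity before the data arrives. The processing instruction does come through.

```python
import xml.parsers.expat


def xml_to_esis(xml_text):
    """Return ESIS-like lines for xml_text (an approximation, not nsgmls)."""
    events = []
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        # ESIS lists attributes before the start tag they belong to.
        # Without a DTD there are no declared types, so everything is CDATA here.
        for key, value in attrs.items():
            events.append(f"A{key} CDATA {value}")
        events.append(f"({name}")

    def end(name):
        events.append(f"){name}")

    def data(text):
        # Escape backslashes and newlines in roughly the nsgmls style.
        events.append("-" + text.replace("\\", "\\\\").replace("\n", "\\n"))

    def pi(target, body):
        events.append(f"?{target} {body}")

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = data
    parser.ProcessingInstructionHandler = pi
    # No CommentHandler is installed, so comments never reach the output,
    # and internal entity references arrive already expanded as data.
    parser.Parse(xml_text, True)
    return events


doc = (
    '<!DOCTYPE doc [<!ENTITY chem "molecular science">]>'
    "<doc><!-- editor note --><p>I use XML for &chem;.</p><?render fast?></doc>"
)
for line in xml_to_esis(doc):
    print(line)
```

The comment is missing from the output. `&chem;` appears only as ordinary data, possibly split across several `-` lines. The `?render fast` line shows that the processing instruction survives.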
>
> If you use this simple level of markup (and _I_ do for molecular science),
> then well-formed (WF) XML documents are equivalent to ESIS output from sgmls or nsgmls.
> [Query: Are there plans for nsgmls/sgmls to output XML as an alternative
> to ESIS? I expect it's straightforward].
>
>
> > and putting anything between certain named tags into a hash, which
> > basically is the object content. The objects can be inserted as a root or into
> > another object, which yields a forest-like structure.
> > The tree-relations between objects are stored outside in a libdbm database,
> > one per tree. It holds three tables,
> > - id -> hashed data
> > - id -> id of father object, or NULL
> > - id -> ids of all sons
> > Obviously any object must have a method giving a unique id within the forest.
> > I think this may be called a poor-mans-grove :) I made up a simple API:
> ^^^^^^^^^^^^^^^
> It's still very powerful, and you have recognised the importance of
> structured documents. The good news is that this will all be addressed
> (literally and metaphorically) in the discussion of addressing within
> XML documents. The TEI project has developed a pointer scheme which
> covers most aspects of structure and extends the metaphor to descendants,
> ancestors, siblings and navigation by attributes and their values. I
> am expecting one or more 'black boxes' to be developed which support this,
> so that you don't have to write perl scripts any more. I'm waiting to hear
> from another thread :-)
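The structural navigation such "black boxes" would offer (ancestors, descendants, siblings, selection by attribute value) is small enough to sketch over an in-memory tree. This is not the TEI pointer syntax. It is plain Python over `xml.etree.ElementTree`, and the helper names are hypothetical. ElementTree elements do not record their parent, so the sketch builds a child-to-parent map once up front, much as Ingo's design keeps father links in a separate table.

```python
import itertools
import xml.etree.ElementTree as ET


def parent_map(root):
    # ElementTree elements do not know their parent, so record it once.
    return {child: parent for parent in root.iter() for child in parent}


def ancestors(node, parents):
    # Walk up the father links until reaching the root, which has no entry.
    while node in parents:
        node = parents[node]
        yield node


def descendants(node):
    # iter() yields the node itself first, then its subtree in document order.
    return itertools.islice(node.iter(), 1, None)


def siblings(node, parents):
    parent = parents.get(node)
    if parent is None:
        return []
    return [child for child in parent if child is not node]


def with_attr(nodes, name, value):
    # Select nodes by attribute value, e.g. type="method".
    return [n for n in nodes if n.get(name) == value]


book = ET.fromstring(
    "<book><chapter n='1'><sec id='s1'/><sec id='s2' type='method'/></chapter>"
    "<chapter n='2'><sec id='s3' type='method'/></chapter></book>"
)
parents = parent_map(book)
s2 = book.find(".//sec[@id='s2']")
print([a.tag for a in ancestors(s2, parents)])       # ['chapter', 'book']
print([s.get("id") for s in siblings(s2, parents)])  # ['s1']
print([s.get("id") for s in with_attr(descendants(book), "type", "method")])  # ['s2', 's3']
```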
>
> [... code deleted ...]
> >
> > I found this sufficient to solve small problems for which ESIS is not enough
> ^^^^^^^^^^
> I think you were operating _on_ the ESIS stream. You mean that simple
> 'grep' or other tools weren't powerful enough?
>
> > and a grove is overkill. I must admit, although I read most of ISO 10179, I
> ^^^^^^^^^^^^^^^^^
> This is one of the points at issue. Will it be possible to produce
> software quickly that is easy enough to read and use? I'm waiting to find out :-)
>
> > really didn't get the details. But what I found valuable is the choice
> ^^^^^^^^^^^
> I think it's very important not to be frightened by 10179 (DSSSL). What you have
> done is very similar to what I and many others have done: devising
> home-grown tools for searching structured documents. 10179 has an
> implementation in Scheme (am I right?) but not in more procedural or
> object-oriented languages.
>
> > between navigating (father/son) and id-based lookups (fetch).
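Ingo's distinction between navigating (father/son) and id-based lookup (fetch) maps directly onto his three tables. The sketch below guesses at that layout in Python. It is not his deleted Perl code. The class and method names are hypothetical, and the standard-library `dbm.dumb` backend stands in for libdbm. The three tables share one database file through `data:`, `father:` and `sons:` key prefixes, which is an assumption, and values are stored as JSON.

```python
import dbm.dumb
import json
import os
import tempfile


class Forest:
    """Hypothetical three-table store: id -> data, id -> father, id -> sons."""

    def __init__(self, path):
        self.db = dbm.dumb.open(path, "c")  # one database file per tree

    def _get(self, table, oid):
        key = f"{table}:{oid}"
        return json.loads(self.db[key]) if key in self.db else None

    def _put(self, table, oid, value):
        self.db[f"{table}:{oid}"] = json.dumps(value)

    def insert(self, oid, data, father=None):
        # father=None makes the object a root; otherwise link both ways.
        if father is not None and self._get("sons", father) is None:
            raise KeyError(f"unknown father {father!r}")
        self._put("data", oid, data)
        self._put("father", oid, father)
        self._put("sons", oid, [])
        if father is not None:
            sons = self._get("sons", father)
            sons.append(oid)
            self._put("sons", father, sons)

    def fetch(self, oid):   # id-based lookup
        return self._get("data", oid)

    def father(self, oid):  # navigate up (None for a root)
        return self._get("father", oid)

    def sons(self, oid):    # navigate down
        return self._get("sons", oid) or []

    def close(self):
        self.db.close()


with tempfile.TemporaryDirectory() as tmp:
    forest = Forest(os.path.join(tmp, "tree1"))
    forest.insert("mol1", {"name": "benzene"})
    forest.insert("atom1", {"element": "C"}, father="mol1")
    forest.insert("atom2", {"element": "C"}, father="mol1")
    print(forest.sons("mol1"))                   # ['atom1', 'atom2']
    print(forest.fetch(forest.father("atom2")))  # {'name': 'benzene'}
    forest.close()
```

Each fetch, father or sons call is a single keyed lookup. Insert also rewrites the father's son list, which is the price of storing links in both directions.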
>
> [...]
> P.
>
--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/
xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)