Simple approaches to XML implementation

Peter Murray-Rust Peter at ursus.demon.co.uk
Sat Mar 1 11:16:17 GMT 1997


[from PeterMR]
> 
> Thanks Ingo,
> This is very useful, because it shows that a great deal can be done quite 
> simply.
> 
> In message <199703010216.DAA00533 at florix.rz.tu-clausthal.de> Ingo Macherius writes:
> [...]
> > I have made up a perl5 module which models a very simple forest-like structure,
> > that holds Perl5 objects. The objects are created by reading nsgmls' ESIS
> 
> I believe that ESIS has potentially a useful role in producing XML documents
> from SGML documents - this was certainly my own strategy until recently.
> ESIS is the normalised output from a parser (especially sgmls or NSGMLS from
> James Clark - these are freely available.)  It's trivial to transform
> ESIS into XML, but not the other way round, since XML is richer.
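
To illustrate why the ESIS-to-XML direction is simple, here is a minimal Python sketch. It assumes the line-oriented output of sgmls/nsgmls and handles only the "(", ")", "-", "A" and "?" record types plus the common data escapes. The names esis_to_xml and unescape_data are hypothetical, and this is not a complete converter.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Escapes handled in ESIS data: \n (record end), \\ (backslash),
# \| (SDATA bracket, dropped here) and \nnn (octal character code).
_ESCAPE = re.compile(r"\\(n|\\|\||[0-7]{3})")


def unescape_data(text):
    """Turn ESIS escape sequences back into plain characters."""
    def replace(match):
        token = match.group(1)
        if token == "n":
            return "\n"
        if token == "\\":
            return "\\"
        if token == "|":
            return ""
        return chr(int(token, 8))
    return _ESCAPE.sub(replace, text)


def esis_to_xml(lines):
    """Convert an iterable of ESIS lines into one XML string (sketch only)."""
    out = []
    pending = []  # attribute lines come before the start tag they belong to
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        command, rest = line[0], line[1:]
        if command == "A":
            parts = rest.split(" ", 2)
            # "Aname IMPLIED" carries no value, so it is skipped.
            if len(parts) >= 2 and parts[1] != "IMPLIED":
                value = parts[2] if len(parts) > 2 else ""
                pending.append((parts[0], unescape_data(value)))
        elif command == "(":
            attrs = "".join(f" {n}={quoteattr(v)}" for n, v in pending)
            out.append(f"<{rest}{attrs}>")
            pending = []
        elif command == ")":
            out.append(f"</{rest}>")
        elif command == "-":
            out.append(escape(unescape_data(rest)))
        elif command == "?":
            out.append(f"<?{rest}?>")
        # Any other record type (e.g. the final "C") is ignored in this sketch.
    return "".join(out)
```

Feeding it the lines "ATITLE CDATA water", "(MOL", "-H2O", ")MOL" gives a single MOL element with a TITLE attribute. Element and attribute names are passed through exactly as the parser reported them.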
> 
> ESIS doesn't retain everything from the original document(s) and I've been
> asking the experts what gets lost.  My rough summary is that XML->ESIS
> loses:
> 	- comments (this matters if you want to edit the document or have
> 		it read by humans.  However comments should not be used
> 		by machines - simply passed through)
> 	- entities.  If your document includes entities such as &chapter1;
> 		these may be expanded and replaced by their contents.  In
> 		this way some of the structure may be less clear
> 	- conditional markup.  If you use INCLUDE and/or IGNORE then the
> 		IGNORE'd sections won't come through and the INCLUDE'd 
> 		ones won't be marked as such
> [I think that processing instructions come through OK?  And that you can
> determine whether an attribute value was defaulted or not?]
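
The same kind of loss can be seen with any parser that reports only normalised structure. The sketch below is an analogy, not nsgmls itself: it uses Python's standard xml.etree.ElementTree, which by default skips comments and expands internal entities, and the tiny "book" document is invented for the example.

```python
import xml.etree.ElementTree as ET

# An invented document with one internal entity and one comment.
source = (
    '<!DOCTYPE book [<!ENTITY chapter1 "Introduction">]>'
    "<book><!-- draft -->&chapter1;</book>"
)

root = ET.fromstring(source)

# Serialising the parsed tree shows what survived the round trip:
# the entity's replacement text is there, but the reference &chapter1;,
# the comment and the DOCTYPE declaration are not.
round_tripped = ET.tostring(root, encoding="unicode")
print(round_tripped)
```

A tool that only ever sees the parsed form cannot tell that "Introduction" came from an entity, which is the structural information referred to in the list above.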
> 
> If you use this simple level of markup (and _I_ do for molecular science)
> then well-formed (WF) XML documents are equivalent to ESIS output from
> sgmls or nsgmls.
> [Query: Are there plans for nsgmls/sgmls to output XML as an alternative
> to ESIS?  I expect it's straightforward].
> 
> 
> > and putting anything between certain named tags into a hash, which
> > basically is the object content. The objects can be inserted as a root or into
> > another object, which yields a forest-like structure.
> > The tree-relations between objects are stored outside in a libdbm database,
> > one per tree. It holds three tables:
> > - id -> hashed data
> > - id -> id of father object, or NULL
> > - id -> ids of all sons
> > Obviously any object must have a method giving a unique id within the forest.
> > I think this may be called a poor-mans-grove :) I made up a simple API:
>                                ^^^^^^^^^^^^^^^
> It's still very powerful, and you have recognised the importance of
> structured documents.  The good news is that this will all be addressed
> (literally and metaphorically) in the discussion of addressing within
> XML documents.  The TEI project has developed a pointer scheme which
> covers most aspects of structure and extends the metaphor to descendants,
> ancestors, siblings and navigation by attributes and their values.  I
> am expecting one or more 'black boxes' to be developed which support this,
> so that you don't have to write perl scripts any more.  I'm waiting to hear
> from another thread :-)
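
The kinds of navigation mentioned here (descendants, ancestors, siblings, selection by attribute value) can be sketched in a few lines of Python. This is not TEI pointer syntax, only the underlying tree operations; the molecule document and the helper names ancestors, siblings and descendants are made up for the illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<molecule id="m1">'
    '<atom id="a1" element="C"/>'
    '<atom id="a2" element="O"/>'
    '<bond id="b1" atoms="a1 a2"/>'
    "</molecule>"
)

# ElementTree keeps no parent links, so build a child -> parent map once.
parent_of = {child: parent for parent in doc.iter() for child in parent}


def ancestors(node):
    """Yield the parent, grandparent, ... of node, nearest first."""
    while node in parent_of:
        node = parent_of[node]
        yield node


def siblings(node):
    """Return the other children of node's parent, in document order."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    return [other for other in parent if other is not node]


def descendants(node):
    """Return every element below node, in document order."""
    walker = node.iter()
    next(walker)  # the first item is node itself
    return list(walker)


# Navigation by attribute value: the first atom whose element is "O".
oxygen = doc.find(".//atom[@element='O']")
```

With these helpers, siblings(oxygen) gives the other atom and the bond, and ancestors(oxygen) yields the enclosing molecule.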
> 
> [... code deleted ...]
> > 
> > I found this sufficient to solve small problems for which ESIS is not enough
>                                                       ^^^^^^^^^^
> I think you were operating _on_ the ESIS stream.  You mean that simple
> 'grep' or other tools weren't powerful enough?
> 
> > and a grove is overkill. I must admit, although I read most of ISO 10179, I
>         ^^^^^^^^^^^^^^^^^
> This is one of the points at issue.  Is it going to be possible to produce
> software quickly, and will it be easy enough to read and use?  I'm waiting
> to find out :-)
> 
> > really didn't get the details. But what I found valuable is the choice 
>                     ^^^^^^^^^^^
> I think it's very important not to be frightened by 10179.  What you have 
> done is very similar to what I and many others have done - devising
> home-grown tools for searching structured documents.  10179 has an 
> implementation in Scheme (am I right?) but not in more procedural or 
> object-oriented languages.
> 
> > between navigating (father/son) and id-based lookups (fetch).
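
The three tables described earlier in the message (id -> data, id -> father, id -> sons) and the split between navigating and id-based fetching can be sketched as follows. This is not Ingo's Perl module or its API: the class Forest and its method names are hypothetical, and plain in-memory dicts stand in for the libdbm tables.

```python
class Forest:
    """A forest of objects held in three id-keyed tables (sketch only)."""

    def __init__(self):
        self._data = {}    # id -> hashed data
        self._father = {}  # id -> id of father object, or None
        self._sons = {}    # id -> ids of all sons

    def insert(self, obj_id, data, father=None):
        """Add an object as a new root, or under an existing father."""
        if obj_id in self._data:
            raise KeyError(f"duplicate id: {obj_id!r}")
        if father is not None and father not in self._data:
            raise KeyError(f"unknown father: {father!r}")
        self._data[obj_id] = dict(data)
        self._father[obj_id] = father
        self._sons[obj_id] = []
        if father is not None:
            self._sons[father].append(obj_id)

    def fetch(self, obj_id):
        """Id-based lookup: return the data stored for obj_id."""
        return self._data[obj_id]

    def father_of(self, obj_id):
        """Navigation upwards: the father's id, or None for a root."""
        return self._father[obj_id]

    def sons_of(self, obj_id):
        """Navigation downwards: the ids of all sons, in insertion order."""
        return list(self._sons[obj_id])

    def roots(self):
        """The ids of all objects that have no father."""
        return [i for i, f in self._father.items() if f is None]


forest = Forest()
forest.insert("mol1", {"title": "water"})
forest.insert("a1", {"element": "O"}, father="mol1")
forest.insert("a2", {"element": "H"}, father="mol1")
forest.insert("mol2", {"title": "methane"})
```

Keeping the relations outside the objects, as in the original design, means an object only has to supply an id that is unique within the forest.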
> 
> [...]
> 		P.
> 

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)



