LISTADMIN: List stats, and archival

David Megginson ak117 at freenet.carleton.ca
Mon Apr 20 15:04:21 BST 1998


Rzepa, Henry writes:

 > It does seem worthwhile to try to preserve in some form the history
 > of a list such as XML-DEV. Too often, such lists seem to evaporate
 > forever. I propose therefore to transfer the archive of this list
 > to a published CD ROM.  Lack of resources will prevent anything
 > other than the lightest of editing (removing duplicate footers,
 > etc), unless someone offers to do so with any intelligent parsing
 > tools they may have.

How about using a trivial Perl script to convert all of the messages
to a simple XML document type (assuming nothing about the semantics of
the message body itself)?  You could try something like the following:

  <?xml version="1.0"?>

  <!DOCTYPE message SYSTEM "message.dtd">

  <message>

  <header>
  <poster>
  <name>David Megginson</name>
  <e-mail>ak117 at freenet.carleton.ca</e-mail>
  </poster>
  <date>February 18, 1998</date>
  <subject>re: Some Subject</subject>
  <reference-list>
   <reference xml:link="simple" href="aaa.xml"/>
   <reference xml:link="simple" href="bbb.xml"/>
   <reference xml:link="simple" href="ccc.xml"/>
  </reference-list>
  </header>

  <body xml:space="preserve">
  This is whatever appeared in the body of the message, only with
  XML characters like &lt;, &gt;, and &amp; escaped, form feeds
  stripped out, and everything above 0x7f converted to a character
  reference.


  David

  --
  David Megginson                 ak117 at freenet.carleton.ca
  Microstar Software Ltd.         dmeggins at microstar.com
        http://home.sprynet.com/sprynet/dmeggins/
  </body>

  </message>
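
A rough sketch of such a conversion follows, written in Python rather than Perl purely for illustration. It assumes each message is a single-part plain-text mail, copies the Date header through unchanged, and takes the cutoff for character references to be the end of ASCII. The function names and the `references` parameter (a list of hrefs the caller has already worked out) are hypothetical.

```python
import email
from email.utils import parseaddr
from xml.sax.saxutils import escape, quoteattr


def escape_text(text):
    """Escape &, < and >, strip form feeds, and replace non-ASCII
    characters with numeric character references."""
    text = escape(text.replace("\f", ""))
    # Assumption: everything beyond ASCII becomes a character reference.
    return "".join(ch if ord(ch) < 0x80 else "&#x%x;" % ord(ch) for ch in text)


def message_to_xml(raw, references=()):
    """Turn one raw mail message into a <message> document.

    Nothing is assumed about the semantics of the body; it is only escaped.
    `references` is a hypothetical list of hrefs supplied by the caller.
    """
    msg = email.message_from_string(raw)
    if msg.is_multipart():
        raise ValueError("this sketch only handles single-part messages")
    name, address = parseaddr(msg.get("From", ""))

    out = [
        '<?xml version="1.0"?>',
        '<!DOCTYPE message SYSTEM "message.dtd">',
        "<message>",
        "<header>",
        "<poster>",
    ]
    if name:  # <name> is optional in the content model
        out.append("<name>%s</name>" % escape_text(name))
    out.append("<e-mail>%s</e-mail>" % escape_text(address))
    out.append("</poster>")
    out.append("<date>%s</date>" % escape_text(msg.get("Date", "")))
    out.append("<subject>%s</subject>" % escape_text(msg.get("Subject", "")))
    if references:
        out.append("<reference-list>")
        for href in references:
            out.append('<reference xml:link="simple" href=%s/>' % quoteattr(href))
        out.append("</reference-list>")
    out.append("</header>")
    out.append('<body xml:space="preserve">%s</body>' % escape_text(msg.get_payload()))
    out.append("</message>")
    return "\n".join(out) + "\n"
```

Working out which earlier messages a post refers to (for example from its In-Reply-To header) is left to the caller, since it depends on how the archive files are named.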

This would make it simpler to use the archive with an XML search
engine later on, and would provide a nice (and very large) base of
sample XML documents.  Here's the external DTD subset for the example
(message.dtd):

  <!-- XML 1.0 requires amp and lt, if declared, to be double-escaped -->
  <!ENTITY amp "&#x26;#x26;">
  <!ENTITY lt "&#x26;#x3c;">
  <!ENTITY gt "&#x3e;">
  <!ELEMENT message (header, body)>
  <!ELEMENT header (poster, date, subject, reference-list?)>
  <!ELEMENT poster (name?, e-mail)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT e-mail (#PCDATA)>
  <!ELEMENT date (#PCDATA)>
  <!ELEMENT subject (#PCDATA)>
  <!ELEMENT reference-list (reference+)>
  <!ELEMENT reference EMPTY>
  <!ATTLIST reference
    xml:link NMTOKEN #FIXED "simple"
    href CDATA #REQUIRED>
  <!ELEMENT body (#PCDATA)>
  <!ATTLIST body
    xml:space (default|preserve) #FIXED "preserve">
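
To give an idea of how simple tools could then work with the archive, here is a minimal Python sketch that pulls the header fields out of one such message. It uses the standard-library ElementTree parser, which checks well-formedness only and does not validate against message.dtd. The sample message and the function name `summarize_message` are made up for the example.

```python
import xml.etree.ElementTree as ET

# A made-up message following the structure described by message.dtd.
SAMPLE = """<!DOCTYPE message SYSTEM "message.dtd">
<message>
<header>
<poster><name>Jane Doe</name><e-mail>jane at example.org</e-mail></poster>
<date>February 18, 1998</date>
<subject>re: Some Subject</subject>
<reference-list>
<reference xml:link="simple" href="aaa.xml"/>
</reference-list>
</header>
<body xml:space="preserve">Escaped markup: &lt;tag&gt; &amp; more.</body>
</message>
"""


def summarize_message(xml_text):
    """Return the header fields, reference hrefs and body of one message.

    The external DTD is not fetched; only the predefined entities
    (&lt;, &gt;, &amp;) are expected in the text.
    """
    root = ET.fromstring(xml_text)
    header = root.find("header")
    return {
        "name": header.findtext("poster/name"),  # None if <name> is absent
        "e-mail": header.findtext("poster/e-mail"),
        "date": header.findtext("date"),
        "subject": header.findtext("subject"),
        "references": [
            ref.get("href") for ref in header.iterfind("reference-list/reference")
        ],
        "body": root.findtext("body"),
    }


if __name__ == "__main__":
    for key, value in summarize_message(SAMPLE).items():
        print(key, "=", repr(value))
```

A real search engine would index these fields across the whole archive. The point here is only that the fixed header structure makes them easy to extract.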



All the best,


David

-- 
David Megginson                 ak117 at freenet.carleton.ca
Microstar Software Ltd.         dmeggins at microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



