LISTADMIN: List stats, and archival

David Megginson ak117 at
Mon Apr 20 15:04:21 BST 1998

Rzepa, Henry writes:

 > It does seem worthwhile to try to preserve in some form the history
 > of a list such as XML-DEV.  Too often, such lists seem to evaporate
 > forever. I propose therefore to transfer the archive of this list
 > to a published CD ROM.  Lack of resources will prevent anything
 > other than the lightest of editing (removing duplicate footers,
 > etc), unless someone offers to do so with any intelligent parsing
 > tools they may have.

How about using a trivial Perl script to convert all of the messages
to a simple XML document type (assuming nothing about the semantics of
the message body itself)?  You could try something like the following:

  <?xml version="1.0"?>

  <!DOCTYPE message SYSTEM "message.dtd">

  <message>
  <header>
  <poster>
  <name>David Megginson</name>
  <e-mail>ak117 at</e-mail>
  </poster>
  <date>February 18, 1998</date>
  <subject>re: Some Subject</subject>
  <reference-list>
   <reference xml:link="simple" href="aaa.xml"/>
   <reference xml:link="simple" href="bbb.xml"/>
   <reference xml:link="simple" href="ccc.xml"/>
  </reference-list>
  </header>

  <body xml:space="preserve">
  This is whatever appeared in the body of the message, only with
  XML characters like &lt;, &gt;, and &amp; escaped, form feeds
  stripped out, and everything above 0x8f converted to a character
  reference.

  David Megginson                 ak117 at
  Microstar Software Ltd.         dmeggins at
  </body>
  </message>


This would make it simpler to use the archive with an XML search
engine later on, and would provide a nice (and very large) base of
sample XML documents.  Here's the external DTD subset for the example:

  <!ENTITY amp "&#x26;#x26;">
  <!ENTITY lt "&#x26;#x3c;">
  <!ENTITY gt "&#x3e;">
  <!ELEMENT message (header, body)>
  <!ELEMENT header (poster, date, subject, reference-list?)>
  <!ELEMENT poster (name?, e-mail)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT e-mail (#PCDATA)>
  <!ELEMENT date (#PCDATA)>
  <!ELEMENT subject (#PCDATA)>
  <!ELEMENT reference-list (reference+)>
  <!ELEMENT reference EMPTY>
  <!ATTLIST reference
    xml:link NMTOKEN #FIXED "simple"
    href CDATA #REQUIRED>
  <!ELEMENT body (#PCDATA)>
  <!ATTLIST body
    xml:space (default|preserve) #FIXED "preserve">
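
For anyone who wants to try the conversion, here is a minimal sketch of
it, written in Python rather than Perl so that it is easy to run. It
assumes each message is a single-part plain-text mail in RFC 822 form.
The names escape_body and message_to_xml are hypothetical, and treating
everything outside ASCII as needing a character reference is an
assumption about the threshold, not something fixed by the DTD above.

```python
import email
from email.utils import parseaddr
from xml.sax.saxutils import escape, quoteattr


def escape_body(text):
    """Escape &, <, >; strip form feeds; replace non-ASCII characters
    with numeric character references (threshold is an assumption)."""
    text = escape(text.replace("\f", ""))
    return "".join(c if ord(c) < 0x80 else "&#x%x;" % ord(c) for c in text)


def message_to_xml(raw, references=()):
    """Convert one raw mail message to the 'message' document type.

    'references' is a hypothetical list of file names of related
    messages; working those out is left to the caller.
    """
    msg = email.message_from_string(raw)
    name, address = parseaddr(msg.get("From", ""))
    body = msg.get_payload()  # a string for single-part messages

    parts = [
        '<?xml version="1.0"?>',
        '<!DOCTYPE message SYSTEM "message.dtd">',
        "<message>",
        "<header>",
        "<poster>",
    ]
    if name:  # the DTD makes the poster's name optional
        parts.append("<name>%s</name>" % escape_body(name))
    parts.append("<e-mail>%s</e-mail>" % escape_body(address))
    parts.append("</poster>")
    parts.append("<date>%s</date>" % escape_body(msg.get("Date", "")))
    parts.append("<subject>%s</subject>" % escape_body(msg.get("Subject", "")))
    if references:  # reference-list is optional but may not be empty
        parts.append("<reference-list>")
        for href in references:
            parts.append('<reference xml:link="simple" href=%s/>'
                         % quoteattr(href))
        parts.append("</reference-list>")
    parts.append("</header>")
    parts.append('<body xml:space="preserve">%s</body>' % escape_body(body))
    parts.append("</message>")
    return "\n".join(parts)
```

Calling message_to_xml(raw_text, ["aaa.xml"]) on the text of one
archived message returns the document as a string. The sketch does not
touch the message body beyond escaping it, so duplicate footers and
the like would still need a separate pass.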

All the best,


David Megginson                 ak117 at
Microstar Software Ltd.         dmeggins at

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at
Archived as:
To (un)subscribe, mailto:majordomo at the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at
