LISTADMIN: List stats, and archival
David Megginson
ak117 at freenet.carleton.ca
Mon Apr 20 15:04:21 BST 1998
Rzepa, Henry writes:
> It does seem worthwhile to try to preserve in some form the history
> of a list such as XML-DEV. Too often, such lists seem to evaporate
> forever. I propose therefore to transfer the archive of this list
> to a published CD ROM. Lack of resources will prevent anything
> other than the lightest of editing (removing duplicate footers,
> etc), unless someone offers to do so with any intelligent parsing
> tools they may have.
How about using a trivial Perl script to convert all of the messages
to a simple XML document type (assuming nothing about the semantics of
the message body itself)? You could try something like the following:
<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "message.dtd">
<message>
  <header>
    <poster>
      <name>David Megginson</name>
      <e-mail>ak117 at freenet.carleton.ca</e-mail>
    </poster>
    <date>February 18, 1998</date>
    <subject>re: Some Subject</subject>
    <reference-list>
      <reference xml:link="simple" href="aaa.xml"/>
      <reference xml:link="simple" href="bbb.xml"/>
      <reference xml:link="simple" href="ccc.xml"/>
    </reference-list>
  </header>
<body xml:space="preserve">
This is whatever appeared in the body of the message, only with
XML characters like &lt;, &gt;, and &amp; escaped, form feeds
stripped out, and everything above 0x8f converted to a character
reference.
David
--
David Megginson ak117 at freenet.carleton.ca
Microstar Software Ltd. dmeggins at microstar.com
http://home.sprynet.com/sprynet/dmeggins/
</body>
</message>
This would make it simpler to use the archive with an XML search
engine later on, and would provide a nice (and very large) base of
sample XML documents. Here's the external DTD subset for the example
(message.dtd):
<!ENTITY amp "&#38;#38;">
<!ENTITY lt "&#38;#60;">
<!ENTITY gt "&#62;">
<!ELEMENT message (header, body)>
<!ELEMENT header (poster, date, subject, reference-list?)>
<!ELEMENT poster (name?, e-mail)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT e-mail (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT reference-list (reference+)>
<!ELEMENT reference EMPTY>
<!ATTLIST reference
  xml:link NMTOKEN #FIXED "simple"
  href CDATA #REQUIRED>
<!ELEMENT body (#PCDATA)>
<!ATTLIST body
  xml:space (default|preserve) #FIXED "preserve">
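
For anyone who wants to try the conversion, here is a rough sketch of what
such a script might look like. It is written in Python rather than Perl,
and it is only an illustration under some assumptions: each message is a
single-part plain-text mail, the function names (escape_body,
message_to_xml) and the reference_hrefs parameter are made up for this
example, and "everything above 7-bit ASCII" is taken as the cut-off for
character references. It does not try to strip other control characters
that XML forbids.

```python
import email
from email.utils import parseaddr
from xml.sax.saxutils import escape, quoteattr


def escape_body(text):
    """Escape &, < and >, drop form feeds, and turn non-ASCII
    characters into numeric character references."""
    text = text.replace("\f", "")   # strip form feeds
    text = escape(text)             # & -> &amp;, < -> &lt;, > -> &gt;
    # Assumption: anything outside 7-bit ASCII becomes a character reference.
    return "".join(ch if ord(ch) < 0x80 else "&#%d;" % ord(ch) for ch in text)


def message_to_xml(raw, reference_hrefs=()):
    """Convert one raw mail message (a string) to the 'message' document type.

    reference_hrefs is a hypothetical list of file names for the messages
    this one refers to; working those out is left to the caller.
    """
    msg = email.message_from_string(raw)
    name, address = parseaddr(msg.get("From", ""))

    parts = [
        '<?xml version="1.0"?>',
        '<!DOCTYPE message SYSTEM "message.dtd">',
        "<message>",
        "<header>",
        "<poster>",
    ]
    if name:  # <name> is optional in the DTD
        parts.append("<name>%s</name>" % escape_body(name))
    parts.append("<e-mail>%s</e-mail>" % escape_body(address))
    parts.append("</poster>")
    parts.append("<date>%s</date>" % escape_body(msg.get("Date", "")))
    parts.append("<subject>%s</subject>" % escape_body(msg.get("Subject", "")))
    if reference_hrefs:  # <reference-list> is optional but must not be empty
        parts.append("<reference-list>")
        for href in reference_hrefs:
            parts.append('<reference xml:link="simple" href=%s/>' % quoteattr(href))
        parts.append("</reference-list>")
    parts.append("</header>")

    # Assumption: single-part plain-text message, so the payload is a string.
    body = msg.get_payload()
    parts.append('<body xml:space="preserve">%s</body>' % escape_body(body))
    parts.append("</message>")
    return "\n".join(parts) + "\n"
```

Calling message_to_xml(raw_text, ["aaa.xml"]) on each archived message and
writing the result to its own file would give one XML document per
message. The body text goes in verbatim apart from the escaping, which
matches the xml:space="preserve" declaration in the DTD.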
All the best,
David
--
David Megginson ak117 at freenet.carleton.ca
Microstar Software Ltd. dmeggins at microstar.com
http://home.sprynet.com/sprynet/dmeggins/
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)