Text in XML
David Megginson
david at megginson.com
Mon Sep 14 16:01:49 BST 1998
Fernando Cabral writes:
> In order to test some characteristics of a SGML-based search
> engine, I need some XML files. I would prefer having some classics
> of the literature, preferentially those including attributes like
> emphasis, bold, italics and diacritics.
Please don't take this the wrong way, but I'm hoping that this search
will fail (at least, the part about "bold", "italics", etc.). There
are special circumstances where people would mark up presentational
information like typefaces in XML (codicology and library science are
two obvious examples), but for general-purpose use, an XML literary
text would say what something *is* rather than what it should *look
like*. For example,
BAD (usually):
<newline>
"What a <italic>beau</italic>!" sighed Cecille.
GOOD (usually):
<p><q>What a <foreign>beau</foreign>!</q> sighed Cecille.</p>
A literary or linguistic scholar might add all sorts of extra
information:
<para><q ref="Ce0020"><s type="excl">What a <foreign
source="FR" period="s.xix" usage="m-class
u-class">beau</foreign>!</s></q> sighed <name
ref="Ce0020">Cecille</name>.</para>
Sure, it looks like hell, but the scholar can use this to generate an
index of proper names (useful for a 2,000-page Victorian novel) and
an index of foreign terms, and can execute queries like
How often does Cecille use French words in an exclamatory sentence?
Don't try this at home.
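For the curious, a query like the one above can be sketched in a few
lines of Python using the standard-library xml.etree.ElementTree
module. This is only an illustration under assumptions: it takes the
ref attribute on <q> to identify the speaker, counts <foreign>
elements rather than individual words, and does not handle nested
quotations. The <text> wrapper, the second paragraph, and the function
names are hypothetical.

```python
import xml.etree.ElementTree as ET

# First <para> is the scholarly markup from the message; the <text>
# wrapper and the second <para> are hypothetical, added for contrast.
SAMPLE = """<text>
<para><q ref="Ce0020"><s type="excl">What a <foreign
source="FR" period="s.xix" usage="m-class
u-class">beau</foreign>!</s></q> sighed <name
ref="Ce0020">Cecille</name>.</para>
<para><q ref="Ce0020"><s type="decl">He is quite <foreign
source="FR">charmant</foreign>.</s></q></para>
</text>"""


def count_foreign_in_exclamations(root, speaker, language):
    """Count <foreign source=language> elements inside exclamatory
    sentences (<s type="excl">) quoted from the given speaker."""
    count = 0
    for q in root.iter("q"):
        if q.get("ref") != speaker:  # assumption: ref names the speaker
            continue
        for s in q.iter("s"):
            if s.get("type") != "excl":
                continue
            count += sum(
                1 for f in s.iter("foreign") if f.get("source") == language
            )
    return count


def proper_name_index(root):
    """Map each name reference to the set of spellings found in the text."""
    index = {}
    for name in root.iter("name"):
        index.setdefault(name.get("ref"), set()).add("".join(name.itertext()))
    return index


root = ET.fromstring(SAMPLE)
print(count_foreign_in_exclamations(root, "Ce0020", "FR"))
print(proper_name_index(root))
```

None of this is possible with the <italic> version, because nothing
in that markup says that the italicized word is foreign, who is
speaking, or what kind of sentence it is.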
All the best,
David
--
David Megginson david at megginson.com
http://www.megginson.com/
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)