Assisted Search of XML document collections

Edward C. Zimmermann edz at
Sat May 22 20:32:01 BST 1999

> Hi, all
> There is a modest effort being assembled to look at this prototypical
> problem:
> PROBLEM - multitudes of XML documents. The collection is not necessarily
> static, but if dynamic only incrementally so. The business case that would
> apply is that it makes sense to mark up the original documents using XML;
> it also makes sense to search a file which is a description of the
> document collection rather than the whole document collection. The
> derivative file is the "index" - we are not assuming that it itself need
> be XML.
Am I missing something? What is the difference between your "vision" and
that of, for example, GILS, which assigns a metalevel document, a so-called
information locator, to a resource? See

> I am choosing my language carefully as there seems to be an equal mixture
> of enthusiasm and coolness displayed towards an XML document collection
> indexing scheme. The fact of the matter is that so far we have identified
In the ASF for GILS (which also defined a distributed gathering concept)

> a number of problems which are amenable to assisted search. We are not
> particularly concerned, at this point, with breaking any new ground in XML -
> rather, this is a project designed to address a subset of XML "usage"
> problems.
Or are you thinking (I am trying to understand the problem) of something like a
new take on DC so gatherers can create their own synthetic locator records?
Or naming conventions? See

> Although I have announced this project on the perl-xml list, and it will
> concentrate on Perl, with and without XS, there is no reason that Java
> and/or C/C++ viewpoints are not welcome. We are primarily interested in
> exploring issues pertaining to the construction of a file that describes a
> collection of XML documents in a succinct fashion, most likely with a
> moderate to high degree of application specificity - i.e. there may not be
> a lot of defaults that make sense.
Keeping to GILS: (following an ISO 11179
Metamodel) to connect crawlers with compliant search engines.....

> We also wish to supply a useful API that search engine writers can use.
So is the project about designing a common development API for search engines?
Or a way for metasearch engines to interoperate with one another (such as GILS/ASF)?

Since I appear to be totally confused (and, as often, intrigued), a starting point
might be to clarify, if possible, the objectives and goals (the problem is clear)
so that we can explore common ground.
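
To make the question concrete, here is a minimal sketch of how the "derivative
file" in the quoted problem statement could be read: one small descriptive
record per XML document, with searches run against those records instead of
the documents themselves. It is written in Python purely for illustration (the
announced project concentrates on Perl). The collection contents, the field
names, and the functions build_index and search are all hypothetical; nothing
here is taken from GILS or from the proposed project.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory collection: document location -> XML source.
COLLECTION = {
    "doc1.xml": "<report><title>Whale census</title>"
                "<subject>marine biology</subject></report>",
    "doc2.xml": "<report><title>Road survey</title>"
                "<subject>civil engineering</subject></report>",
}

# Child elements copied into each record. This is an assumed,
# application-specific choice, not a sensible default.
INDEXED_FIELDS = ("title", "subject")


def build_index(collection):
    """Return one small descriptive record per document."""
    index = []
    for location, source in collection.items():
        root = ET.fromstring(source)
        record = {"location": location}
        for field in INDEXED_FIELDS:
            # Text of the first direct child with this name, or "" if absent.
            record[field] = root.findtext(field, default="")
        index.append(record)
    return index


def search(index, term):
    """Return locations of records whose indexed fields contain the term."""
    term = term.lower()
    return [
        record["location"]
        for record in index
        if any(term in record[field].lower() for field in INDEXED_FIELDS)
    ]


index = build_index(COLLECTION)
print(search(index, "marine"))
```

In this reading the index is a plain list of records, so it need not be XML
itself, and a search never touches the original documents.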

Edward C. Zimmermann
Basis Systeme netzwerk/Munich

xml-dev: A list for W3C XML Developers.