Assisted Search of XML document collections

Edward C. Zimmermann edz at
Sat May 22 22:11:55 BST 1999

> On Sat, 22 May 1999, Edward C. Zimmermann wrote:
> > 
> > Since I appear to be totally confused (and, as often, intrigued) a starting point
> > might be to, if possible, clarify the objectives and goals (the problem is clear)
> > to explore common ground. 
> > 
> That is the stage we are at. I have this gut feeling that we need to
> define what it means to have a search engine operate on let's say 100,000
> documents marked up using XML, and what are the situations where it might
> make more sense to search a file which describes that collection.
100K documents is not a problem. Even on consumer PC hardware, a modestly performant
full-text engine can handle typical queries on such a small collection in fractions
of a second. Beyond quantity, the real problem is that information resources
(XML, HTML or whatever) are not always static but often dynamic. That is, above all,
one of the fundamental flaws in the brute-force spider/crawl approaches followed by
the major "Internet Engines" (beyond the impact on bandwidth, the half-life of
data, and all the other significant shortcomings).

> Your best contribution would be to describe a business problem and tell us
> how you like to solve it.
Different problems, different methods, different tools. 

Let's turn the tables: since I'm the confused soul, can you explain a business
problem and tell us how you might plan to "solve it"?
> Arved

Edward C. Zimmermann
Basis Systeme netzwerk/Munich

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at
Archived as: and on CD-ROM/ISBN 981-02-3594-1