XML Search Engine

Fri Nov 6 12:15:56 GMT 1998

Tim Bray wrote:

> Such sites will be rather small, due to a little problem in
> the retrieval business, namely nobody has ever made serious
> money at it.  Five years ago, I would have said the leading
> vendors were Fulcrum, Verity, PLS, Open Text, and IDI/Basis.
>

Fulcrum has had its moment of glory. The same cannot be said about the
others. Nevertheless, you've forgotten a very important name:
Dataware Technologies (http://www.dataware.com).

Dataware grew from 0 to several million dollars in a few years
selling text-retrieval systems for CDs (about $40MB/year). Then
it bought BRS, with more than 2,000 data centers.

BRS is still the leading product in text retrieval on a variety
of platforms. Just to mention libraries alone, there are more
than 200 big, big libraries using BRS.

About two years ago Dataware launched EPMS, now renamed
Dataware II Publisher. This is a version of BRS entirely based
on SGML: it reads about 300 different formats, converts and
stores them as SGML files, and allows you to do text retrieval
both in the traditional way and in a more SGML-like way.

Of course, it can read and index SGML, XML and HTML directly.

> Lesson: there's not much juice in that business. XML might cheer
> things up a bit, you never know.  There are any number of decent
> free search engines you can run with either Apache or NT servers...
>

Talking about money, it is quite clear that IBM made a lot of money
selling STAIRS. It is musty now, but for more than 20 years it
reigned undisputed in the mainframe kingdom.

So, I think the right conclusion is that in the low-end line of products,
where quality/functionality is disputable and price is very low
(PC DOCs, Verity...), there is no real money. On the other hand,
vendors aiming at the high-end market should not complain.

> If you're doing relational search, most relational vendors (Oracle,
> Informix, etc) have some sort of full-text add-on that usually
> works OK.

My own experience is that relational vendors are completely incapable
of providing a good solution for text retrieval. The products
are usually very poor on the functionality side and miserable on
the performance side.

In fact, I'd like to hear from any of you who know of any SIGNIFICANT
application using a relational database for text retrieval. By significant
I mean: a) several gigabytes or even terabytes of text; b) several million
documents; c) at least a few dozen concurrent users; d) a need for
complex searches (say 20 or 30 words/parts of words combined
with 4 or 5 different operators); e) response time below one second
on a common UNIX or mainframe platform.
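
To make the "complex searches" criterion a bit more concrete, here is a
minimal sketch in Python of words and parts of words combined with
operators. Everything in it is an assumption made for illustration: the
names build_index and lookup, the sample documents, and the choice of
just three operators (OR, AND, NOT) plus a trailing-* prefix match. It
is not how BRS, STAIRS or any other product mentioned here works, and a
real engine would not scan the whole vocabulary to resolve a prefix.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def lookup(index, term):
    """Return ids matching a whole word, or a word prefix if term ends in '*'."""
    if term.endswith("*"):
        prefix = term[:-1]
        hits = set()
        # Naive prefix match: scan every indexed word (illustration only).
        for word, ids in index.items():
            if word.startswith(prefix):
                hits |= ids
        return hits
    return set(index.get(term, set()))

# Hypothetical sample collection.
docs = {
    1: "sgml text retrieval on unix",
    2: "relational database with full-text add-on",
    3: "xml and sgml indexing for libraries",
}
index = build_index(docs)

# Query: (sgml OR xml) AND retriev* AND NOT relational*
result = ((lookup(index, "sgml") | lookup(index, "xml"))
          & lookup(index, "retriev*")) - lookup(index, "relational*")
print(sorted(result))
```

The kind of query I have in mind has 20 or 30 such terms rather than
four, run against millions of documents with many users at once.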

If any of you have ever heard about such an application, I am eager
to hear about it.

- fernando

--
Fernando Cabral                         Padrao iX Sistemas Abertos
mailto:fernando at pix.com.br              http://www.pix.com.br
                                        mailto:Pix at Pix.com.br
Fone: +55 61 321-2433                   Fax: +55 61 225-3082
15º 45' 04.9" S                         47º 49' 58.6" W
19º 37' 57.0" S                         45º 17' 13.6" W
