XML Search Engine
fernando at pix.com.br
Thu Nov 5 20:16:39 GMT 1998
Borden, Jonathan wrote:
> For example, suppose I am searching for big apples:
> "This is a little green apple. Big deal."
> will "Big near apple" match?
> how about "Big applied to apple"
This will not be a problem with any "decent" text-retrieval engine, because:
a) Proximity search can be performed either "ordered" or "unordered". This is
quite powerful: searching for "big near potato" in the sentence
"This is a small potato, big brother"
can be made to match both "potato, big" and "big potato", or only one
of the two orders.
Some search engines, like Stairs (the grandfather of all text-retrieval
systems) and BRS, have two such operators, "near" (or "prox") and "ADJacent",
the first being unordered and the second ordered.
b) Search engines usually know what sentences and paragraphs are. I don't think
proximity should extend beyond a period or any other punctuation that ends
a sentence. If you want to search in larger units, such as a paragraph, then
you can always define something like "apple SAME PARAGRAPH big"
or "apple SAME SENTENCE big", both of which extend the idea
of "nearness" and provide a more logical view of the terms.
c) Finally, growing from the very close vicinity (near/adjacent) to a little
further (same sentence/same paragraph), you can go to the whole
"universe" with AND, OR, XOR, etc. This means that
you have very good control not only over which words you
want, but also over where they are, how far apart they can be, and which one
comes first.
d) XML allows you to use all of the above operators while adding a very
useful feature: tag qualification, i.e. restricting a search to the
content of particular elements.
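The proximity operators in a) and b) can be sketched in a few lines of Python. This is only a minimal illustration, not how Stairs or BRS actually work: the function names `near`, `same_sentence` and `same_paragraph`, the word window `distance`, and the naive rule that a sentence ends at ".", "!" or "?" are all assumptions made for the example.

```python
import re

# Hypothetical, naive tokenizer: a sentence ends at ".", "!" or "?",
# paragraphs are separated by blank lines, and words are runs of
# letters, digits or apostrophes. Real engines use far richer rules.
def sentences(text):
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]

def paragraphs(text):
    return [p for p in re.split(r"\n\s*\n", text) if p.strip()]

def words(span):
    return re.findall(r"[a-z0-9']+", span.lower())

def near(text, a, b, distance=3, ordered=False):
    """NEAR/ADJ sketch: True if `a` and `b` occur at most `distance` words
    apart inside one sentence. ordered=False behaves like NEAR (either
    order); ordered=True approximates ADJ (`a` must come before `b`;
    with distance=1 the words must be strictly adjacent)."""
    a, b = a.lower(), b.lower()
    for s in sentences(text):  # never look across a sentence boundary
        toks = words(s)
        for i, t in enumerate(toks):
            if t != a:
                continue
            for j, u in enumerate(toks):
                if u != b:
                    continue
                gap = j - i
                if ordered and 0 < gap <= distance:
                    return True
                if not ordered and 0 < abs(gap) <= distance:
                    return True
    return False

def same_unit(units, a, b):
    """True if some unit (sentence or paragraph) contains both words."""
    a, b = a.lower(), b.lower()
    return any(a in ws and b in ws for ws in (set(words(u)) for u in units))

def same_sentence(text, a, b):
    return same_unit(sentences(text), a, b)

def same_paragraph(text, a, b):
    return same_unit(paragraphs(text), a, b)

if __name__ == "__main__":
    s = "This is a small potato, big brother"
    print(near(s, "big", "potato"))                # True  (NEAR: either order)
    print(near(s, "big", "potato", ordered=True))  # False (ADJ: "big" must come first)
    q = "This is a little green apple. Big deal."
    print(near(q, "big", "apple"))                 # False (different sentences)
    print(same_paragraph(q, "apple", "big"))       # True
```

With these rules, "big near apple" does not match "This is a little green apple. Big deal." because the two words fall in different sentences, while "apple SAME PARAGRAPH big" does.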
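Tag qualification from d) can likewise be sketched by running the word test only over the text of chosen elements, combined with AND and OR from c). This assumes Python's standard `xml.etree.ElementTree` for parsing; the function `tag_search` and the element names `catalog`, `item`, `title` and `note` are hypothetical, chosen only for the example.

```python
import re
import xml.etree.ElementTree as ET

def words(text):
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def tag_search(xml_text, tag, all_of=(), any_of=()):
    """Hypothetical tag-qualified search: return the text of each <tag>
    element containing every word in all_of (AND) and, if any_of is
    non-empty, at least one word in any_of (OR). Only that element's
    content is searched; text elsewhere in the document is ignored."""
    root = ET.fromstring(xml_text)
    hits = []
    for elem in root.iter(tag):
        # Join text pieces with spaces so adjacent child elements
        # ("<title>apple</title><note>big</note>") do not fuse into one word.
        content = " ".join(elem.itertext()).strip()
        ws = words(content)
        has_all = all(w.lower() in ws for w in all_of)
        has_any = not any_of or any(w.lower() in ws for w in any_of)
        if has_all and has_any:
            hits.append(content)
    return hits

# Hypothetical sample document.
DOC = """<catalog>
  <item><title>Big apple</title><note>not green</note></item>
  <item><title>Little green apple</title><note>Big deal</note></item>
</catalog>"""

if __name__ == "__main__":
    print(tag_search(DOC, "item", all_of=["big", "apple"]))   # both items match
    print(tag_search(DOC, "title", all_of=["big", "apple"]))  # ['Big apple']
```

An unqualified search for "big" AND "apple" accepts the second item, where "big" appears only in a note; qualifying the search with the title element rules it out.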
Fernando Cabral Padrao iX Sistemas Abertos
mailto:fernando at pix.com.br http://www.pix.com.br
mailto:Pix at Pix.com.br
Fone: +55 61 321-2433 Fax: +55 61 225-3082
15º 45' 04.9" S 47º 49' 58.6" W
19º 37' 57.0" S 45º 17' 13.6" W
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/