XML Search Engine

Fernando Cabral fernando at pix.com.br
Thu Nov 5 20:16:39 GMT 1998


Borden, Jonathan wrote:

> For example, suppose I am searching for big apples:
>
> "This is a little green apple. Big deal."
>
> will "Big near apple" match?
> how about "Big applied to apple"

This will not be a problem with any "decent" text retrieval engine because:

a) Proximity search can be performed either "ordered" or "non-ordered". This
    is quite powerful because it allows you to search for "big near potato"
    in the sentence

        "This is a small potato, big brother"

    and match either word order ("potato, big" as well as "big, potato") or
    only one of the two. Some search engines, like Stairs (the grandfather
    of all text-retrieval engines) and BRS, have two operators, like "near"
    (or "prox") and "ADJacent", the first one being unordered, the second
    one being ordered.

b) Usually search engines know what sentences and paragraphs are. I don't
     think proximity should go beyond a period or any other punctuation that
     ends a sentence. If you want to search in larger units, like a
     paragraph, then you could always define something like
     "apple SAME PARAGRAPH big" or "apple SAME SENTENCE big", both of which
     extend the idea of "nearness", providing a more logical view of the
     terms.

c) Finally, growing from the very close vicinity (near/adjacent) to a little
    further (same sentence/same paragraph) you can go to the whole
    "universe" with AND, OR, XOR, etc. What this means is that
    you can have very good control not only over which words you
    want, but also where they are, how far apart they can be, which one
    comes first...

d) XML allows you to use all the above operators, adding a very
    useful feature: tag qualification, that is, restricting a search
    to the text inside a given element.
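
To make points a), b) and d) concrete, here is a minimal Python sketch. It is
only an illustration, not how Stairs, BRS or any real engine works: the
function names (sentences, words, near, near_in_tag), the default distance of
3 words, and the naive "split at . ! ?" sentence rule are all assumptions
made for this example.

```python
import re
import xml.etree.ElementTree as ET


def sentences(text):
    """Split text at sentence-ending punctuation (a deliberately naive rule)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def words(text):
    """Return lower-cased word tokens with punctuation dropped."""
    return re.findall(r"\w+", text.lower())


def near(text, first, second, distance=3, ordered=False):
    """True if both words occur within `distance` words of each other inside
    a single sentence. With ordered=True, `first` must come before `second`."""
    first, second = first.lower(), second.lower()
    for sentence in sentences(text):
        tokens = words(sentence)
        pos_a = [i for i, t in enumerate(tokens) if t == first]
        pos_b = [i for i, t in enumerate(tokens) if t == second]
        for a in pos_a:
            for b in pos_b:
                gap = b - a  # positive when `first` precedes `second`
                if ordered and 0 < gap <= distance:
                    return True
                if not ordered and 0 < abs(gap) <= distance:
                    return True
    return False


def near_in_tag(xml_text, tag, first, second, **kwargs):
    """Tag-qualified proximity: only text inside <tag> elements is searched."""
    root = ET.fromstring(xml_text)
    return any(near("".join(el.itertext()), first, second, **kwargs)
               for el in root.iter(tag))


potato = "This is a small potato, big brother"
apple = "This is a little green apple. Big deal."
doc = "<doc><title>Big apple pie</title><body>Small apple.</body></doc>"

print(near(potato, "big", "potato"))                # unordered search
print(near(potato, "big", "potato", ordered=True))  # ordered search
print(near(apple, "big", "apple"))                  # words in different sentences
print(near_in_tag(doc, "title", "big", "apple"))    # search restricted to <title>
```

With these assumptions the unordered search finds "potato, big", the ordered
one does not, and "big near apple" does not match across the period in the
original example. The Boolean operators from point c) would simply combine
such tests with and/or.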

- fernando

--
Fernando Cabral                         Padrao iX Sistemas Abertos
mailto:fernando at pix.com.br              http://www.pix.com.br
                                        mailto:Pix at Pix.com.br
Fone: +55 61 321-2433                   Fax: +55 61 225-3082
15º 45' 04.9" S                         47º 49' 58.6" W
19º 37' 57.0" S                         45º 17' 13.6" W






