Whence XQL?

Fri Mar 26 03:08:01 GMT 1999

At 01:31 PM 3/26/99 +1100, Marcelo Cantos wrote:

>I could be disingenuous ( :-) ) and suggest that the attachment to
>Microsoft has more than a little to do with its success to date, but I
>certainly don't want to disparage the effort in its own right.  It
>offers a good compromise between expressivity and simplicity, which is
>a far more practicable goal than completeness.

Well, Microsoft was one of the first companies I got interested in XQL ;->

>I am concerned (am I right on this?) at the lack of proximity
>operators.  But that's just an implementor's perspective, looking at
>doing things we already support.

Cool, you work on SIM? (Does that make you a SIMian?) I really enjoyed
talking to Timothy Arnold-Moore at Markup Technologies '98 - Makoto
Murata-san and I managed to snag him after his presentation and grill him
with questions for a while.

I've gone back and forth on proximity operators. Several people who have
implemented full-text search systems have told me that users don't really
use proximity operators, that they are useful in the implementation, but
need not be exposed to the user. Others vehemently disagree. I took the
pragmatic approach of leaving it out to see who would complain. Frankly,
you are the first to do so.

I have discussed proximity searching as a possibility in the following paper:

http://www.w3.org/TandS/QL/QL98/pp/murata-san.html

Here's an excerpt:

<excerpt>

In addition, functions for proximity searching might be useful. The
following returns <LINE> elements in which "rose*" and "sweet*" occur
within 10 words of each other:

LINE[near("rose*", "sweet*", 10)]

This would match lines like these:

<LINE>A rose by any other name would smell as sweet.</LINE>
<LINE>Sweet roses grew along the south side of the fence.</LINE>
<LINE>She rose and smiled sweetly at the purple dwarf under the bucket.</LINE>
<LINE>Say, has anybody seen my Sweet Gypsy Rose?</LINE>

Proximity searching requires some way to indicate how close the strings
must be in order to match. This causes a difficulty when choosing the units
in which proximity is measured. In existing full-text systems, distance is
frequently measured in terms of words, which raises a number of significant
questions regarding internationalization, but is probably an intuitive way
to measure distance for most users.

</excerpt>
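
To make the semantics concrete, here is a rough sketch in Python of what a word-based near() might do. It is not taken from the paper or from any XQL implementation: the names tokenize and near, the shell-style wildcard matching, and the naive letters-only definition of a "word" are all my assumptions for illustration. The tokenizer in particular is exactly where the internationalization questions mentioned above would bite.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text):
    """Split text into lowercase words.

    Assumption: a "word" is a run of letters. This is a naive rule that
    will not suit languages written without spaces between words.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def near(text, pattern_a, pattern_b, max_distance):
    """Return True if a word matching pattern_a and a word matching
    pattern_b occur within max_distance words of each other.

    Patterns use shell-style wildcards ("rose*" matches "roses").
    Hypothetical helper, not part of any XQL implementation.
    """
    words = tokenize(text)
    hits_a = [i for i, w in enumerate(words) if fnmatchcase(w, pattern_a.lower())]
    hits_b = [i for i, w in enumerate(words) if fnmatchcase(w, pattern_b.lower())]
    # Distance is the difference in word positions, in either order.
    return any(abs(i - j) <= max_distance for i in hits_a for j in hits_b)


lines = [
    "A rose by any other name would smell as sweet.",
    "Sweet roses grew along the south side of the fence.",
    "She rose and smiled sweetly at the purple dwarf under the bucket.",
    "Say, has anybody seen my Sweet Gypsy Rose?",
]

# Emulates LINE[near("rose*", "sweet*", 10)] over the content of each LINE.
matches = [line for line in lines if near(line, "rose*", "sweet*", 10)]
```

With these assumptions all four sample lines match. Note that the third line matches only because of the trailing wildcard in "sweet*"; a plain "sweet" would not match "sweetly".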

I'm not sure whether this is the best approach or not. Do you like this
approach? If not, what approach would you prefer?

Jonathan

jonathan at texcel.no
Texcel Research
http://www.texcel.no

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)