Whence XQL?

Mon Mar 29 02:47:39 BST 1999

On Thu, Mar 25, 1999 at 10:09:15PM -0500, Jonathan Robie wrote:
> At 01:31 PM 3/26/99 +1100, Marcelo Cantos wrote:
>  
> >I could be disingenuous ( :-) ) and suggest that the attachment to
> >Microsoft has more than a little to do with its success to date, but I
> >certainly don't want to disparage the effort in its own right.  It
> >offers a good compromise between expressivity and simplicity, which is
> >a far more practicable goal than completeness.
> 
> Well, Microsoft was one of the first companies I got interested in XQL ;->
> 
> >I am concerned (am I right on this?) at the lack of proximity
> >operators.  But that's just an implementor's perspective, looking at
> >doing things we already support.
> 
> Cool, you work on SIM? (Does that make you a SIMian?)

Cute!  It might just take off around here. :-)

> I really enjoyed
> talking to Timothy Arnold-Moore at Markup Technologies '98 - Makoto
> Murata-san and I managed to snag him after his presentation and grill him
> with questions for a while.
> 
> I've gone back and forth on proximity operators. Several people who have
> implemented full-text search systems have told me that users don't really
> use proximity operators, that they are useful in the implementation, but
> need not be exposed to the user. Others vehemently disagree. I took the
> pragmatic approach of leaving it out to see who would complain. Frankly,
> you are the first to do so.

I do wonder what proportion of people looking seriously at XQL are
into text.  We find WITHIN N to be exceedingly useful.  It is also
interesting to note that we only offer proximity at the word level and
that this is all clients ever really want.  We do also offer same
sentence/paragraph queries, but virtually no-one uses them.

> I have discussed proximity searching as a possibility in the
> following paper:
> 
> http://www.w3.org/TandS/QL/QL98/pp/murata-san.html
> 
> Here's an excerpt:
> 
> <excerpt>
> 
> In addition, functions for proximity searching might be useful. The
> following returns <LINE> elements in which "rose*" and "sweet*"
> occur within 10 words of each other:
> 
> LINE[near("rose*", "sweet*", 10)]
> 
> This would match lines like these:
> 
> <LINE>A rose by any other name would smell as sweet.</LINE>
> <LINE>Sweet roses grew along the south side of the fence.</LINE>
> <LINE>She rose and smiled sweetly at the purple dwarf under the
> bucket.</LINE>
> <LINE>Say, has anybody seen my Sweet Gypsy Rose?</LINE>
> 
> Proximity searching requires some way to indicate how close the
> strings must be in order to match. This causes a difficulty when
> choosing the units in which proximity is measured. In existing
> full-text systems, distance is frequently measured in terms of
> words, which raises a number of significant questions regarding
> internationalization, but is probably an intuitive way to measure
> distance for most users.
> 
> </excerpt>
> 
> I'm not sure whether this is the best approach or not. Do you like
> this approach? If not, what approach would you prefer?
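
For concreteness, here is a minimal Python sketch of how a word-level
near() test might behave on plain string data.  It is not XQL and not
taken from the paper: the tokenizer, the use of shell-style wildcards,
and the reading of "within N words" as "token positions differ by at
most N" are all assumptions made for illustration.

```python
import re
from fnmatch import fnmatchcase


def words(text):
    """Split text into lowercase word tokens (a naive, English-centric rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def near(text, pattern_a, pattern_b, max_distance):
    """Return True if a word matching pattern_a occurs within max_distance
    token positions of a word matching pattern_b, in either order.

    Patterns use shell-style wildcards (an assumption), so "rose*"
    matches "rose" and "roses".
    """
    tokens = words(text)
    pos_a = [i for i, w in enumerate(tokens) if fnmatchcase(w, pattern_a)]
    pos_b = [i for i, w in enumerate(tokens) if fnmatchcase(w, pattern_b)]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)
```

Under these assumptions, near(line, "rose*", "sweet*", 10) holds for
all four sample lines in the excerpt, while the "sweetly" line only
matches if the second pattern carries the wildcard.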

It's an interesting angle, though not one I had considered (not that I
have considered many angles :-).  I had understood, perhaps
incorrectly, that the only way to perform word-level boolean queries
was to treat words abstractly as leaf nodes of the document tree
rather than clumps of opaque string data.  Under this conception, to
find "other name", one would say:

  LINE[WORD="other"; WORD="name"]

It could possibly be made legal to abbreviate the above to:

  LINE["other"; "name"]

Which would be interpreted as "a LINE element which is the parent of
a leaf node equal to 'other' immediately preceding a leaf node equal
to 'name'".  Now, support for proximity ("rose*" within 10 words of
"sweet") would simply be a matter of:

  LINE["rose*" %10 "sweet"]

(The %N syntax is borrowed from our query language.)  Higher level
proximities could be done like this:

  LINE["name"] %10 LINE["purple"]

The operator simply adopts the level of its operands; mismatched
operands constitute an error.
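
To make the leaf-node idea concrete, here is a rough Python sketch of
the two operators over a sequence of sibling nodes.  Everything in it
is hypothetical: the names Operand, word, line_with, followed_by and
within are invented for illustration, a LINE is modelled as a plain
list of word leaves, and %N is assumed to be symmetric (either operand
may come first).

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable


@dataclass(frozen=True)
class Operand:
    level: str          # "word" or "line" (hypothetical level tags)
    matches: Callable   # predicate applied to one node at that level


def word(pattern):
    """Operand matching a single word leaf, e.g. "rose*"."""
    return Operand("word", lambda leaf: fnmatchcase(leaf, pattern))


def line_with(pattern):
    """Operand matching a LINE (a list of word leaves) containing the word."""
    return Operand("line",
                   lambda line: any(fnmatchcase(w, pattern) for w in line))


def _check_levels(left, right):
    # Mismatched operand levels are treated as an error.
    if left.level != right.level:
        raise ValueError("operands must be at the same level")


def followed_by(nodes, left, right):
    """Sketch of `a; b`: a node matching left immediately precedes one
    matching right."""
    _check_levels(left, right)
    return any(left.matches(a) and right.matches(b)
               for a, b in zip(nodes, nodes[1:]))


def within(nodes, left, right, n):
    """Sketch of `a %n b`: matching nodes at most n positions apart."""
    _check_levels(left, right)
    left_pos = [i for i, x in enumerate(nodes) if left.matches(x)]
    right_pos = [i for i, x in enumerate(nodes) if right.matches(x)]
    return any(abs(i - j) <= n for i in left_pos for j in right_pos)
```

With the words of a LINE passed as nodes, within(line, word("rose*"),
word("sweet"), 10) plays the role of the word-level query; with a list
of LINEs passed as nodes, within(lines, line_with("name"),
line_with("purple"), 10) plays the role of the higher-level one.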

Caveat: I confess that I don't know XQL very well at all, so I may be
saying something completely different to what I intended with the
above examples.  Corrections are most welcome.

Cheers,
Marcelo

-- 
http://www.simdb.com/~marcelo/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)