Whence XQL?

Mon Mar 29 20:12:40 BST 1999

At 10:47 AM 3/29/99 +1000, Marcelo Cantos wrote:
>On Thu, Mar 25, 1999 at 10:09:15PM -0500, Jonathan Robie wrote:

>> Cool, you work on SIM? (Does that make you a SIMian?)
>
>Cute!  It might just take off around here. :-)

I haven't been able to come up with a similar nickname for people who work
on XQL...

>I do wonder what proportion of people looking seriously at XQL are
>into text.  We find WITHIN N to be exceedingly useful.  It is also
>interesting to note that we only offer proximity at the word level and
>that this is all clients ever really want.  We do also offer same
>sentence/paragraph queries, but virtually no-one uses them.

One full-text search engine vendor told me that their users did not use
proximity searching. This surprised me, but it was what convinced me that I
might be able to leave proximity out of even full-text extensions to XQL.

Most of what I have done with XML until fairly recently was with structured
documents rather than data, or with documents that also contain what has
classically been considered data. I am now starting to do more with XML for
data. I think that both Microsoft and Joe Lapp of webMethods have worked
more with data than with documents.

>It's an interesting angle, though not one I had considered (not that I
>have considered many angles :-).  I had understood, perhaps
>incorrectly, that the only way to perform word-level boolean queries
>was to treat words abstractly as leaf nodes of the document tree
>rather than clumps of opaque string data.  Under this conception, to
>find "other name", one would say:
>
>  LINE[WORD="other"; WORD="name"]
>
>It could possibly be made legal to abbreviate the above to:
>
>  LINE["other"; "name"]

XQL as it stands does not allow this, but I have discussed it as a possible
extension in the section on "Integrating structured and full-text queries"
in http://www.w3.org/TandS/QL/QL98/pp/murata-san.html, a paper written
together with Makoto Murata-san. That extension would make the above syntax
legal.

The other approach, which you have used above, is to pretend that there is
markup identifying the individual words - that's a perfectly valid approach
too.
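
As a rough illustration of the "pretend there is markup" approach, here is a
minimal Python sketch. It is not XQL and not anyone's implementation: the
helper names wrap_words and has_sequence are hypothetical, the tokenizer
(runs of word characters, compared case-insensitively) is an assumption, and
the WORD element simply mirrors the example query above.

```python
import re
import xml.etree.ElementTree as ET

def wrap_words(line_text):
    """Build a LINE element whose children are WORD leaf nodes.

    The WORD markup is hypothetical: it stands in for the idea of
    treating each word as a leaf node of the document tree.
    """
    line = ET.Element("LINE")
    for token in re.findall(r"\w+", line_text):
        word = ET.SubElement(line, "WORD")
        word.text = token
    return line

def has_sequence(line, first, second):
    """True if a WORD equal to `first` immediately precedes one equal to `second`."""
    words = [w.text.lower() for w in line.findall("WORD")]
    first, second = first.lower(), second.lower()
    # Compare each word with its immediate successor.
    return any(a == first and b == second for a, b in zip(words, words[1:]))

line = wrap_words("By any other name would smell as sweet")
print(has_sequence(line, "other", "name"))
```

Once the words are real nodes, the sequence test is just an ordinary
comparison of adjacent siblings, which is why this approach needs no special
full-text machinery.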

>This would be interpreted as: a LINE element which is the parent of
>a leaf node equal to "other" immediately preceding a leaf node equal
>to "name".  Support for proximity ("rose*" within 10 words of
>"sweet") would then simply be a matter of:
>
>  LINE["rose*" %10 "sweet"]
>
>(The %N syntax is borrowed from our query language.)  Higher level
>proximities could be done like this:
>
>  LINE["name"] %10 LINE["purple"]
>
>The operator simply adopts the level of its operands; mismatched
>operands constitute an error.

I would have to think about how to fit that into the XQL grammar. Does it
have advantages over the function-based approach I suggested earlier?

  near("name", "purple", 10)

This fits into the XQL grammar without modification; it's just a matter of
introducing another function.
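
To make the intended semantics concrete, here is a small Python sketch of
what a word-level near() might compute. This is only an assumption about the
behaviour, not a definition from XQL: the extra text argument, the
tokenization, the case-insensitive comparison, the reading of "within N
words" as a difference of at most N in word position, and the trailing-*
wildcard (taken from the "rose*" example) are all choices made for
illustration.

```python
import re
from fnmatch import fnmatchcase

def near(text, first, second, distance):
    """True if a word matching `first` occurs within `distance` words
    of a word matching `second`, in either order.

    Hypothetical semantics: words are runs of word characters, matching
    is case-insensitive, and patterns may use shell-style wildcards.
    """
    words = [w.lower() for w in re.findall(r"\w+", text)]
    first_pos = [i for i, w in enumerate(words) if fnmatchcase(w, first.lower())]
    second_pos = [i for i, w in enumerate(words) if fnmatchcase(w, second.lower())]
    # Any pair of distinct positions close enough together is a hit.
    return any(
        i != j and abs(i - j) <= distance
        for i in first_pos
        for j in second_pos
    )

text = "a rose by any other name would smell as sweet"
print(near(text, "rose*", "sweet", 10))
print(near(text, "name", "purple", 10))
```

In a real query the text would come from the current context node rather
than from an explicit argument; it is passed in here only to keep the sketch
self-contained.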

Jonathan

jonathan at texcel.no
Texcel Research
http://www.texcel.no

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1