"Multiple" Namespaces? (but NOT for HTML)
Walter Underwood
wunder at infoseek.com
Fri Oct 29 18:17:58 BST 1999
At 08:14 AM 10/29/99 -0500, Paul wrote:
>On Thu, 28 Oct 1999, Walter Underwood wrote:
>> It may be that markup is not the right hammer for this problem.
>> Our search engine handles multiple DTDs by mapping the elements
>> into common search meta data elements.
>>
>> DC:Creator -> author
>> GILS:Originator -> author
>> TEI:docAuthor -> author
>
>That's relatively easy for a flat model, but what about a deeply
>hierarchical one? Can you do a search for "address 1" vs. "Street" but
>only in "Publisher"? Even more sophisticated, can you recognize that
>"name in publisher" is "publisher name"?
Nope. To do that, you need an XQL-like engine or a repository.
We're aimed at the other 99% of the market.
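For readers who want to see the difference, here is a rough sketch of a flat lookup versus a context-qualified one, using the limited XPath support in Python's standard xml.etree.ElementTree. The record and its element names are hypothetical, loosely echoing Paul's question. This is not how our engine works.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the element names are made up for illustration.
RECORD = """
<book>
  <Author><Name>Jane Doe</Name><Street>1 Elm Road</Street></Author>
  <Publisher><Name>Example Press</Name><Street>9 Main Street</Street></Publisher>
</book>
"""

root = ET.fromstring(RECORD)

# Flat lookup: every <Street>, regardless of which element contains it.
all_streets = [e.text for e in root.iter("Street")]

# Context-qualified lookup: only <Street> elements directly inside <Publisher>.
publisher_streets = [e.text for e in root.findall(".//Publisher/Street")]
```

The second query is the kind that needs a path-aware engine. The first is all a flat field mapping can express.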
Also, when I was researching published DTDs, nearly all of them
qualified the sub-elements or used entirely different names, so
that context wasn't necessary: <docAuthor>, <bibAuthor>, <byline>,
whatever. The only tag that was occasionally reused in different
contexts was <title>. There is a heuristic (hack?) to use the
first occurrence as the title for the results page. That is a better
solution than expecting customers to know XPath, then trying
to teach it to them over the phone.
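A minimal sketch of the flat mapping plus the first-title heuristic, assuming Python's standard xml.etree.ElementTree. FIELD_MAP, local_name, and extract_fields are hypothetical names for illustration, not the product's actual configuration or code, and matching on local names alone (ignoring the namespace) is a simplification.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping table. The element names come from the examples
# quoted above; the table itself is an assumption for illustration.
FIELD_MAP = {
    "Creator": "author",      # DC
    "Originator": "author",   # GILS
    "docAuthor": "author",    # TEI
    "title": "title",
}

def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified names."""
    return tag.rsplit("}", 1)[-1]

def extract_fields(xml_text):
    """Map source elements onto common search fields, ignoring hierarchy."""
    fields = {}
    for elem in ET.fromstring(xml_text).iter():   # document order
        field = FIELD_MAP.get(local_name(elem.tag))
        if field is None:
            continue
        text = "".join(elem.itertext()).strip()
        if field == "title":
            # First-occurrence heuristic: later <title> elements are ignored.
            fields.setdefault("title", text)
        else:
            fields.setdefault(field, []).append(text)
    return fields
```

With a document containing a top-level <title> and another <title> inside a section, only the first one ends up in the "title" field.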
Our house style is to err on the side of simplicity and ease of
use, because it is almost impossible to remove features, even if they
confuse almost everyone and benefit almost no one.
I actually spent more time making sure that sentences were extracted
properly from things like this (with multiple mappings possible):
<title>The <hi type="italic">Ghastly</hi> Happenings at
<event><trademark>Infoseek</trademark>'s Halloween
Party</event></title>
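As a rough illustration of the extraction problem (not the engine's actual code), the sketch below flattens mixed content into one string with Python's standard xml.etree.ElementTree. flatten_text is a hypothetical helper. The same call works on the nested elements, which is where the multiple mappings come in.

```python
import xml.etree.ElementTree as ET

TITLE = """<title>The <hi type="italic">Ghastly</hi> Happenings at
<event><trademark>Infoseek</trademark>'s Halloween
Party</event></title>"""

def flatten_text(elem):
    """Join all text inside elem, dropping tags and normalizing whitespace."""
    return " ".join("".join(elem.itertext()).split())

title = ET.fromstring(TITLE)

title_text = flatten_text(title)
# The nested elements can feed other fields from the same source text.
event_text = flatten_text(title.find("event"))
trademark_text = flatten_text(title.find(".//trademark"))
```

The text is concatenated before whitespace is normalized, so "Infoseek" and "'s" stay joined while the line breaks inside the title collapse to single spaces.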
I've got nothing against complex searches, but they don't benefit
our users. In the internet search world, people who type two-word
queries are power users. Really.
wunder
--
Walter R. Underwood
wunder at infoseek.com
wunder at best.com (home)
http://software.infoseek.com/cce/ (my product)
http://www.best.com/~wunder/
1-408-543-6946
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)