"Multiple" Namespaces? (but NOT for HTML)

Walter Underwood wunder at infoseek.com
Fri Oct 29 18:17:58 BST 1999


At 08:14 AM 10/29/99 -0500, Paul wrote:
>On Thu, 28 Oct 1999, Walter Underwood wrote:
>> It may be that markup is not the right hammer for this problem.
>> Our search engine handles multiple DTDs by mapping the elements
>> into common search meta data elements.
>> 
>>    DC:Creator      -> author
>>    GILS:Originator -> author
>>    TEI:docAuthor   -> author
>
>That's relatively easy for a flat model, but what about a deeply 
>hierarchical one? Can you do a search for "address 1" vs. "Street" but 
>only in "Publisher"? Even more sophisticated, can you recognize that 
>"name in publisher" is "publisher name"?

Nope. To do that, you need an XQL-like engine or a repository.
We're aimed at the other 99% of the market.
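
For illustration only, a minimal sketch of the flat mapping quoted above, not the engine's actual code. The names FIELD_MAP and to_search_fields are hypothetical, the three entries are the ones listed in the quoted message, and silently skipping unmapped elements is an assumption.

```python
# Hypothetical mapping table. The three entries are the ones from the
# quoted message; a real deployment would presumably list many more.
FIELD_MAP = {
    "DC:Creator": "author",
    "GILS:Originator": "author",
    "TEI:docAuthor": "author",
}


def to_search_fields(records):
    """Collapse (qualified element name, text) pairs into common search fields."""
    fields = {}
    for name, text in records:
        field = FIELD_MAP.get(name)
        if field is None:
            # Assumption: elements with no mapping are not indexed as metadata.
            continue
        fields.setdefault(field, []).append(text)
    return fields
```

The lookup is keyed on the element name alone, which is exactly why it cannot answer the "name in publisher" question: no parent context is carried along.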

Also, when I was researching published DTDs, nearly all of them 
qualified the sub-elements or used entirely different names, so 
that context wasn't necessary: <docAuthor>, <bibAuthor>, <byline>, 
whatever. The only tag that was occasionally reused in different 
contexts was <title>. There is a heuristic (hack?) to use the 
first occurrence as the title for the results page. That is a better 
solution than expecting customers to know XPath, then trying
to teach them over the phone.
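
A rough sketch of the first-occurrence heuristic, assuming the document is already well-formed XML and that "first" means first in document order. The function name first_title is hypothetical, and this uses Python's standard xml.etree.ElementTree rather than anything from the product.

```python
import xml.etree.ElementTree as ET


def first_title(xml_text):
    """Return the text of the first <title> in document order, or None."""
    root = ET.fromstring(xml_text)
    # iter() walks the tree in document order, starting with the root itself.
    for elem in root.iter("title"):
        # Join text from any inline children and collapse runs of whitespace.
        return " ".join("".join(elem.itertext()).split())
    return None
```

No path expression is needed from the customer; later <title> elements (section titles, cited works) are simply never consulted.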

Our house style is to err on the side of simplicity and ease of 
use, because it is almost impossible to remove features, even if they
confuse almost everyone and benefit almost no one.

I actually spent more time making sure that sentences were extracted
properly from things like this (with multiple mappings possible):

   <title>The <hi type="italic">Ghastly</hi> Happenings at 
      <event><trademark>Infoseek</trademark>'s Halloween
      Party</event></title>
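
To make the problem concrete, here is a small sketch that flattens that example into one run of text. It assumes Python's standard xml.etree.ElementTree; the names SAMPLE and flatten_text are hypothetical, and it covers only text extraction, not sentence splitting or the field mappings.

```python
import xml.etree.ElementTree as ET

# The example from the message, with its original line breaks.
SAMPLE = """<title>The <hi type="italic">Ghastly</hi> Happenings at
      <event><trademark>Infoseek</trademark>'s Halloween
      Party</event></title>"""


def flatten_text(elem):
    """Concatenate all text inside elem, ignoring the inline tags."""
    # itertext() yields each element's text and tail in document order.
    raw = "".join(elem.itertext())
    # Collapse line breaks and indentation into single spaces.
    return " ".join(raw.split())


print(flatten_text(ET.fromstring(SAMPLE)))
# prints: The Ghastly Happenings at Infoseek's Halloween Party
```

The pieces are joined with an empty string rather than a space, so "Infoseek" and "'s" stay attached even though a tag boundary falls between them.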

I've got nothing against complex searches, but they don't benefit
our users. In the internet search world, people who type two-word
queries are power users. Really.

wunder
--
Walter R. Underwood
wunder at infoseek.com
wunder at best.com (home)
http://software.infoseek.com/cce/ (my product)
http://www.best.com/~wunder/
1-408-543-6946

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)




