Automating Search Interfaces

Wed Feb 18 07:29:53 GMT 1998

I have a tendency to talk about things that have yet to happen as if I had
already seen them happen, so I must first beg the reader to understand that
what follows is just one man's opinion.

>>As particular industries and special interests standardize on their
>>respective DTDs, Internet search engines will have to allow users to
>>search by specific elements contained in those documents. In the typical
>>search scenario, a user would use one of the major search services such
>>as AltaVista or Yahoo. Lets say the user wanted to search across real
>>estate listings, and these listings all used the same DTD. It seems that
>>independent search engines need to interpret the DTD for a class of
>>documents and present a query interface based on that DTD. The question
>>is: how is the search engine to interpret the DTD and build an
>>intelligent interface based on that DTD? Simply listing every element in
>>the DTD is one approach, but an ugly one. Many DTDs will contain
>>numerous elements which would only clutter and confuse a search
>>interface.

Standardized schemas will not be here for some time.  The effects of XML
will be felt by all major industries in the near future, and while there
will be sincere efforts to standardize DTDs in most markets, fiercely
competitive markets like the search service market will be slow to
standardize schemas.  I expect another round of tag wars, waged this time
by Yahoo, Excite, AltaVista, MS, etc.  The result will be different this
time in that everyone will agree to disagree in the end and move on to
building tools that bridge the differences in content structure, which by
then will have accumulated beyond the point where standardizing is
practical.

A schema-based universal search interface will be dead on arrival.  While it
is possible to build such clients, search services that use them will lose
every time to services offering hand-crafted search interfaces designed to
be easy to use, appropriately flexible, and visually appealing.
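
To make the "possible to build" part concrete, here is a minimal sketch in
Python of a schema-driven client.  It assumes a made-up real-estate DTD and
treats every text-only (#PCDATA) element as a search field.  The DTD, the
regular expression, and the function names are all illustrative
assumptions, not part of any real search service, and the regular
expression handles only simple element declarations.

```python
import re

# Hypothetical DTD fragment for real-estate listings (not a real standard).
LISTING_DTD = """
<!ELEMENT listing (address, price, bedrooms, agent)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT bedrooms (#PCDATA)>
<!ELEMENT agent (name, phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
"""

# Matches simple declarations of the form <!ELEMENT name content-model>.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.-]+)\s+(.*?)>", re.DOTALL)


def searchable_fields(dtd_text):
    """Return the names of text-only elements, in declaration order."""
    fields = []
    for name, content_model in ELEMENT_DECL.findall(dtd_text):
        # Container elements such as 'listing' are skipped; only leaves
        # that hold plain text become search fields.
        if content_model.replace(" ", "") == "(#PCDATA)":
            fields.append(name)
    return fields


def render_form(fields):
    """Render one plain-text input line per field."""
    return "\n".join(f"{name}: [__________]" for name in fields)


print(render_form(searchable_fields(LISTING_DTD)))
```

The generated form is exactly as flat as the DTD's leaf elements, with no
grouping, ordering, or labels chosen for the user, which is the weakness a
hand-crafted interface avoids.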

The improved accuracy of search results, brought on by the wide
availability of XML-based content, will be lost on most users.  Consumers
simply do not care as long as they can find what they want among the first
100 items returned by a search.  Search services are free, after all, so
users do not place high expectations on them.

What consumers will care about most is the 'freshness' of search results.
All of the widely used search services currently sell stale information, a
lot of it damaged goods.  There is not much demand for freshness now, but
the need will rise dramatically along with the growth of e-commerce.  XML
will bring on new search services which broadcast search requests to
hundreds or thousands of 'datasites' to get the freshest goods.  It will
take tools to build datasites and applications to create content for the
datasites.  It is not hard to guess who will be the major player in the
next generation of search services.
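
The broadcast idea can be sketched roughly as follows in Python.  The
datasites here are stand-in functions running in the same process, and the
names (make_datasite, broadcast_search, the 'updated' field) are assumptions
made for illustration; a real service would send each query over the
network and would have to deal with timeouts and failures.

```python
from concurrent.futures import ThreadPoolExecutor


def make_datasite(name, items):
    """Build a stand-in datasite: a function that answers a search term."""
    def query(term):
        return [dict(item, site=name)
                for item in items
                if term in item["title"].lower()]
    return query


# Hypothetical datasites with a made-up 'updated' date (YYYYMMDD).
DATASITES = [
    make_datasite("books-a", [{"title": "XML Basics", "updated": 19980217}]),
    make_datasite("books-b", [{"title": "Learning XML", "updated": 19980218},
                              {"title": "SGML Handbook", "updated": 19980101}]),
]


def broadcast_search(term, datasites, max_workers=8):
    """Send the term to every datasite in parallel and merge the answers,
    freshest first."""
    term = term.lower()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        result_lists = list(pool.map(lambda site: site(term), datasites))
    merged = [item for results in result_lists for item in results]
    return sorted(merged, key=lambda item: item["updated"], reverse=True)


for hit in broadcast_search("xml", DATASITES):
    print(hit["updated"], hit["site"], hit["title"])
```

Because each datasite answers from its own live data at query time, the
merged list is only as stale as the slowest datasite, not as stale as the
last crawl.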

What I see happening is a proliferation of custom DTDs designed around the
content.  Amazon will not want to throw out some information just so they
can use some standard DTD.  It is like saying that they will chop your arms
off just so they can use a standard-size coffin.  Amazon will use a custom
DTD designed to hold all of their valuable content, including book reviews.
They will offer some, and definitely not all, layers of that content to
search services by dynamically mapping their DTD to the search service's
DTD.  In other words, the DTD used to store content will not necessarily be
the same as the DTD used to transfer it.
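
A minimal sketch of such a storage-to-transfer mapping, in Python with the
standard xml.etree.ElementTree module.  The element names on both sides,
the EXPORT_MAP table, and the to_transfer function are invented for
illustration.  The sketch maps element names only; it does not read or
validate against an actual DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored record, including content the owner wants to keep.
STORED = (
    "<book>"
    "<title>Learning XML</title>"
    "<author>A. Writer</author>"
    "<price>29.95</price>"
    "<review>Private editorial review text.</review>"
    "</book>"
)

# Storage element name -> transfer element name (assumed vocabulary of
# the search service).  Anything not listed here is withheld.
EXPORT_MAP = {"title": "name", "author": "creator", "price": "cost"}


def to_transfer(stored_xml, export_map, root_tag="item"):
    """Copy only the mapped child elements into the transfer structure."""
    source = ET.fromstring(stored_xml)
    target = ET.Element(root_tag)
    for child in source:
        new_tag = export_map.get(child.tag)
        if new_tag is None:
            continue  # this layer of content is not offered
        ET.SubElement(target, new_tag).text = child.text
    return ET.tostring(target, encoding="unicode")


print(to_transfer(STORED, EXPORT_MAP))
```

Different search services could each get their own map, so the stored
record never has to shrink to fit any one of them.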

It is sad to think so, but we will also see more and more content moving
behind protection.  XML makes 'data-spies', 'data-pirates', and
'data-chop-shops' possible.  You will see 'hot-data' detective robots
roaming the net to check whether any piece of a site's data was taken from
their clients' data, relying on intentional mangling of words and images
that carries hidden signatures.
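
One way the 'intentional mangling of words' could work is sketched below in
Python: a signature is hidden in the choice between spelling variants, and
a robot reads it back from a suspect copy.  The variant table, function
names, and sample text are assumptions for illustration only.  The sketch
splits on whitespace, so punctuation and capitalization would defeat it.

```python
# Each pair holds two interchangeable spellings; the bit picks one.
VARIANT_PAIRS = [
    ("color", "colour"),
    ("gray", "grey"),
    ("catalog", "catalogue"),
    ("center", "centre"),
]


def embed_signature(text, bits):
    """Rewrite each variant word so its spelling encodes one bit."""
    words = text.split()
    for (v0, v1), bit in zip(VARIANT_PAIRS, bits):
        chosen = v1 if bit else v0
        words = [chosen if w in (v0, v1) else w for w in words]
    return " ".join(words)


def read_signature(text):
    """Recover the bits; None marks a pair that does not occur."""
    words = set(text.split())
    bits = []
    for v0, v1 in VARIANT_PAIRS:
        if v1 in words:
            bits.append(1)
        elif v0 in words:
            bits.append(0)
        else:
            bits.append(None)
    return bits


original = "the color catalog lists gray items near the center"
marked = embed_signature(original, [1, 0, 1, 1])
print(marked)
print(read_signature(marked))
```

A robot that finds a client's signature bits in someone else's copy has
evidence of where the data came from, though a copier who normalizes
spelling would erase this particular mark.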

I hope I did not upset everyone with my 'it sure is obvious to me' attitude.
My sole intention is to help the XML community.  If I make some money along
the way, I can live with it.  I think <g>.

Sincerely,

Don Park
http://www.quake.net/~donpark/index.html
