Automating Search Interfaces

Michael Kay M.H.Kay at
Wed Feb 18 14:17:18 GMT 1998

>>This is a question about how the search scenario will play out on the
>>web once XML becomes widely implemented

Some suggestions & predictions:

1. The "whole web" search services are not keeping pace with the growth of
the web; they are having to index more selectively and less often. There is
therefore increasing room for more specialised search services. There will
certainly be some that concentrate on a particular domain (say sports
results) and that get to understand the DTDs that are widespread in that
domain. This may in turn act as an incentive to the standardisation of
domain DTDs.

2. Search engines will probably start applying heuristics to the XML
even if they don't know the semantics of the DTD. This comes naturally to
software trying to extract information from raw text. For example, tags with
recognised names such as <TITLE> may raise the weighting of the text
contained therein; tags that contain small amounts of text may be ranked
more highly than tags containing most of the document.

3. Some conventional tags such as <META> may emerge and be used in a wide
range of DTDs if the search engines are known to apply special heuristics to
them. Other conventional tags, e.g. for personal names or places, may also
emerge.

4. The general public is only interested in doing simple searches. In more
specialist communities, query languages that allow the tagging to be
exploited will become available. Many search engines already have languages
that support "field-sensitive" searching and I think these can largely be
applied to XML without extension. Such queries only make sense within the
context of a single DTD or a family of closely-related DTDs. The "navigational" query
languages such as the XLL syntax or DSQL are too precise and too complex for
free text searching.

5. XML may start to become a vehicle for a site to publish an abstract of
itself. Search services, rather than indexing all the content of a site
(which is becoming unviable), will start to index the published abstracts of
sites, and having directed the enquirer towards a site, will then delegate
the within-site searching to a search engine at the site itself.
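
To make points 2 and 4 a little more concrete, here is a rough sketch in
Python of what I have in mind. It is an illustration only, not how any
existing search engine works: the boost values, the word-count threshold
and the function names term_weights and field_search are all invented for
the example, and it assumes tags without namespaces.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical boost factors; the values are illustrative, not tuned.
BOOSTED_TAGS = {"title": 3.0, "meta": 2.0}
SHORT_TEXT_LIMIT = 10    # elements with at most this many words count as "small"
SHORT_TEXT_BOOST = 1.5

WORD = re.compile(r"[A-Za-z0-9]+")


def term_weights(xml_text):
    """Point 2: weight terms by tag name and text size, with no DTD knowledge."""
    weights = defaultdict(float)
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Text owned directly by this element: its leading text plus the
        # text that follows each child (the children's own text is counted
        # when the loop reaches them).
        parts = [elem.text or ""] + [child.tail or "" for child in elem]
        words = WORD.findall(" ".join(parts).lower())
        if not words:
            continue
        boost = BOOSTED_TAGS.get(elem.tag.lower(), 1.0)
        if len(words) <= SHORT_TEXT_LIMIT:
            boost *= SHORT_TEXT_BOOST
        for word in words:
            weights[word] += boost
    return dict(weights)


def field_search(xml_text, tag, term):
    """Point 4: field-sensitive search, i.e. is `term` a word inside any <tag>?"""
    root = ET.fromstring(xml_text)
    term = term.lower()
    return any(
        term in WORD.findall("".join(elem.itertext()).lower())
        for elem in root.iter(tag)   # tag match is case-sensitive
    )
```

With a document such as <match><TITLE>Cup final</TITLE><report>...</report></match>,
term_weights gives the words in the short TITLE element a higher weight per
occurrence than the same words in a long report, and
field_search(doc, "TITLE", "cup") restricts the match to that one field.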

By the way, does anyone know of a search engine (I mean software, not a web
service) that understands XML? I have been looking at writing an IFilter
interface for Microsoft's Index Server and it's rather daunting, especially
as MS will presumably produce one themselves within a year.


Mike Kay, ICL
