Who needs XHTML Namespace?

Walter Underwood wunder at infoseek.com
Wed Sep 1 19:54:49 BST 1999

At 07:31 AM 9/1/99 -0400, Paul Prescod wrote:
>David Megginson wrote:
>> Paul Prescod writes:
>>  > What is the virtue in discovering XHTML data in an arbitrary
>>  > document if there are *no rules* about what that information will
>>  > look like? Are you really going to write processors that do not
>>  > care whether images occur within titles or tables within images?
>> Sure -- a search engine is a very good example of one.
>Really? Search engines don't care whether <title>s have images in them?
>Or whether <h1>'s have <table>'s in them? I'm sure that there are some
>that don't but I'm equally sure that there are some that do.

Ours doesn't. It recognizes some tags as a place to break sentences
for natural language processing, and it looks for the first undecorated
text in the document to use as a summary. It also saves text from
inside an <a> tag to index with the referenced document (no, Google
didn't do it first).

But it doesn't care whether <title> has an image, or which kind of
sentence-breaking tag is used (<p>, <blockquote>, <td>, ...).
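Below is a minimal sketch of the tag handling described above, written in Python with the standard-library html.parser module. It is not Infoseek's implementation. The post names only <p>, <blockquote> and <td> as sentence breaks, so the other entries in SENTENCE_BREAK_TAGS are assumptions, and the class name SketchIndexer is hypothetical. The sketch cuts text at break tags, keeps the text inside each <a> keyed by its href so it could be indexed with the referenced document, and ignores how any other elements nest.

```python
from html.parser import HTMLParser

# Hypothetical sentence-break set: the post names <p>, <blockquote> and <td>;
# the remaining tags are assumptions added for illustration.
SENTENCE_BREAK_TAGS = {"p", "blockquote", "td", "br", "li", "div", "title",
                       "h1", "h2", "h3", "h4", "h5", "h6"}


class SketchIndexer(HTMLParser):
    """Splits text at sentence-breaking tags and records anchor text per href."""

    def __init__(self):
        super().__init__()
        self.sentences = []    # text segments, split at break tags
        self.anchor_text = {}  # href -> list of link texts found in this page
        self._current = []     # words of the segment being built
        self._href = None      # href of the <a> we are inside, if any
        self._link_words = []  # words seen inside the current <a>

    def _flush(self):
        text = " ".join(self._current).strip()
        if text:
            self.sentences.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in SENTENCE_BREAK_TAGS:
            self._flush()
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._link_words = []
        # Any other tag (img, table, font, ...) is ignored: nesting never matters.

    def handle_endtag(self, tag):
        if tag in SENTENCE_BREAK_TAGS:
            self._flush()
        elif tag == "a" and self._href is not None:
            text = " ".join(self._link_words).strip()
            if text:
                self.anchor_text.setdefault(self._href, []).append(text)
            self._href = None

    def handle_data(self, data):
        words = data.split()
        self._current.extend(words)
        if self._href is not None:
            self._link_words.extend(words)

    def close(self):
        super().close()  # process any buffered input first
        self._flush()    # then emit the trailing segment
```

Feeding it a page with an <img> inside an <h1> still yields clean segments, because the parser never checks what an element is allowed to contain; it only reacts to the tags it knows.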

Hmm, the "strict" variant makes looking for undecorated text
more difficult. I doubt that we'll interpret stylesheets in
order to index text. So anybody who wants to use "strict" had
better be ready to put in "description" meta tags.
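The summary fallback in the paragraph above can be sketched the same way. This assumes that "undecorated" means text outside presentational and heading tags. The DECORATION_TAGS set is a guess rather than something stated in the post, and SummaryFinder and summarize are hypothetical names. A "description" meta tag, when present, takes priority over any body text.

```python
from html.parser import HTMLParser

# Assumption: text inside any of these tags counts as "decorated".
# The real engine's list is not given in the post.
DECORATION_TAGS = {"title", "h1", "h2", "h3", "h4", "h5", "h6",
                   "b", "i", "u", "font", "center", "big",
                   "script", "style"}


class SummaryFinder(HTMLParser):
    """Finds a meta description and the first text outside decoration tags."""

    def __init__(self):
        super().__init__()
        self.description = None
        self.first_plain_text = None
        self._decoration_depth = 0  # how many decoration tags are open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            if self.description is None:
                self.description = (attrs.get("content") or "").strip() or None
        elif tag in DECORATION_TAGS:
            self._decoration_depth += 1

    def handle_endtag(self, tag):
        if tag in DECORATION_TAGS and self._decoration_depth > 0:
            self._decoration_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if text and self._decoration_depth == 0 and self.first_plain_text is None:
            self.first_plain_text = text

    def summary(self):
        # Prefer the author's description; fall back to undecorated text.
        return self.description or self.first_plain_text


def summarize(html):
    finder = SummaryFinder()
    finder.feed(html)
    finder.close()
    return finder.summary()
```

Given `<font size="+2">Big News</font><p>Widgets are cheap today.</p>`, the sketch skips the headline and returns the paragraph. If the same headline is written for "strict" as `<p class="headline">Big News</p>`, with its styling moved into a stylesheet, the sketch returns "Big News" instead. That is exactly the case a "description" meta tag avoids.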


Walter R. Underwood
wunder at infoseek.com
wunder at best.com (home)
http://software.infoseek.com/cce/ (my product)

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)
