Who needs XHTML Namespace?

Wed Sep 1 19:54:49 BST 1999

At 07:31 AM 9/1/99 -0400, Paul Prescod wrote:
>David Megginson wrote:
>> 
>> Paul Prescod writes:
>> 
>>  > What is the virtue in discovering XHTML data in an arbitrary
>>  > document if there are *no rules* about what that information will
>>  > look like? Are you really going to write processors that do not
>>  > care whether images occur within titles or tables within images?
>> 
>> Sure -- a search engine is a very good example of one.
>
>Really? Search engines don't care whether <title>s have images in them?
>Or whether <h1>'s have <table>'s in them? I'm sure that there are some
>that don't but I'm equally sure that there are some that do.

Ours doesn't. It recognizes some tags as a place to break sentences
for natural language processing, and it looks for the first undecorated
text in the document to use as a summary. It also saves text from
inside an <a> tag to index with the referenced document (no, Google
didn't do it first).

But it doesn't care whether <title> has an image, or which kind of
sentence-breaking tag is used (<p>, <blockquote>, <td>, ...).
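
The three behaviours described above (breaking text at certain tags,
taking the first undecorated text as a summary, and saving anchor text
for the referenced document) can be sketched roughly as follows. This is
only an illustration, not the Infoseek implementation: the class name
IndexSketch, the two tag sets, and the reading of "undecorated" as "not
inside a title, heading, emphasis, or link" are all assumptions.

```python
from html.parser import HTMLParser

# Assumed tag sets; the post only names <p>, <blockquote>, <td> as examples.
BREAK_TAGS = {"p", "blockquote", "td", "li", "br", "title", "h1", "h2", "h3"}
# Assumed meaning of "decorated": text inside any of these elements.
DECORATION_TAGS = {"title", "h1", "h2", "h3", "b", "i", "em", "strong", "a"}


def _squash(parts):
    """Join text pieces and collapse runs of whitespace."""
    return " ".join("".join(parts).split())


class IndexSketch(HTMLParser):
    """Hypothetical indexer that ignores how tags are nested."""

    def __init__(self):
        super().__init__()
        self.segments = []     # text chunks split at sentence-breaking tags
        self.summary = None    # first undecorated text seen
        self.anchors = []      # (href, anchor text) pairs
        self._current = []     # pieces of the segment being built
        self._decor_depth = 0  # number of decoration tags currently open
        self._href = None      # href of the <a> being read, if any
        self._anchor_text = []

    def _flush(self):
        text = _squash(self._current)
        if text:
            self.segments.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in BREAK_TAGS:
            self._flush()  # any break tag ends the current segment
        if tag in DECORATION_TAGS:
            self._decor_depth += 1
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._anchor_text = []

    def handle_endtag(self, tag):
        if tag in BREAK_TAGS:
            self._flush()
        if tag in DECORATION_TAGS and self._decor_depth > 0:
            self._decor_depth -= 1
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, _squash(self._anchor_text)))
            self._href = None

    def handle_data(self, data):
        self._current.append(data)
        if self._href is not None:
            self._anchor_text.append(data)
        if self.summary is None and self._decor_depth == 0 and data.strip():
            self.summary = _squash([data])

    def close(self):
        super().close()
        self._flush()  # keep any trailing text after the last tag
```

Note that an <img> inside <title> is simply skipped, and that <p>,
<blockquote>, and <td> all end a segment in the same way, which is the
sense in which such a processor does not care about content models.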

Hmm, the "strict" variant makes looking for undecorated text
more difficult. I doubt that we'll interpret stylesheets in
order to index text. So anybody who wants to use "strict" had
better be ready to put in "description" meta tags.
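
For illustration, an indexer might pick up a "description" meta tag
along these lines. This is a minimal sketch under assumptions: the names
DescriptionFinder and summary_for are hypothetical, and the fallback
argument stands in for whatever summary the indexer would otherwise
compute from the body text.

```python
from html.parser import HTMLParser


class DescriptionFinder(HTMLParser):
    """Hypothetical helper: records the first <meta name="description">."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.description is not None:
            return
        attributes = dict(attrs)
        # HTMLParser lowercases attribute names; compare the value loosely.
        if (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content")


def summary_for(html, fallback=None):
    """Return the meta description if present, else the given fallback."""
    finder = DescriptionFinder()
    finder.feed(html)
    finder.close()
    return finder.description if finder.description else fallback
```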

wunder

--
Walter R. Underwood
wunder at infoseek.com
wunder at best.com (home)
http://software.infoseek.com/cce/ (my product)
http://www.best.com/~wunder/
1-408-543-6946

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)