A processing instruction for robots

Sat Dec 11 14:14:14 GMT 1999

* Lars Marius Garshol
| 
| First thought: this is fine for very simple uses, but for more
| complex uses something along the lines of the robots.txt file would
| be very nice. How about a variant PI that can point to a robots.rdf
| resource?

* Walter Underwood
| 
| In our experience, the simple form covers almost all needs.  We have
| 1000+ customers, and only three or four of them use our selective
| indexing support. So, I think of the robots meta tag as a proven
| solution that doesn't need major improvement.

I agree that the majority of web authors probably prefer, and are
happy with, the "meta tag" solution. However, lots of people (me, for
example) are not going to be happy with it, since it requires indexing
information to be added to each and every document. My gut reaction is
that this is plain wrong, because it leads to so much hassle in content
maintenance.

Also, using an RDF file to describe the site structure opens up new
possibilities, such as grouping resources in a sensible way so that
search engines can respond with more meaningful search results.

Ever since RDF appeared I've been waiting for some application that
would enable me to say:

  - all these resources are small pieces of this larger split-up
    resource, which is represented by _this_ resource

  - this group of resources belongs together, and they are represented
    by _this_ resource

  - this group contains this other group

  - these are the groups of resources that make up this site, and this
    is the home page of the site

  - these groups are authored by this person, who is represented by
    this resource

  - this resource is of this kind
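
To make the statements above a little more concrete, here is a rough
sketch in Python that models them as plain (subject, predicate, object)
triples. The predicate names, the "group:" identifiers and the
example.org URLs are all made up for illustration; none of them come
from an actual RDF vocabulary or a real site.

```python
# Hypothetical predicate names; not taken from any real RDF vocabulary.
PART_OF = "partOf"
REPRESENTED_BY = "representedBy"
CONTAINS = "contains"
HOME_PAGE = "homePage"
AUTHORED_BY = "authoredBy"
KIND = "kind"

# Each statement is a (subject, predicate, object) triple.
# All identifiers and URLs below are invented placeholders.
triples = {
    ("http://example.org/talk/slide1.html", PART_OF, "group:talk"),
    ("http://example.org/talk/slide2.html", PART_OF, "group:talk"),
    ("group:talk", REPRESENTED_BY, "http://example.org/talk/index.html"),
    ("group:site", CONTAINS, "group:talk"),
    ("group:site", HOME_PAGE, "http://example.org/"),
    ("group:talk", AUTHORED_BY, "http://example.org/people/author.html"),
    ("group:talk", KIND, "slide presentation"),
}


def objects(subject, predicate):
    """Return every object stated for (subject, predicate), sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def subjects(predicate, obj):
    """Return every subject that has (predicate, obj) stated, sorted."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)
```

With this in place, subjects(PART_OF, "group:talk") lists the pieces of
the split-up resource, and objects("group:talk", REPRESENTED_BY) gives
the resource that stands for the whole group.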

In an ideal world, this would lead to search engine responses like the
following:

  http://www.infotek.no/foredrag/lmg-xml.no-99/slide34.html

  xml-dev, part of a slide presentation by Lars Marius Garshol.
  Part of the STEP Infotek web pages.
  [top slide] [site top page] [author]
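
As a minimal sketch of how a search engine might assemble such a
response, assume the robot has already collected the description below
for the page. The dictionary keys and the render_hit function are
hypothetical names chosen for this example; only the URL and the
displayed text come from the response shown above.

```python
SLIDE = "http://www.infotek.no/foredrag/lmg-xml.no-99/slide34.html"

# Hypothetical metadata record for one page. The key names are
# assumptions; the values are copied from the example response.
description = {
    "title": "xml-dev",
    "kind": "slide presentation",
    "author": "Lars Marius Garshol",
    "site": "STEP Infotek web pages",
    "links": ["top slide", "site top page", "author"],
}


def render_hit(url, d):
    """Format one search hit as plain text from a metadata record."""
    lines = [
        url,
        "",
        f"{d['title']}, part of a {d['kind']} by {d['author']}.",
        f"Part of the {d['site']}.",
        # Navigation links are rendered as bracketed labels.
        " ".join(f"[{label}]" for label in d["links"]),
    ]
    return "\n".join(lines)


print(render_hit(SLIDE, description))
```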

This doesn't really seem all that hard, but, optimist that I am, I may
of course be seriously underestimating the difficulties involved.

| Secondly, fetching two or more entities for one document makes the
| robot code much more complex. If the robots.rdf file gets a 404,
| what happens? What about a 401 or a timeout? The robot may need
| separate last-modified dates and revisit times for each entity. And
| after it is implemented and tested, how do you explain all that to
| customers who just want search results?
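
To make those failure cases concrete, here is a sketch of one
conceivable policy: fall back to ordinary indexing whenever the
robots.rdf file cannot be used, and retry on transient failures. The
policy itself, the Action names and the metadata_action function are
purely illustrative assumptions, not an established convention.

```python
from enum import Enum


class Action(Enum):
    """What the robot does with a document after trying robots.rdf."""
    USE_METADATA = "use metadata"
    INDEX_WITHOUT_METADATA = "index without metadata"
    RETRY_LATER = "retry later"


def metadata_action(status):
    """Map the outcome of fetching robots.rdf to a robot action.

    `status` is the HTTP status code, or None if the request timed out.
    This mapping is an assumed policy for illustration only.
    """
    if status is None:
        # Timeout: treat as transient and try again on a later visit.
        return Action.RETRY_LATER
    if 200 <= status < 300:
        return Action.USE_METADATA
    if status in (401, 403, 404):
        # Missing or protected file: index the document as if no
        # metadata had been offered.
        return Action.INDEX_WITHOUT_METADATA
    # Anything else (for example a 5xx error) is treated as transient.
    return Action.RETRY_LATER
```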

Personally, if I were a search engine vendor, I would see this as a
great chance to really stand out from the competition and deliver
something beyond what the others do, at least until they catch on.

Yes, it requires more from the users, yes, it requires more from the
implementation, but this has to be weighed against the benefits, which
are presumably large. 

Also, seeing the amount of interest among content providers in "meta
tags" and in optimizing for various search engines, I assume that if
this facility really did help providers get more hits for their sites,
then that would be all the motivation they need.

But in any case this was only meant as a loose suggestion, so if
you're not interested, then that's the end of that.

--Lars M.

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1