Response to CSIR and dialog.

David E. Bernholdt bernhold at npac.syr.edu
Sat Feb 22 17:10:28 GMT 1997


Let me say up front that I very much appreciate this discussion.
You raise many points which I have considered while developing CSIR.
Although I (obviously) think it was worthwhile to develop and am happy
to tell you _my_ ideas, CSIR is very much an experiment at its
beginning, and as we begin "rolling out" the service, we'll
start to see where theory and practice meet.


Let me first say something about the funding for this project.  CSIR,
as it exists today, is a union of ideas that Geoffrey Fox (Director of
NPAC) and I had last spring.  The ideas came from looking at several
_existing_ efforts within NPAC and realizing that they could be useful
to the chemistry community.  The AskNPAC Mailing List Archive is based
on work done here by a graduate student in the context of computer
science and education objectives.  We have simply deployed that
software system in a new context.  The software repository is based on
the National HPCC Software Exchange (NHSE, HPCC = High-Performance
Computing and Communications) project, of which NPAC is a member.
Once again, we have adapted existing work to our needs.  Our NHSE
grant paid a small amount for someone to create the catalog entries we
have now, as a "seed", and I have donated my own time to supervise the
project.  As far as hardware resources go, this project is small
enough that we can operate it without impacting our ability to carry
out our other (funded) work.

So like many projects including CCL, CSIR has started out with
essentially no budget.  Like CCL and others, we can operate this way
for a while.  If the service becomes popular, it will probably become
increasingly costly to run, and we'll eventually be forced to seek
independent support for it, as Jan has with CCL.  Jan can tell you how
hard it was to get NSF funding for his very popular and well
established service, and I can tell you that for a brand-new service
like CSIR, funding agencies are just not interested.  It seems to me
that any such operation must start on a shoestring budget (or perhaps
take the commercial route and look for venture capital), and must always
make best use of what is already out there.  And it is my opinion that
you do not need to be terribly concerned that the funding agencies are
spending too much money on the development of information services.

Now on to other points...

> do we really need a centralised repository for lists?  I have not yet had
> a problem that could not be solved by searching with HOTBOT, AltaVista and 
> DejaNews, *and* the people that respond to the CCL.  

Having gone to the effort to create it, of course I do believe the
mailing list archive will be valuable, but I am willing to accept that
it might turn out not to be in the end -- we'll have to see.

If you are fortunate enough to be in an area where CCL alone (among
mailing lists) provides the information you need, that is good, but I
think there are many in the chemistry community for whom this is not
the case.  And even questions addressed to CCL might better be
addressed to other lists -- might even be answered elsewhere already.
To my thinking, the centralized repository, if it really catches on,
should help improve the signal-to-noise ratio and reduce the redundant
or off-topic questions in mailing lists -- the lists become more
efficient and people are able to find more complete information more
quickly than is presently possible.

> Is the real problem that listserv archives are not available for
> indexing (ignoring for the moment the lack of archives you point out)?   

I doubt if you will ever see the major players in the search engine
arena adding mailing lists to their database.  I can tell you from
experience that it is too tedious to manage on more than a limited
topical scale.  And even for Usenet, which I would say is of limited
value to the chemistry community (far more mailing lists than Usenet
groups), the volume is so great that the major search engines keep only
a limited amount on line.

If you want extensive coverage in a limited area, like chemistry,
especially where there are a lot of interesting mailing lists, I think
the only possibility is specialty projects like CSIR.

> It seems that distributed computing (and therefore storage, indexing and 
> distribution) are greatly enhanced by these newish search engines.  Can
> you expect to be covering the sort of information you are talking about 
> *better* than the search engines?  Or will I have to search through CSIR
> and then the search engines?

If we had a chemistry-focused search engine, we could probably pretty
easily integrate it with the mailing list archive so that you could
seamlessly search the whole thing in one shot.  We are in fact
thinking about just such a search engine, but it will be some time in
coming if we decide to go ahead.  The key question is whether we can
provide more value in a chemistry-focused search engine than the
general search engines can.  Of course we won't undertake the effort
unless we think we can.

So for the time being, yes, you'd probably want to search CSIR in
addition to your favorite search engines.  Knowing how different each
existing search engine is, you'd probably want to use more than one
even if we (or someone else) could offer you a chemistry-focused
engine.  So perhaps you'll always be using a handful of search engines
to find the information you need.  What CSIR's Mailing List Archive
offers you is the chance to add ONE search to your handful and get
results from all of the eighty or one hundred lists in the archive, as
opposed to having to search each one individually, or (more likely)
ignoring the information because it's too hard to access.  To my
thinking, it can only help.

--
David E. Bernholdt                      | Email:  bernhold at npac.syr.edu
Northeast Parallel Architectures Center | Phone:  +1 315 443 3857
111 College Place, Syracuse University  | Fax:    +1 315 443 1973
Syracuse, NY 13244-4100                 | URL:    http://www.npac.syr.edu
-----
chemweb: A list for Chemical Applications of the Internet.
Archived as: http://www.ch.ic.ac.uk/hypermail/chemweb/
To unsubscribe, send to listserver at ic.ac.uk the following message;
unsubscribe chemweb
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)


