Re: Central vs. Distributed Archives

From: Stevan Harnad
Date: Mon, 6 Nov 2000 17:46:57 +0000

On Fri, 3 Nov 2000, Greg Kuperberg wrote:

> It is not really a neutral statement to declare that it no longer
> matters whether a paper is in a central archive or a distributed one.
> Each archive is in a way an entrenched interest. Each archive maintainer
> has put a lot of work into his or her project, and therefore wouldn't
> want it assimilated into a larger archive without a very good reason.

I am afraid I cannot follow this at all. Are you saying that the
"maintainer" of a free public archive of refereed research has an
interest in NOT having that research "assimilated" into still larger
public archives, if it increases their visibility, accessibility and

(If there really do exist such "entrenched" archive-maintainer
interests, they begin to resemble the conflict of interest that has
emerged between researchers and journal publishers, when it comes to
access-barriers to their work!)

The maintainers I have in mind are those whose interest is in freeing
this research from needless access/impact barriers, not in adding to

In particular, neither universities who provide distributed
institutional Eprint Archives for self-archiving the refereed research
of their researchers, nor Learned Societies who do so for the sake of
their disciplines, in a centralized archive, have anything to gain from
preventing their respective archive contents from being harvested by
Open Archive Services into still larger "virtual" archives, all
seamlessly interoperable (e.g.,

As to justifying access-barriers on the grounds that the archive
maintainer "has put a lot of work into his or her project," the Eprints
software should now make that work so minimal that this dubious
rationale becomes moot anyway:

> This is overconfidence. The biggest reason that it is overconfidence
> is that it defers the permanence question. But there are other reasons
> as well. One is that one of the most useful features of the arXiv
> (and similar services such as CogPrints) is immediate notification of
> new results.

There is no (not-readily-solvable) "permanence question." At this
point, the most important thing to do is to get the literature on-line
and free. The collective interest that this will generate in KEEPING
it all on-line and free will see to it that all proper steps are taken
to ensure permanence.

The OAI-compliant archive-creating/maintaining Eprints software has the
same notification service as CogPrints -- indeed, it is a generic
adaptation of the CogPrints software!

> Another is non-redundancy: the arXiv almost completely
> eliminates the disarray of having many copies of a paper which may
> or may not be different versions. The OAI standard does not address,
> and perhaps cannot address, either of these important advantages of a
> centralized system.

The OAI standard has not yet addressed version control (it will), but
the OAI-compliant Eprints software has. Moreover, version-sorting is
a natural function for an Open Archives Service that harvests all
versions of a paper and sorts them the way you like (date, archive,
use, etc.). Such a service is a natural one to go hand in hand with
citation-linking (which likewise has to sort versions):

> interoperability keeps getting reinvented.

The OAI protocol is steadily being optimized (and the OAI-compliant
Archives with it): Is this a bad thing?

> Precedent suggests that if OAI succeeds, it will fade into a
> transparent layer, and that beyond it people will see incompatibility
> at a new level and invent another standard.

This sounds unduly pessimistic (and could be said against any attempt
to create interoperability standards).

> HTTP is already an interoperability standard, originally invented for
> the purpose of distributing research documents.
> And there are already HTTP-based search engines, including CiteSeer,
> which searches only for research papers. So it's important to explain how
> OAI would go beyond HTTP+CiteSeer.

I suggest that this question be re-directed to the OAI discussion list,
which is concerned with the technical details:

Stevan Harnad
Professor of Cognitive Science
Department of Electronics and     phone: +44 23-80 592-582
             Computer Science     fax:   +44 23-80 592-865
University of Southampton
Highfield, Southampton

