Re: Central vs. Distributed Archives

From: Stevan Harnad
Date: Thu, 9 Nov 2000 11:16:11 +0000

On Wed, 8 Nov 2000, Greg Kuperberg wrote:

> While libraries certainly should help preserve e-prints, I do not trust
> any one library, nor any other sole institution, to archive material
> single-handedly. Any caretaker can lose or destroy a unique copy of
> any document... That is why it is important to redundantly and
> openly mirror an archive and not just allow third-party searches. The
> arXiv has 18 mirror sites on six continents

Who is disagreeing with this? All requisite redundancy is just as
desirable, and feasible, and inevitable, with institution-based
distributed archiving as with discipline-based archiving.

I think there is an incorrect analogy at the heart of Greg's frequent
use of the term "fragmented" in speaking about the institution-based
approach to self-archiving:

I think Greg continues to equate (1) archiving with publishing, and
(2) institutional digital "collections" with localized "books-on-shelves"
(ripe for a Library-of-Alexandria catastrophe; hence his example of the
lost/destroyed "unique document"). And (3) (unrefereed, unpublished)
PREprints continue to be treated as the "paradigm" for it all, whereas
it is much more informative and representative to see it in terms of
(refereed, published) POSTprints: We are, after all, aiming at freeing
the REFEREED literature -- with the prepublication embryological stages
merely an added bonus, rather than the focus of it all.

So, to summarize: Whilst our refereed papers are already, as they are,
safely in the hands of journals and libraries, blissfully mirrored
(though unblissfully unfree), we need not fret about Alexandria.
Freeing a postprint (sic) via self-archiving (whether central or
institutional, interoperable or not) is a bonus, a plus, a freebie, a way
to make it accessible to those multitudes worldwide who cannot access
it because of the S/L/P firewalls surrounding the safe, Alexandria

It is inviting Zeno's Paralysis (again) to say: "Keep waiting till you
have an Alexandria-proof centralized, mirrored, redundant arXiv-style
Archive to self-archive them in before you dare to self-archive your
(already safely mirrored) postprints."

Nay! Release them from their hostagehood behind obsolete,
impact-blocking, and completely surmountable access barriers online
today through self-archiving, addict fellow-researchers the world over
to that new, free form of access to it all, and the redundancies and
mirrors will come tomorrow, in plenty of time to keep the freed corpus
aloft in the skies. (And nothing is at risk: the firewalled version
remains as safe -- from catastrophic loss as well as illicit access --
as it ever was.)

If that is now transparent for postprints, it should be equally
transparent that the same applies to preprints: They are destined to
become postprints (hence secure, for the above reasons) anyway. Being
available online early is a bonus, a freebie. Moreover, it is a bonus
that has no prior history of enjoying the safe/secure status of
postprints anyway: access to preprints was always restricted and
evanescent, destined to be superseded by the secure postprint once it
was available.

Now the redundancy and mirroring that will be accorded the postprint
corpus, once it is freed, will also be inherited by the preprint
corpus.

So there is nothing to lose, and everything to be gained, by
self-archiving all preprints and postprints now, either in centralized
OAI-compliant archives like arXiv or in institutional OAI-compliant
archives like Eprints.

Ignore Cassandras: Preservation problems are eminently soluble, once
the goods are up there: the real problem now is how to get researchers
to put them up there, at long last. Central archives have gone part of
the distance but are proving too slow. Institutional archives are natural
allies in hastening us on the road to the optimal and inevitable.

> As a rule, it is better for web sites to share the same archive than
> to each have fragments. It is better for Oxford and Cambridge to
> each have all of Shakespeare's plays than for Oxford to have only the
> comedies and Cambridge to have only the tragedies. That is why I favor
> shared interoperability, which is in some ways centralized, to fragmented
> interoperability, which is optimistically called decentralized. Massive
> redundancy is one of the few strengths of the existing paper-based system;

I am not an expert on digital storage, coding or preservation, but I am
not at all sure that Greg is technically right above (and I'm certain
that the Oxford/Cambridge hard-copy analogy is fallacious). I would
like to hear from specialists in localized vs. distributed digital
coding, redundancy, etc. -- bearing in mind that in the case of the
refereed literature, this is all moot anyway, because free access now
is infinitely preferable to no access, no matter how short-lived it
risks being. The "locus classicus" is still safely ensconced behind the
toll gates.

