The author/institution self-archiving initiative



Stevan Harnad
Intelligence/Agents/Multimedia Group
Department of Electronics and Computer Science
University of Southampton
Highfield, Southampton SO17 1BJ UNITED KINGDOM

Unlike the authors of books and magazine articles, who write their texts for royalty or fee income, the authors of refereed journal articles write them only for "research impact", which means for their effects on research and researchers. In order to reach researchers and to have an effect on their research (so the latter can use the findings in their own work), these refereed journal articles have to be accessible to their potential users. Hence, the idea that access to them should be toll-gated in any way makes as much sense as toll-gated access to commercial advertisements.

In other words, unlike the author royalty/fee-based literature, which constitutes the vast majority of the printed word, whether on-paper or on-line, this special, tiny, anomalous literature -- refereed journal articles -- is an author give-away: Its authors have neither sought nor benefited in any way from the fact that access-tolls had to be paid to read their papers (in the form of individual and institutional subscriptions [S], and lately, for the on-line version, site-licenses [L] or pay-per-view [P]). On the contrary, those S/L/P access-barriers represent, and always did represent, impact-barriers for these authors, whose careers (promotion, tenure, funding, prizes) depend largely on the size of the research impact of their work.

How big are the impact barriers? There are currently at least 20K refereed journals across all fields of science and scholarship (this estimate could be as much as an order of magnitude too low), in which at least 2000K (probably many more) refereed articles appear annually. If we calculate what all the institutions on the planet that can afford the S/L/P tolls (individual subscriptions are negligible in this reckoning) collectively pay for any one of those 2000K refereed papers, it averages about $2000 per paper. That amount is paid only by the institutions that can and do pay for access. In exchange, that particular Give-Away paper is accessible to researchers at those, and only those, institutions. (Don't think of everyone else looking at it on the shelf of a public library: That does not scale up to the sort of free and full access that is possible in principle, especially online.)

The research libraries of the world can be divided into the (minority) Harvards and the (majority) Have-Nots. It is obvious how the Have-Nots (and they prevail everywhere, not just in the Developing World) would benefit from free access to the entire refereed literature, for without it their meager S/L/P budgets can afford only a pitifully small portion of it. But not even Harvard can afford access to anywhere near all of it. So the fact is that most of the annual 2000K+ refereed articles are currently inaccessible to most of the researchers on the planet. For the authors of those articles, this means that much of their potential impact (and actual access) is lost. And this curtailed research impact and access is what the planet is buying with the $2000 per article it currently spends in S/L/P tolls.

This is the way things were, and the way they had to be, in the Gutenberg Galaxy, in which publishing as print-on-paper was the only way to make refereed research accessible to other researchers, and the sizeable costs of printing and distribution had to be recovered through access-tolls if the research was to have any impact at all. But we have to remind ourselves that we are no longer in the Gutenberg Galaxy. This new state of affairs has much less positive consequences for the majoritarian, Non-Give-Away literature (books, magazine articles), written for royalties and fees, and now at risk of digital piracy, Napster/Gnutella-style. But for the Give-Away literature, the PostGutenberg Galaxy is a godsend, for it at last makes it possible to eliminate all access/impact-barriers to refereed research.

It is not that all costs have vanished. Note that we are speaking about refereed research, not about some vast digital Vanity Press (for otherwise the analogy between researchers and advertisers would become a homology). Although the Gutenberg costs of printing and distribution (and even the costs of their on-line successors, such as journal publishers' PDF page-images) are no longer necessary ones, the cost of the quality-control and certification (QC/C) that differentiates the refereed literature from an unfiltered, anarchic, pot-luck Vanity Press still needs to be paid. Paper and PDF have become mere options, purchasable by those who want and can afford them; the refereeing, however, is and continues to be a medium-independent essential for scholarly and scientific research.

But what is refereeing, and what does it cost? Refereeing (also called "peer review") is the system of evaluation and feedback by which expert researchers assure the quality of one another's research findings. The referees' services, like the authors' research papers, are Give-Aways, so that is not where the remaining expense lies. It is the implementation of the refereeing procedures that necessarily entails some cost. But how much? How much does it cost for submitted manuscripts to be archived on a website, for an expert Editor to pick expert referees, email them the web address of the submission, receive their emailed referee reports, email the author the reports plus an editorial disposition letter indicating what revision needs to be done to make the manuscript acceptable, and repeat the process (if necessary) until the manuscript is accepted or rejected?
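The refereeing cycle just described amounts to a small state machine, and writing it out shows how little machinery its implementation needs. The Python sketch below is illustrative only: the names `Submission`, `Status`, `Decision`, `assign_referees` and `receive_reports` are hypothetical, no email is actually sent, and the editor's decision is supplied by the caller rather than computed.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Status(Enum):
    SUBMITTED = auto()     # manuscript archived on the journal's website
    UNDER_REVIEW = auto()  # referees have been sent its web address
    REVISE = auto()        # author has the reports plus a disposition letter
    ACCEPTED = auto()
    REJECTED = auto()


class Decision(Enum):
    ACCEPT = auto()
    REVISE = auto()
    REJECT = auto()


@dataclass
class Submission:
    url: str                          # hypothetical address of the archived manuscript
    status: Status = Status.SUBMITTED
    rounds: int = 0                   # completed refereeing rounds
    referees: list[str] = field(default_factory=list)

    def assign_referees(self, referees: list[str]) -> None:
        """Editor picks referees and (notionally) emails them the manuscript's URL."""
        if self.status not in (Status.SUBMITTED, Status.REVISE):
            raise ValueError(f"cannot assign referees in state {self.status.name}")
        self.referees = list(referees)
        self.status = Status.UNDER_REVIEW

    def receive_reports(self, decision: Decision) -> Status:
        """Editor reads the emailed reports and sends the author a disposition."""
        if self.status is not Status.UNDER_REVIEW:
            raise ValueError(f"no review in progress (state {self.status.name})")
        self.rounds += 1
        self.status = {
            Decision.ACCEPT: Status.ACCEPTED,
            Decision.REVISE: Status.REVISE,
            Decision.REJECT: Status.REJECTED,
        }[decision]
        return self.status


paper = Submission("https://journal.example.org/submissions/42")  # hypothetical URL
paper.assign_referees(["referee-a", "referee-b"])
paper.receive_reports(Decision.REVISE)   # round 1: revision requested
paper.assign_referees(["referee-a"])     # revised draft goes back out
paper.receive_reports(Decision.ACCEPT)   # round 2: accepted
print(paper.status.name, paper.rounds)   # ACCEPTED 2
```

Each method call stands for one clerical step of the procedure; the actual judgment stays with the editor and referees.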

There is general agreement that the cost of implementing refereeing is no more than $500 per accepted article, but even that figure almost certainly has needless Gutenberg costs wrapped into it (e.g., the creation of the publisher's PDF version). The true figure for peer-review implementation alone is probably much closer to $200 per article, or even lower. In other words, QC/C costs account for only about 10% ($200 of the roughly $2000 per article) of what the planet currently spends in the S/L/P tolls that restrict access to this Give-Away literature to the minority of researchers who do not happen to be at a Have-Not institution for that particular article.
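The 10% figure follows directly from the per-article estimates in this essay. A minimal Python sketch of the arithmetic, using only the essay's own round numbers (at least 2000K articles a year, about $2000 per article in S/L/P tolls, about $200, and at most $500, per article for peer-review implementation); the constant and function names are illustrative.

```python
ARTICLES_PER_YEAR = 2_000_000   # "at least 2000K" refereed articles annually
TOLL_PER_ARTICLE = 2_000        # average S/L/P spend per article, US dollars
QCC_PER_ARTICLE = 200           # estimated cost of implementing peer review
QCC_UPPER_BOUND = 500           # generally agreed upper limit


def qcc_share(qcc_cost: int, toll: int = TOLL_PER_ARTICLE) -> float:
    """Fraction of the current per-article toll that peer review actually requires."""
    return qcc_cost / toll


annual_tolls = ARTICLES_PER_YEAR * TOLL_PER_ARTICLE  # what the planet pays now
annual_qcc = ARTICLES_PER_YEAR * QCC_PER_ARTICLE     # what QC/C alone would cost

print(f"QC/C share at ${QCC_PER_ARTICLE}: {qcc_share(QCC_PER_ARTICLE):.0%}")  # 10%
print(f"QC/C share at ${QCC_UPPER_BOUND}: {qcc_share(QCC_UPPER_BOUND):.0%}")  # 25%
print(f"Annual tolls: ${annual_tolls:,}; annual QC/C: ${annual_qcc:,}")
```

Even at the $500 upper bound, peer review accounts for only a quarter of the current per-article spend.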

Is there any way to remedy this situation, in which this Give-Away PostGutenberg literature is being needlessly held hostage to obsolete Gutenberg costs and cost-recovery methods? First, let us note that it is not simply a matter of lowering the S/L/P access barriers. Even if the S/L/P tolls for all 20K refereed journals were slashed by 90%, that would still leave most researchers on the planet unable to access most of this author Give-Away research. No, there is only one solution, and it is an inevitable one: The refereed research literature must be freed, for everyone, everywhere, forever, online. And the irreducible 10% QC/C costs must no longer be paid for by the reader-institution, in the form of S/L/P tolls (reduced to 10%), with their attendant impact/access-barriers. Instead, they must be paid for as QC/C service costs by the author-institution, per paper published by its researchers, funded out of 10% of the institution's annual windfall S/L/P savings.

How to get there from here? Journal publishers will certainly not scale down to becoming only providers of the essential QC/C service (plus whatever add-on options there is still a market for) of their own accord: No one would. Nor can libraries, already weighed down by their escalating serials crisis, redirect any of their so far nonexistent windfall savings to any other purpose. Nor can authors be expected to sacrifice submitting their research to their established high-quality, high-impact journals, submitting it instead to new, alternative journals, with no track records, authorships, or niches, just because those journals happen to be prepared to provide QC/C alone right now. Journal niches are largely saturated already, and tenure/promotion/funding/prizes are far more important to researchers now than any potential long-term benefits (how soon?) from making risky sacrifices now.

There is a way, however, that researchers can have their cake and eat it too, right now. The entire refereed journal literature can be freed, virtually overnight, without authors having to give up their established refereed journals. The way has already been tried and proven to work by a portion of the Physics community. They have been publicly self-archiving their research papers online -- both before and after refereeing, i.e., both preprints and postprints -- since 1991. It is very important to note that this Physics "Eprint" Archive includes, and has always included, the refereed postprints too, for it has often been confusingly and incorrectly described as a "Preprint Archive" (with the implication that it is merely a Vanity Press for unrefereed papers). Preprint and postprint are merely successive embryological stages of a refereed, published journal article.

The Physics Eprint Archive (currently 150K papers in all) has been growing steadily. The annual number of new papers self-archived therein is now about 30K and increasing by about 3.5K per year. The archive, with its 14 mirror-sites world-wide, gets about 175K user "hits" per weekday at its US site alone. So there is no doubt that self-archiving can be done, and that when papers are thus made freely accessible online, they are indeed accessed, very heavily.

The problem is that although the Physicists have shown the way to free the refereed research literature, other disciplines have been slow to realize that it will work for them too. They have assumed that there must be something unique about Physics, and that the self-archiving strategy is pertinent only to Physics. This misapprehension has been encouraged by the (incorrect) impression already mentioned -- that it is only the unrefereed literature that the Physicists have freed online, and that doing so somehow puts at risk or compromises QC/C. Yet the fact is that absolutely nothing has changed with regard to peer review in Physics! The very same authors who self-archive continue to submit all their papers to their established refereed journals of choice, just as they always did, and virtually all the papers in the Archive appear in refereed journals about 12 months after journal submission. Nothing has changed -- except that a growing portion of the refereed literature in Physics is at last accessible free for all online (including earlier embryological stages that were not previously accessible at all).

The second problem, after the fact that the other disciplines have been so slow in following the lead of the physicists, is that even in Physics, self-archiving is growing far too slowly: At the present linear growth rate, it will be another decade before the entire Physics literature is online and free. Something else is needed, both to accelerate the rate of self-archiving in Physics, and to extend the practice to all the other disciplines.
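To see what "linear growth" implies, here is a small Python projection built on the Physics Archive figures quoted earlier (about 150K papers archived, about 30K new papers a year, with the yearly intake rising by about 3.5K). It assumes that increment stays constant; the size of the full Physics literature is not given in this essay, so any target passed to `years_to_reach` is a hypothetical parameter, not a published figure.

```python
ARCHIVED_NOW = 150_000   # papers in the Physics Eprint Archive today
INTAKE_NOW = 30_000      # new papers self-archived this year
INTAKE_GROWTH = 3_500    # assumed constant yearly increase in intake


def archive_size(years: int) -> int:
    """Cumulative archive size after `years` more years of linearly growing intake."""
    # Intake in future year t (t = 1..years) is INTAKE_NOW + INTAKE_GROWTH * t,
    # so the growth terms sum to INTAKE_GROWTH * years * (years + 1) / 2.
    return (ARCHIVED_NOW
            + INTAKE_NOW * years
            + INTAKE_GROWTH * years * (years + 1) // 2)


def years_to_reach(target: int) -> int:
    """Smallest number of years after which the archive holds at least `target` papers."""
    years = 0
    while archive_size(years) < target:
        years += 1
    return years


for y in (1, 5, 10):
    print(y, archive_size(y))
```

Under these assumptions, ten more years brings the archive to about 640K papers.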

And that something else has now arrived. The reason the "subversive proposal" to free the refereed literature through author self-archiving fell largely on deaf ears in the early 1990s (Harnad 1995) was that self-archiving in an anonymous FTP archive or on a Web Home-Page would have freed the literature only in principle. In practice, all those scattered online papers, their locations, identities and formats varying arbitrarily, would be unsearchable, unnavigable, irretrievable, and hence unusable (unless one happened to know in advance where a particular paper was). Yet centralized archiving, even when made available to other disciplines, has not been catching on fast enough either (CogPrints has taken 3 years to reach 1K articles).

What was needed was something that would make the fruits of distributed, institution-based self-archiving equivalent to those of centralized self-archiving, and the key to that was to introduce and agree upon metadata-tagging standards that would make the contents of all the distributed archives interoperable, hence harvestable into one global "virtual" archive, all the papers searchable and retrievable by everyone for free, without having to know in advance where they happened to be individually archived, or in what form.
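The "global virtual archive" idea can be illustrated independently of any particular protocol: once every archive exposes its records in the same agreed metadata fields, a harvester can pool them and search them as one collection. In the Python sketch below, the field names (borrowed from Dublin Core's `identifier`, `title`, `creator`), the class names, the archive names and the records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One paper's metadata in the shared, agreed-upon format."""
    identifier: str   # globally unique, e.g. "oai:<archive>:<local id>"
    title: str
    creator: str
    archive: str      # which institutional archive holds the full text


class VirtualArchive:
    """Pools records harvested from many distributed archives into one index."""

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}

    def harvest(self, records: list[Record]) -> int:
        """Add (or refresh) records from one archive; return how many were new."""
        new = 0
        for rec in records:
            if rec.identifier not in self._records:
                new += 1
            self._records[rec.identifier] = rec
        return new

    def search(self, word: str) -> list[Record]:
        """Case-insensitive title search across every harvested archive."""
        word = word.lower()
        hits = [r for r in self._records.values() if word in r.title.lower()]
        return sorted(hits, key=lambda r: r.identifier)


# Hypothetical records from two institutional archives.
archive_a = [Record("oai:a.example:1", "Self-archiving and impact", "Author A", "a"),
             Record("oai:a.example:2", "Peer review costs", "Author B", "a")]
archive_b = [Record("oai:b.example:7", "Measuring research impact", "Author C", "b")]

index = VirtualArchive()
index.harvest(archive_a)
index.harvest(archive_b)
for rec in index.search("impact"):
    print(rec.identifier, "->", rec.archive)
```

The searcher never needs to know which archive a paper sits in; the `archive` field only tells the harvester where to send the reader for the full text.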

The Open Archives Initiative (OAI) has provided the metadata-tagging standards and a registry for all OAI-compliant Eprint Archives, and the Self-Archiving Initiative has provided the (free) software for creating OAI-compliant Eprint Archives, which are interoperable with all other Open Archives and ready to be registered, so that their contents can be harvested into searchable global archives.
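For concreteness, here is roughly how a harvester talks to an OAI-compliant archive under version 2.0 of the OAI Protocol for Metadata Harvesting (OAI-PMH): it issues `ListRecords` requests with `metadataPrefix=oai_dc` (unqualified Dublin Core) and follows each `resumptionToken` until the archive signals that the list is complete. This Python sketch is a minimal version under stated assumptions: the base URL is hypothetical, error responses and deleted records are ignored, and the `fetch` function is injected so the loop can run offline against hypothetical sample pages (in real use it could wrap `urllib.request.urlopen`).

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode

NS = {"oai": "http://www.openarchives.org/OAI/2.0/",
      "dc": "http://purl.org/dc/elements/1.1/"}


def list_records(base_url: str, fetch: Callable[[str], bytes]) -> Iterator[dict]:
    """Yield {identifier, title} for every oai_dc record, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        for record in root.iterfind(".//oai:record", NS):
            ident = record.find("oai:header/oai:identifier", NS)
            title = record.find(".//dc:title", NS)
            yield {"identifier": ident.text if ident is not None else None,
                   "title": title.text if title is not None else None}
        token = root.find(".//oai:resumptionToken", NS)
        # An absent or empty resumptionToken means the list is complete.
        if token is None or not (token.text or "").strip():
            return
        # resumptionToken is an exclusive argument: only the verb accompanies it.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}


def fake_page(ident: str, title: str, token: str = "") -> bytes:
    """Build a tiny, hypothetical ListRecords response for offline testing."""
    return f"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>
<record><header><identifier>{ident}</identifier></header><metadata>
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
 xmlns:dc="http://purl.org/dc/elements/1.1/"><dc:title>{title}</dc:title></oai_dc:dc>
</metadata></record><resumptionToken>{token}</resumptionToken></ListRecords></OAI-PMH>""".encode()


PAGES = {
    "verb=ListRecords&metadataPrefix=oai_dc": fake_page("oai:archive.example:1", "First paper", "p2"),
    "verb=ListRecords&resumptionToken=p2": fake_page("oai:archive.example:2", "Second paper"),
}


def fake_fetch(url: str) -> bytes:
    """Stand-in for an HTTP GET: look the query string up in the sample pages."""
    return PAGES[url.split("?", 1)[1]]


for rec in list_records("https://archive.example.org/oai", fake_fetch):
    print(rec["identifier"], rec["title"])
```

Because every compliant archive answers the same requests in the same format, one loop like this can feed records from any number of institutional archives into a single searchable index.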

Distributed, institution-based self-archiving is the natural way to generalize the practice of self-archiving across disciplines and institutions. It is not only the author who benefits from research impact. The reason promotion and tenure are contingent on research impact is that funding is contingent on it too. Hence institutional funding overheads and prestige are as much the beneficiaries of the freeing of their researchers' refereed research from any needless impact-barriers as individual researchers (and research itself) are. "Publish or perish" has always been an oversimplified slogan. What is meant by it is neither unrefereed (vanity-press) nor unread/uncited (impactless) publication. Written out in realistic longhand, the institutional slogan would be "Maximize your refereed research impact to maximize your (and our) rewards from it."

So researchers' institutions are not only natural allies in freeing their researchers' refereed research from any unnecessary impact-barriers; they are in a position to lead and speed the way (by providing and supporting the institutional archives and encouraging, indeed mandating, the deposit of their researchers' refereed papers in them). No such collective self-interest unifies or propels centralized, discipline-based self-archiving. Nor do the institutional benefits of distributed self-archiving stop with eliminating the impact-barriers to their own researchers' research: Eliminating, for their own researchers, the access-barriers to the research of others, at other institutions, is another way of increasing their own research productivity and impact.

And that brings us to a third potential institution-level benefit weighing in for distributed, institution-based self-archiving: the prospect of a solution to the spiraling serials budget crisis. The likelihood of eventually reducing the institutional library's annual serials expenditures to 10% of their current level (simply by eventually redirecting that proportion of the annual windfall savings from no longer paying the full S/L/P tolls to covering the journal peer-review implementation costs for their own researchers' refereed publications) is not only an added incentive for hastening the transition by facilitating institutional self-archiving. It also provides allies in the institutional library, who can (1) help researchers in the first wave of self-archiving (self-archiving for them by proxy if need be); (2) maintain and preserve the institutional refereed Eprint Archives as an outgoing collection for external use, in place of the old incoming collection, acquired through S/L/P, for internal use; and (3) use their consortial power to provide leveraged support during the transition for journal publishers who commit themselves to a timetable of downsizing to become pure QC/C service providers.

Here is a summary of this transitional scenario, explicitly separating the parts that are certain to produce the promised results from those that are in any way hypothetical, or conditional on what happens next:

(1) Once all the 2000K+ annual refereed journal articles are self-archived by their authors in their institution's registered, OAI-compliant Eprint Archives, this literature is de facto freed from all access- and impact-barriers online. The self-archiving could be done virtually overnight, and the day after, it would cease to be true that most of this give-away research is inaccessible online to most of the researchers on the planet. That is guaranteed.

(2) It is conceivable that that will be the end of it. The refereed literature will be free online for those who want it and cannot get it any other way, but those who can afford to get it the old way (via S/L/P) will continue to do so. In that case, the access/impact problem will be solved, but the serials crisis will not; it will simply become a much less urgent matter.

(3) But if, contrary to (2), when the refereed literature is accessible online for free, users prefer to use the free version (as so many physicists already do), then journal S/L/P revenues may shrink and institutional S/L/P savings may grow. Journals will then have to begin to scale down to providing only the essentials (the QC/C service), with the rest (on-paper version, on-line PDF version, other "added values") sold only as an option, as long as there is still a market for it. Those journals that agree to scale down to providing only the essentials (QC/C at ~$200 per accepted final draft) can make a leveraged transition agreement with a library consortium that will prop them up with reader-institution-end S/L/P revenue while they downsize, but according to a schedule with an agreed date by which they must have switched to author-institution-end QC/C cost-recovery (~$200 per accepted paper). The titles (editorial boards, refereeships, authorships) of journals whose publishers are not interested in continuing operations in such a downsized niche can migrate to new QC/C-only publishers who are, and who thereby instantly inherit an established journal.

Note that peer review itself is never compromised, sacrificed, or put at risk in this scenario; nor do authors have to give up, even temporarily, submitting to their established journal of choice. All they have to do is self-archive their preprints and postprints in their institutional Eprint Archives.

Are copyright restrictions any obstacle to self-archiving? Not at all. Pre-refereeing preprints can be self-archived without any restriction. For the post-refereeing, accepted final draft, the author can first try to modify the copyright agreement, transferring to the publisher all rights to give away, lease or sell the text, on-paper or on-line, in perpetuo, but retaining only the author's right to give it away online for free by self-archiving it. Publishers will not put that in their copyright agreements of their own accord in advance (although some already do, e.g., the American Physical Society), but many will accept it if the author asks. For those papers, simply self-archive the refereed postprint alongside the pre-refereeing preprint(s). For those publishers who refuse to publish the paper unless all rights are transferred: Sign the restrictive agreement, and self-archive a linked "corrigenda" file listing for the user what changes have to be made in the preprint to make it equivalent to the postprint.
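A corrigenda file of this kind can be produced mechanically if the author keeps plain-text versions of both drafts. The Python sketch below uses the standard library's `difflib.unified_diff` to list the changes that turn the self-archived preprint into the refereed final draft; the function name, file names and sample text are hypothetical, and a unified diff is only one possible format for such a file.

```python
import difflib


def corrigenda(preprint: str, postprint: str) -> str:
    """Return a unified diff listing the edits that turn the preprint into the postprint."""
    diff = difflib.unified_diff(
        preprint.splitlines(keepends=True),
        postprint.splitlines(keepends=True),
        fromfile="preprint.txt",            # hypothetical file names
        tofile="refereed-postprint.txt",
    )
    return "".join(diff)


preprint = "Results\nThe effect was large.\nSee Table 2.\n"
postprint = "Results\nThe effect was moderate (p < .05).\nSee Table 2.\n"
print(corrigenda(preprint, postprint))
```

Lines prefixed with `-` are preprint text to delete and lines prefixed with `+` are their refereed replacements; unchanged context lines help the reader locate each correction.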


