The Self-Archiving Initiative

Freeing the refereed research literature online

Stevan Harnad



Unlike the authors of books and magazine articles, who write for royalties or fees, the authors of refereed journal articles write only for 'research impact'. To be cited and built upon in the research of others, their findings have to be accessible to their potential users. Toll-gating access to their findings was always as counterproductive as toll-gating access to commercial advertisements. In the online age it has at last become possible to free this anomalous literature from this unwelcome impediment: its authors need only deposit their refereed articles in "Eprint Archives" at their own institutions; these "interoperable" archives can then all be "harvested" into a global "virtual archive," its full contents freely searchable and accessible online by everyone.

Unlike the royalty/fee-based literature, which constitutes the vast majority of the printed word, the special, tiny literature of refereed journal articles is, and always has been, an author give-away. Researchers never benefited from the fact that access-tolls had to be paid to read their papers (as subscriptions, and for the online version, site-licenses or pay-per-view). On the contrary, those access-barriers represent impact-barriers for researchers, whose careers and standing depend largely on the visibility and uptake of their research.

Impact barriers

There are currently at least 20,000 refereed journals across all fields of scholarship, publishing more than 2,000,000 refereed articles each year. The amount collectively paid, by those of the world's institutions that can afford the tolls, for just one of those refereed papers averages $2,000 per paper (1). In exchange for that fee, that particular paper is accessible to readers at those, and only those, paying institutions.

The research libraries of the world can be divided into the (minority) Harvards and the (majority) Have-nots, the latter by no means limited to the developing world. It is obvious how the Have-nots would benefit from free access to the entire refereed literature, for without it their meagre serials budgets can afford only a pitifully small portion of it. But not even Harvard can afford access to anywhere near all of it. Hence most refereed articles are inaccessible to most researchers. For the authors, this means that much of their potential impact is lost. And it is solely this curtailed research impact and access that is being purchased by the collective $2,000 outlay per article mentioned above.

This is the way things had to be in the past, when publishing as print-on-paper was the only medium, and the sizeable costs of printing and distribution had to be recovered somehow. The new online era may be threatening the majority, royalty/fee-based literature (books, magazine articles) in the form of digital piracy; but for the 'give-away' research literature, it has at last made it possible to eliminate all those counterproductive access/impact-barriers.

Not all costs have vanished, of course. Although the costs of printing and distribution (and their on-line successors, such as publishers' PDF page-images) are no longer essential ones, the cost of the quality-control and certification that differentiates the refereed literature from an unfiltered, anarchic vanity-press still needs to be paid. Paper and PDF have become mere options, purchasable by those who want and can afford them; refereeing, however, is essential.

The essential costs of refereeing

Refereeing (peer review) is the system of evaluation and feedback by which expert researchers assure the quality of each other's findings. Referees' services are donated free to virtually all scientific journals, but there is a real cost to implementing the refereeing procedures, which include archiving submitted papers on a website; selecting appropriate referees; tracking submissions through rounds of review and author revision; making editorial judgments; and so on.

The minimum cost of implementing refereeing has been estimated as $500 per accepted article (see ), but even that figure almost certainly has inessential costs wrapped into it (for example, the creation of the publisher's PDF). I think that the true figure for peer-review implementation alone across all refereed journals probably averages much closer to $200 per article or even lower. Hence, quality-control costs account for only about 10% of the collective tolls actually being paid per article.
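The arithmetic behind the 10% figure can be sketched as follows. This is only an illustration using the numbers quoted above; the constant and function names are hypothetical, and multiplying the per-article averages by the annual article count to get yearly totals is an assumption of the sketch rather than a figure given in the text.

```python
# Figures quoted in the text, in US dollars. All names are illustrative.
TOLLS_PER_ARTICLE = 2_000       # average collective tolls paid per article
REFEREEING_ESTIMATE = 500       # minimum published estimate per accepted article
REFEREEING_LOWER_GUESS = 200    # the author's own estimate ("or even lower")
ARTICLES_PER_YEAR = 2_000_000   # "more than" this many, so treat as a lower bound


def share_of_tolls(cost_per_article: float) -> float:
    """Fraction of the per-article tolls that a given per-article cost represents."""
    return cost_per_article / TOLLS_PER_ARTICLE


def annual_total(cost_per_article: int) -> int:
    """Rough yearly total, ASSUMING the per-article average applies to every article."""
    return cost_per_article * ARTICLES_PER_YEAR


if __name__ == "__main__":
    for label, cost in [
        ("published estimate", REFEREEING_ESTIMATE),
        ("author's estimate", REFEREEING_LOWER_GUESS),
    ]:
        print(
            f"{label}: {share_of_tolls(cost):.0%} of tolls, "
            f"about ${annual_total(cost):,} per year"
        )
    print(f"tolls currently paid: about ${annual_total(TOLLS_PER_ARTICLE):,} per year")
```

On these assumptions, the difference between the tolls paid and the refereeing cost is the "windfall saving" out of which quality-control fees could be paid.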

Can this situation, in which the authors' and referees' 'giveaways' are needlessly being held hostage to obsolete printing costs and cost-recovery methods, be remedied? Note that it is not simply a matter of lowering the financial access barriers: even if those were slashed by 90%, most researchers would still be unable to access most research papers. There is an optimal solution, and it is inevitable: the refereed research literature must be freed for everyone, everywhere, forever, online. The irreducible 10% (or so) quality-control cost need no longer be paid for by readers' institutions; it can be paid in the form of quality-control service costs, per paper published, by authors' institutions, out of 10% of their annual windfall savings on subscription costs.

Liberating the Give-Away Literature

Journal publishers certainly will not scale down to becoming only quality-control providers of their own accord. Nor can libraries effect such a transition on their own. And authors cannot and should not be expected to stop submitting their research to established high-quality, high-impact journals in preference for new, alternative journals, with no track records, authorships, or niches, just because those journals happen to be prepared to provide stand-alone quality-control right now. Journal niches are largely saturated already, and immediate careers and standing are far more important to researchers than the potential long-term benefits of risky sacrifices.

But researchers can hasten the optimal and inevitable without any risk or sacrifice. The entire refereed journal literature can be freed, virtually overnight, without authors having to give up their established refereed journals, by a method already shown to work by a portion of the physics community. These physicists have since 1991 been publicly self-archiving their research papers online, both before and after refereeing (preprints and postprints), in the physics 'eprint archive'.

That eprint archive currently holds 150,000 articles. The number of new articles being self-archived therein is currently about 30,000 annually, and increasing by about 3,500 papers each year. The archive, with its 14 mirror-sites world-wide, gets about 160,000 user 'hits' per weekday at its US site alone. So there is no doubt that self-archiving is feasible, and that when papers are thus made freely accessible online, they are heavily used.
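What "linear growth" in the deposit rate implies for the archive's holdings can be sketched with the three figures just quoted. This is a rough projection under stated assumptions, not a forecast from the source: it assumes the yearly increase of 3,500 stays constant and first applies one year from now, and the function names are hypothetical.

```python
# Figures quoted in the text; everything else here is an assumption of the sketch.
CURRENT_HOLDINGS = 150_000     # articles in the archive now
CURRENT_ANNUAL_RATE = 30_000   # new articles self-archived per year now
ANNUAL_RATE_INCREASE = 3_500   # growth of the yearly deposit rate, per year


def annual_deposits(years_from_now: int) -> int:
    """Deposits in a given future year, ASSUMING the rate keeps growing linearly."""
    return CURRENT_ANNUAL_RATE + ANNUAL_RATE_INCREASE * years_from_now


def projected_holdings(years_from_now: int) -> int:
    """Total holdings after the given number of years under the same assumption."""
    added = sum(annual_deposits(year) for year in range(1, years_from_now + 1))
    return CURRENT_HOLDINGS + added


if __name__ == "__main__":
    for years in (1, 5, 10):
        print(
            f"in {years:2d} years: {annual_deposits(years):,} deposits that year, "
            f"{projected_holdings(years):,} articles held"
        )
```

Because the rate grows only linearly, holdings grow quadratically at best; that is steady, but far from the overnight transition that self-archiving makes possible.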

But although these physicists have shown the way to free the refereed research literature, authors in other disciplines have been slow to realize that the system can work for them too. They have assumed that there must be something unique about physics that makes self-archiving work. This misapprehension has been encouraged by the incorrect impression that the physics archive contains only unrefereed preprints, and that self-archiving somehow compromises the quality control of journals. Yet absolutely nothing has changed in peer review in physics. The same authors who self-archive continue to submit all their papers to their journals of choice, just as they always did, and virtually all the papers in the archive appear in refereed journals about 12 months after journal submission. The only thing that has changed is that a growing portion of the refereed literature in physics is at last accessible, free for all, online. Yet even in physics, self-archiving is still growing far too slowly: at the present linear growth rate it will be another decade before the entire physics literature is online and free.

Distributed Institution-Based Self-Archiving

There is now a way both to accelerate the rate of self-archiving in physics and to extend the practice to the other disciplines. My original "subversive proposal" (2) to free the refereed literature through author self-archiving fell largely on deaf ears because papers self-archived in an anonymous FTP archive or on a Web home page would be unsearchable, unnavigable, irretrievable, and hence unusable. Nor has centralized archiving, even when made available to other disciplines, been catching on fast enough (it has taken 3 years for the number of articles in one such archive to reach 1,000).

The new breakthrough is agreed metadata-tagging standards that make the contents of all the distributed archives "interoperable," hence harvestable into one global 'virtual' archive, all papers searchable and retrievable by everyone for free. The Open Archives Initiative (OAI) has now provided the metadata-tagging standards and a registry for all OAI-compliant eprint archives; and the Self-Archiving Initiative has provided free software for institutions to create OAI-compliant archives, interoperable with all other open archives, ready to be registered and for their contents to be harvested into searchable global archives, interlinked by citations to one another.
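The idea of harvesting interoperable archives into one 'virtual' archive can be sketched in a few lines. This is a toy illustration, not the OAI protocol or the Self-Archiving Initiative's software: the archives are in-memory dictionaries, the archive names and records are invented placeholders, and the shared field names (identifier, title, creator) are assumed here to stand in for whatever tags the agreed standard prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One harvested metadata record; the field names are assumptions of this sketch."""
    identifier: str
    title: str
    creator: str
    archive: str  # which institutional archive the record was harvested from


def harvest(archives: dict[str, list[dict[str, str]]]) -> dict[str, Record]:
    """Merge the records of every archive into one index keyed by identifier.

    This works only because every archive tags its records with the same
    fields, which is the point of agreeing on a metadata standard.
    """
    index: dict[str, Record] = {}
    for archive_name, records in archives.items():
        for raw in records:
            record = Record(
                identifier=raw["identifier"],
                title=raw["title"],
                creator=raw["creator"],
                archive=archive_name,
            )
            index[record.identifier] = record
    return index


def search(index: dict[str, Record], term: str) -> list[Record]:
    """Case-insensitive title search across the whole virtual archive."""
    needle = term.lower()
    hits = [r for r in index.values() if needle in r.title.lower()]
    return sorted(hits, key=lambda r: r.identifier)


if __name__ == "__main__":
    # Hypothetical institutional archives with made-up placeholder records.
    archives = {
        "university-a": [
            {"identifier": "a:1", "title": "Peer review online", "creator": "Author One"},
        ],
        "university-b": [
            {"identifier": "b:1", "title": "Costs of peer review", "creator": "Author Two"},
            {"identifier": "b:2", "title": "Unrelated topic", "creator": "Author Three"},
        ],
    }
    virtual_archive = harvest(archives)
    for hit in search(virtual_archive, "peer review"):
        print(hit.identifier, hit.archive, hit.title)
```

The reader searches one index and never needs to know which institution holds a given paper; each institution, for its part, maintains only its own archive.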

Distributed, institution-based self-archiving benefits research institutions in three ways: (i) It maximizes the visibility and impact of their own refereed research output. (ii) By symmetry, it maximizes their researchers' access to the full refereed research output of all other institutions. (iii) It is likely eventually to reduce their library's annual serials budget to 10% of its present level (in the form of fees paid to journal publishers for the quality-control of their own research output instead of tolls for accessing other researchers' output). The institutional library can help researchers to do self-archiving and can maintain the institution's own refereed eprint archives as an outgoing collection for external use, in place of the old incoming collection via journal costs, for internal use. Institutional library consortial power can also be used to provide leveraged support for journal publishers who commit themselves to a timetable of downsizing to becoming pure quality-control service providers.

The transition scenario

I. As soon as all refereed journal articles are self-archived by their authors in their institution's eprint archive, the literature is freed from all access- and impact-barriers. Self-archiving could be done virtually overnight; the day after, all refereed research becomes freely accessible online to researchers the world over.

II. One possible outcome is that that will be the end of it. The refereed literature will be free online for those who want it and cannot get it any other way, but those who can afford to get it the old way via paying journals will continue to do so. In this event, the access/impact problem will be solved, but the library's budget crisis will not: it will simply become less urgent.

III. An alternative outcome is that when the refereed literature is accessible online for free, users will prefer the free version (as so many physicists already do). Journal revenues will then shrink and institutional savings grow, until journals eventually have to scale down to providing only the essentials (the peer-review service, paid for by the author's institution), with the rest (paper version, on-line PDF version, other 'added values') sold as options (to the reader's institution).

In none of these outcomes is peer review itself compromised, sacrificed, or put at risk; nor do authors have to give up, even temporarily, submitting to their established journals of choice. All they have to do is self-archive their preprints and postprints in their institutional eprint archives.

Nor are copyright restrictions an obstacle to self-archiving: Preprints can be self-archived without any restriction at the time the paper is submitted to a journal. When the final draft is accepted, authors can ask the journal to let them retain the right to give away that draft online by self-archiving it. In practice, many publishers will agree to this if the author asks, although most do not publicly state it as policy. For these papers, the author can self-archive the refereed postprint alongside the pre-refereeing preprint(s). For those publishers who refuse to publish the paper unless all rights are transferred, authors can sign the restrictive agreement and self-archive a linked "corrigenda" file, listing for the user what changes have to be made in the preprint to make it equivalent to the postprint.

Stevan Harnad is in the Intelligence/Agents/Multimedia Group, Department of Electronics and Computer Science, University of Southampton, Highfield, Southampton SO17 1BJ, UK

1. Odlyzko, A.M. (1998) The economics of electronic journals. In: Ekman, R. & Quandt, R. (Eds.) Technology and Scholarly Communication. University of California Press.

2. Harnad, S. (1995) Universal FTP Archives for Esoteric Science and Scholarship: A Subversive Proposal. In: Okerson, A. & O'Donnell, J. (Eds.) Scholarly Journals at the Crossroads: A Subversive Proposal for Electronic Publishing. Washington, DC: Association of Research Libraries, June 1995.