OPSIS: OPen-Sourcing Institutional Self-archiving

Previous Research and Track Record

The OPSIS project stems from research undertaken by the Intelligence, Agents, Multimedia Research Group (formerly the Multimedia Research Group) in the Department of Electronics and Computer Science at the University of Southampton. The group is a major research group of over 80 people with an international reputation in the area of open hypermedia and its application to distributed multimedia systems. Previous research has resulted in the open hypermedia system Microcosm [3, 4] and its successor, the Distributed Link Service (DLS) [5, 6]. Both systems have proven to be excellent vehicles for research and development in collaboration with industry. For example, the design of the DLS was supported by a ROPA award (GR/K36409), the outcomes of which fed through to the LinkME project (GR/M25919), the CoHSE (GR/M75426) DIM project and the European-funded MEMOIR project (Esprit Project 22153) with Glaxo Wellcome and Unichema. The group was also successful in the recent round of EPSRC IRC funding. It is the lead site in the AKT IRC under the direction of Professor Nigel Shadbolt, and is a partner in the Equator IRC under the direction of Professor Wendy Hall. The work on knowledge modelling being undertaken in AKT is particularly relevant to the OPSIS Project. A strategic focus of the group is software agents for distributed information management (GR/K73060) and navigation of multimedia content (GR/603446). The development of the Southampton Framework for Agent Research (SoFAR) has facilitated research in a number of areas including mobile agents (GR/N35816/01).
Other projects relating directly to OPSIS include the Open Journals Framework (OJF) JISC project that finished in 1998, the ongoing Open Citation (OpCit) JISC/NSF Digital Libraries project (in conjunction with Cornell University and with support from the LANL archive) and the ongoing Conceptual Open Hypermedia Services Environment (CoHSE, GR/M75426). The group is active in the hypermedia, multimedia and Web research communities, is a member of W3C and in 1997 hosted the ACM Hypertext conference.

Stevan Harnad is Professor of Cognitive Science and member of the IAM Research Group in the Department of Electronics and Computer Science at the University of Southampton. His research is on the evolution of communication and language. He is the founder (1978) of the highly influential interdisciplinary journal of "Open Peer Commentary", Behavioral and Brain Sciences, as well as its online homologue (1989), Psycoloquy, and the online Open Archive (1998), CogPrints. Professor Harnad is a leader in the self-archiving initiative, working to make the world's research literature free for all online, both before and after refereeing. The EPrints generic software for archive-creation, under increasing university adoption in all disciplines worldwide, was spearheaded by him, as was the citation-linking of the Physics arXiv, the development of powerful new online scientometric performance indicators, and the implementation of online-only processing for both paper and online journals, online peer review, online peer commentary, and many innovations in scholarly (and student) "Skywriting" on which he has published extensively and is widely cited.

Dr. Leslie Carr is a Lecturer in Computer Science at Southampton and is a member of the IAM Research Group in the Department of Electronics and Computer Science. His research interests include the development of multi-media information systems and their applications in education, industry and commerce, open hypermedia systems and link services and their deployment in digital libraries, and the use of bibliographic analyses for navigating academic information. He has over fifty publications in the field and is the principal inventor of the Distributed Link Service, which has been commercially exploited through Active Navigation Ltd and which will be used in the deployment stages of the OPSIS project. He managed the eLib Open Journals Framework project, which focussed on using hypertext links to construct highly cross-referenced resources based on material from collections of journals. This work is now being extended in the JISC/NSF Digital Libraries OpCit project on which he is an investigator. He is on the program committee of the ACM Hypertext Conference 2001 and a Program Committee Vice Chair of the Eleventh International World Wide Web Conference. He is a member of the OAI Technical Committee.

Description of the Proposed Research and its Context

A. Background

The Need to Free the Refereed Literature

Unlike the authors of books and magazine articles, who write their texts for royalty or fee income, the authors of refereed journal articles write them only for "research impact", which means for their effects on research and researchers. In order to reach researchers and to have an effect on their research (so the latter can use the findings in their own work), these refereed journal articles have to be accessible to their potential users. Hence, the idea that access to them should be toll-gated in any way makes as much sense as toll-gated access to commercial advertisements. In other words, unlike the author royalty/fee-based literature, which constitutes the vast majority of the printed word, whether on-paper or on-line, this special, tiny, anomalous literature -- refereed journal articles -- is an author give-away: Its authors have never sought nor benefited in any way from the fact that access-tolls had to be paid to read their papers (in the form of individual and institutional subscriptions [S], and lately, for the on-line version, site-licenses [L] or pay-per-view [P]). On the contrary, those S/L/P access-barriers represent, and always did represent, impact-barriers for these authors, whose careers (promotion, tenure, funding, prizes) depend largely on the size of the research impact of their work.

The research libraries of the world can be divided into the (minority) Harvards and the (majority) Have-Nots. It is obvious how the Have-Nots (and they prevail everywhere, not just in the Developing World) would benefit from free access to the entire refereed literature, for without it their meager S/L/P budgets can afford only a pitifully small portion of it. But not even Harvard can afford access to anywhere near all of it (http://fisher.lib.virginia.edu/newarl/index.html). So the fact is that most of the annual 2000K+ refereed articles are currently inaccessible to most of the researchers on the planet. For the authors of those articles, this means that much of their potential impact (and actual access) is lost. And this curtailed research impact and access is what the $2000 per article currently being spent by the planet in S/L/P tolls is buying it.

Is there any way to remedy this situation, in which this Give-Away PostGutenberg literature is being needlessly held hostage to obsolete Gutenberg costs and cost-recovery methods? First, let us note that it is not simply a matter of lowering the S/L/P access barriers. Even if the S/L/P tolls for all 20K refereed journals were slashed by 90%, that would still leave most researchers on the planet unable to access most of this author Give-Away research. No, there is only one solution, and it is an inevitable one: The refereed research literature must be freed, for everyone, everywhere, forever, online. And the irreducible 10% QC/C (quality-control and certification, i.e. peer review) costs must no longer be paid for by the reader-institution, in the form of S/L/P tolls (reduced to 10%), with their attendant impact/access-barriers. Instead, they must be paid for as QC/C service costs by the author-institution, per paper published by its researchers, funded out of 10% of the institution's annual windfall S/L/P savings.
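The arithmetic behind these figures can be made explicit. The following is a rough sketch only: it takes the numbers quoted in the text (about 2000K articles per year, about $2000 in tolls per article, and a 10% irreducible QC/C share) at face value, and the function and variable names are illustrative.

```python
# Figures quoted in the text; treat them as rough estimates, not measurements.
ARTICLES_PER_YEAR = 2_000_000    # "2000K+" refereed articles annually
TOLLS_PER_ARTICLE_USD = 2_000    # S/L/P tolls paid per article
QCC_PERCENT = 10                 # share of costs assumed irreducible (QC/C)


def toll_breakdown(articles, tolls_per_article, qcc_percent):
    """Split annual toll spending into QC/C costs and potential savings."""
    total = articles * tolls_per_article
    qcc_total = total * qcc_percent // 100
    return {
        "total_tolls": total,
        "qcc_per_article": tolls_per_article * qcc_percent // 100,
        "qcc_total": qcc_total,
        "windfall_savings": total - qcc_total,
    }


breakdown = toll_breakdown(ARTICLES_PER_YEAR, TOLLS_PER_ARTICLE_USD, QCC_PERCENT)
for name, value in breakdown.items():
    print(f"{name}: ${value:,}")
```

On these assumptions the planet spends about $4,000,000,000 a year in tolls, of which about $400,000,000 ($200 per article) would cover QC/C, leaving about $3,600,000,000 in potential savings.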

How to get there from here? Journal publishers will certainly not scale down to becoming only providers of the essential QC/C service (plus whatever add-on options there is still a market for) of their own accord: No one would. Nor can libraries, already weighed down by their escalating serials crisis, redirect any of their so far nonexistent windfall savings to any other purpose. Nor can authors be expected to sacrifice submitting their research to their established high-quality, high-impact journals, submitting it instead to new, alternative journals, with no track records, authorships, or niches, just because those journals happen to be prepared to provide QC/C alone right now. Journal niches are largely saturated already, and tenure/promotion/funding/prizes are far more important to researchers now than any potential longterm benefits (how soon?) from making risky sacrifices now.

Self-Archiving

There is a way, however, that researchers can have their cake and eat it too, right now. The entire refereed journal literature can be freed, virtually overnight, without authors having to give up their established refereed journals. The way has already been tried and proven to work by a portion of the Physics community. They have been publicly self-archiving their research papers online -- both before and after refereeing, i.e., both preprints and postprints -- since 1991. It is very important to note that this Physics "Eprint" Archive (http://www.arxiv.org) includes, and has always included, the refereed postprints too, for it has often been confusingly and incorrectly described as a "Preprint Archive" (with the implication that it is merely a Vanity Press for unrefereed papers). Preprint and postprint are merely successive embryological stages of a refereed, published journal article.

The Physics Eprint Archive (currently 150K papers in all) has been growing steadily. The annual number of new papers self-archived therein is now about 30K and increasing by about 3.5K per year. The archive, with its 14 mirror-sites world-wide, gets about 175K user "hits" per weekday at its US site alone. So there is no doubt that self-archiving can be done, and that when papers are thus made freely accessible online, they are indeed accessed, very heavily.

The problem is that although the Physicists have shown the way to free the refereed research literature, other disciplines have been slow to realize that it will work for them too. They have assumed that there must be something unique about Physics, and that the self-archiving strategy is pertinent only to Physics. This misapprehension has been encouraged by the (incorrect) impression already mentioned -- that it is only the unrefereed literature that the Physicists have freed online, and that doing so somehow puts at risk or compromises QC/C. Yet the fact is that absolutely nothing has changed with regard to peer review in Physics! The very same authors who self-archive continue to submit all their papers to their established refereed journals of choice, just as they always did, and virtually all the papers in the Archive appear in refereed journals about 12 months after journal submission. Nothing has changed -- except that a growing portion of the refereed literature in Physics is at last accessible free for all online (including earlier embryological stages that were not previously accessible at all).

Co-operating Institutional Open Archives

The reason the "subversive proposal" to free the refereed literature through author self-archiving fell largely on deaf ears in the early 90's (Harnad 1995) was that self-archiving in an anonymous FTP archive or a Web Home-Page would have freed the literature only in principle. In practice, all those scattered online papers, their locations, identities and formats varying arbitrarily, would be unsearchable, unnavigable, irretrievable, and hence unusable (unless one happened to know where a particular paper was in advance). Yet centralized archiving, even when made available to other disciplines (e.g. http://cogprints.soton.ac.uk), has not been catching on fast enough either (CogPrints has taken 3 years to reach 1K articles).

What was needed was something that would make the fruits of distributed, institution-based self-archiving equivalent to those of centralized self-archiving, and the key to that was to introduce and agree upon metadata-tagging standards that would make the contents of all the distributed archives interoperable, hence harvestable into one global "virtual" archive, all the papers searchable and retrievable by everyone for free, without having to know in advance where they happened to be individually archived, or in what form.

The Open Archives Initiative (OAI) (http://www.openarchives.org) has provided the metadata-tagging standards and a registry for all OAI-compliant Eprint Archives, and the Self-Archiving Initiative (http://www.eprints.org) has provided the (free) software for creating OAI-compliant Eprint Archives, interoperable with all other Open Archives, ready to be registered and for their contents to be harvested into searchable global archives (http://cite-base.ecs.soton.ac.uk/cgi-bin/search/).
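To illustrate how shared metadata tagging makes distributed archives harvestable into one "virtual" archive, here is a minimal sketch. It makes several assumptions that are not in this proposal: it uses the request form and XML namespaces of the later OAI-PMH version 2.0 with unqualified Dublin Core, the archive host names are hypothetical, and the responses are canned strings rather than live network replies.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces as defined by OAI-PMH 2.0 and Dublin Core (assumed here).
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the request a harvester would send to one archive."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def sample_response(identifier, title, creator):
    """Canned single-record response standing in for a real archive reply."""
    return f"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>{identifier}</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>{title}</dc:title>
          <dc:creator>{creator}</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Extract identifier, titles and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [e.text for e in rec.iterfind(".//dc:title", NS)],
            "creators": [e.text for e in rec.iterfind(".//dc:creator", NS)],
        })
    return records


def build_virtual_archive(responses):
    """Merge records from many archives into one searchable index."""
    index = []
    for archive, xml_text in responses.items():
        for record in parse_records(xml_text):
            record["archive"] = archive
            index.append(record)
    return index


def search(index, term):
    """Find records by title without knowing which archive holds them."""
    term = term.lower()
    return [r for r in index if any(term in t.lower() for t in r["titles"])]


# Hypothetical archives and records, for illustration only.
SAMPLE_RESPONSES = {
    "archive-a.example.org": sample_response(
        "oai:archive-a.example.org:1", "Peer Review Online", "Author, A."),
    "archive-b.example.org": sample_response(
        "oai:archive-b.example.org:7", "Open Hypermedia Links", "Author, B."),
}

index = build_virtual_archive(SAMPLE_RESPONSES)
hits = search(index, "hypermedia")
print(list_records_url("http://archive-a.example.org/oai"))
print([(r["archive"], r["identifier"]) for r in hits])
```

The point of the sketch is that the search step never needs to know where a paper is archived: once every archive exposes the same metadata fields, the harvested index behaves like a single collection.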

Distributed Institution-based self-archiving is the natural way to generalize the practice of self-archiving across disciplines and institutions. It is not only the author who benefits from research impact. The reason promotion and tenure are contingent on research impact is that funding is contingent on it too. Hence institutional funding overheads and prestige are as much the beneficiaries of the freeing of their researchers' refereed research from any needless impact-barriers as individual researchers (and research itself) are. "Publish or perish" has always been an oversimplified slogan. What is meant by it is neither unrefereed (vanity-press) nor unread/uncited (impactless) publication. Written out in realistic longhand, the institutional slogan would be "Maximize your refereed research impact to maximize your (and our) rewards from it."

So researchers' institutions are not only natural allies in freeing their researchers' refereed research from any unnecessary impact-barriers, they are in a position to lead and speed the way (by providing and supporting the institutional archives and encouraging, indeed mandating their filling with their researchers' refereed papers). No such collective self-interest unifies or propels centralized, discipline-based self-archiving. Nor do the institutional benefits of distributed self-archiving stop with eliminating the impact-barriers to their own institutional researchers' research: Eliminating, for their own researchers, the access-barriers to the research of others, at other institutions, is another way of increasing their own research productivity and impact.

And that brings us to a third potential institution-level benefit weighing in for distributed institution-based self-archiving: the prospect of a solution to the spiralling serials budget crisis. The likelihood of eventually reducing the institutional library's annual serials expenditures to 10% (simply by eventually redirecting that proportion of the annual windfall 100% savings to covering the journal peer review implementation costs for their own researchers' refereed publications) is not only an added incentive for hastening the transition by facilitating institutional self-archiving. It also provides allies from the institutional library, who can (1) help researchers in the first wave of self-archiving (self-archiving for them by proxy if need be), (2) maintain and preserve the institutional refereed Eprint Archives as an outgoing collection for external use, in place of the old incoming collection, acquired through S/L/P, for internal use, and (3) use their consortial power to provide leveraged support during the transition for journal publishers who commit themselves to a timetable of down-sizing to becoming pure QC/C service providers.

EPrints Software

At the 2nd Open Archive Initiative (OAI) meeting in San Antonio in June 2000, a participant said:

"Open Archiving will not get off the ground until the day I can go to a website, download open-archiving software, then say make archive, and an interoperable, OAI-compliant archive is up and running, ready to be filled." <http://www.dlib.org/dlib/june00/06inbrief.html#FOX>

In response to this request, echoed by many other potential users, the CogPrints project at Southampton University, applying our experience and expertise from the JISC/eLib-funded CogPrints archive, designed the generic eprints.org software that provides this functionality. A public version was released last year and has taken over operations at the CogPrints site <http://cogprints.soton.ac.uk/>. The operational release took place in January 2001 (and over 100 prospective users worldwide are already installing it, with the number of users increasing daily).

The eprints.org software is a feature-rich, easily installed eprint archive system. It runs right "out of the box" with a comprehensive default setup that should serve most institutions' and individuals' needs as it stands. It has also been designed to be extensively and flexibly re-configurable for customised needs; almost any aspect of the archive's operation can be adapted to suit a particular requirement. The archive supports the OAI protocol, allowing it to interoperate with other open archives and open archive services, and to be readily upgraded to keep up with OAI revisions. This adaptability is achieved by using a modular design methodology. The system is divided into two main components: the core archive component, which provides the functionality required for all open archives, and the site-specific component, which provides details about exactly what is stored in the archive, how it is presented and how it may be searched. The system is supplied with a richly featured site-specific component that requires minimal changing to set up a fully working, interoperable open archive. When updated revisions of the software become available, the core archive component can be upgraded, and the site retains its identity and data in the site-specific component. It is simple to add extra functionality to an archive in the site-specific component of the software. This means that the archive can be used by institutions, individuals, journals or any other organisation wishing to inter-operate with Open Archive services.
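The split between a core component and a site-specific component can be pictured with a small sketch. This is not the eprints.org code: the class names, fields and methods below are hypothetical and only illustrate the general design, in which the core holds behaviour common to all archives and the site component declares what is stored, how it is presented and how it is searched.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class SiteConfig:
    """Hypothetical site-specific component: stored fields, display, search."""
    archive_name: str
    metadata_fields: Tuple[str, ...]
    searchable_fields: Tuple[str, ...]
    render: Callable[[Dict[str, str]], str]


class CoreArchive:
    """Hypothetical core component: identical for every installation."""

    def __init__(self, site: SiteConfig):
        self.site = site
        self._records: List[Dict[str, str]] = []

    def deposit(self, record: Dict[str, str]) -> None:
        # The core enforces whatever fields the site has declared.
        unknown = set(record) - set(self.site.metadata_fields)
        if unknown:
            raise ValueError(f"fields not defined by site: {sorted(unknown)}")
        self._records.append(dict(record))

    def search(self, term: str) -> List[str]:
        # The core searches only the fields the site marks as searchable,
        # and presents hits using the site's own rendering rule.
        term = term.lower()
        return [
            self.site.render(r)
            for r in self._records
            if any(term in r.get(f, "").lower() for f in self.site.searchable_fields)
        ]


# An illustrative site definition; an upgraded CoreArchive could be dropped
# in without touching this configuration.
example_site = SiteConfig(
    archive_name="Example Institutional Archive",
    metadata_fields=("title", "author", "year"),
    searchable_fields=("title", "author"),
    render=lambda r: f"{r['author']} ({r['year']}) {r['title']}",
)

archive = CoreArchive(example_site)
archive.deposit({"title": "Open Hypermedia", "author": "Author, A.", "year": "2001"})
print(archive.search("hypermedia"))
```

Because the site definition is passed into the core rather than edited into it, replacing the core class leaves the site's fields, presentation and search rules untouched, which is the property the text describes for upgrades.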

B. The Programme

Aims and Objectives

Current support for EPrints is provided by staff whose primary duties are supporting and maintaining the infrastructure for 600 Computer Science undergraduates at the University of Southampton. The main aim of the OPSIS project is to provide a dedicated person who can focus on the task of supporting the EPrints software and its users.

Having established EPrints as a shrink-wrapped package for creating institutional archives under previous funding initiatives, it is imperative to keep upgrading it in phase with OAI upgrades and in response to user feedback, especially as the need for institutional archives is becoming increasingly recognised in both national and international arenas. A mailing list was established to provide rudimentary support for installers of the EPrints software during the first months of uptake worldwide. This growing list, currently subscribed to by representatives of fifty universities around the world, needs to be handled by a dedicated support officer. Moreover, half of the installing institutions do not have English as their main language, hence internationalisation of the software is a major priority that needs to be quickly addressed. Beyond these specific activities, preparing the code for open-source status is a fundamental goal, so that the community can extend it and take over responsibility for maintenance in the same distributed spirit as the archives themselves.

Email and Phone Support: Prompt response to email queries, remote assistance with the installation and configuration of the software, and help in diagnosing problems with installed systems.

Web Site and Documentation: Maintenance of the EPrints distribution web site, expanded to include an FAQ list, scenario documents and user experiences containing best/worst practice recommendations.

OAI Participation: Tracking developments in OAI standards and feeding into new OAI work via the OAI Technical Committee. The OAI needs to experiment with proposed changes in order to make recommendations for forming the new standards. The standards also need reference implementations.

Internationalisation: As a significant part of the EPrints user community do not use English as their first language, it is imperative to provide an infrastructure for international and multilingual archives, and to co-ordinate the community's translation efforts. Locale-specific labels and indexing methods must be supported within EPrints, and significant documentation should be accessible in various languages. As a first step, several users have co-operated to provide a translation of the download instructions into French.

Open Sourcing: Each site has its own highly specific requirements for the EPrints installation; although we are attempting to provide for as many of these as possible, clearly all future requirements cannot be directly provided by the EPrints team. Consequently we aim to turn EPrints into an Open Source product, so that community development, maintenance and extension can be supported.

Dissemination: Active dissemination activities (especially those to do with institutional politics or the theory of archiving rather than the logistics of running EPrints software) will be undertaken in conjunction with the OpCit project, especially via OAI meetings, relevant conferences, mailing lists and publications.

Workplan Overview
 
Activity                      Oct '01   Jan '02   Apr '02   Jul '02
Email and Phone Support
Web Site and Documentation
Internationalisation
Open Sourcing
OAI Participation
Dissemination

Resources Requested
 
£26,757   Grade 9 Research Assistant to perform the upgrading, support and open-sourcing
£1,200    Developer's workstation PC
£1,543    Development server PC with 100GB hard disk capacity
£500      JREI Digital Library server maintenance
£30,000   Total

Staff Appointments

The project will employ one full-time research assistant with relevant experience from the JISC eLib Open Journals and MALIBU projects. Costings have been performed using the Research and Analogous Pay Scales, as at October 2000. Due to the high levels of remuneration for computing staff outside academia we request funding for 1 full-time RF at point 9 on the ACR2 scale. The RA will make full use of the resources of the Intelligence, Agents, Multimedia Research Group at the University of Southampton.

Equipment

A workstation PC is requested for the RA, together with a high-capacity PC to act as a Linux-based development server for the archive. A contribution is also requested towards the maintenance of our main Solaris-based Digital Library servers, which are a legacy from our successful JREI bid.

Management

Supervision of the RA will take place in line with the IAM lab's standard management practices. There will be quarterly management meetings that will liaise with the OpCit project management.

Bibliography

Harnad, S. (1990) Scholarly Skywriting and the Prepublication Continuum of Scientific Inquiry. Psychological Science 1: 342 - 343 (reprinted in Current Contents 45: 9-13, November 11 1991). http://cogsci.soton.ac.uk/~harnad/Papers/Harnad/harnad90.skywriting.html

Harnad, S. (1991) Post-Gutenberg Galaxy: The Fourth Revolution in the Means of Production of Knowledge. Public-Access Computer Systems Review 2 (1): 39 - 53. http://cogsci.soton.ac.uk/harnad/Papers/Harnad/harnad91.postgutenberg.html

Harnad, S. (1995) Universal FTP Archives for Esoteric Science and Scholarship: A Subversive Proposal. In: Ann Okerson & James O'Donnell (Eds.) Scholarly Journals at the Crossroads; A Subversive Proposal for Electronic Publishing. Washington, DC., Association of Research Libraries, June 1995. http://www.arl.org/scomm/subversive/toc.html

Harnad, S. (1998/2000) The invisible hand of peer review. Nature [online] (5 Nov. 1998) http://helix.nature.com/webmatters/invisible/invisible.html

Harnad, S. (2000) E-Knowledge: Freeing the Refereed Journal Corpus Online. Computer Law & Security Report 16(2) 78-87. http://www.cogsci.soton.ac.uk/~harnad/Papers/Harnad/harnad00.scinejm.htm

Harnad, S. (2000) Ingelfinger Over-Ruled: The Role of the Web in the Future of Refereed Medical Journal Publishing. The Lancet Perspectives 256 (December Supplement): s16. http://www.cogsci.soton.ac.uk/~harnad/Papers/Harnad/harnad00.lancet.htm

Harnad, S. (2001) For Whom the Gate Tolls? How and Why to Free the Refereed Research Literature Online Through Author/Institution Self-Archiving, Now. http://www.cogsci.soton.ac.uk/~harnad/Tp/resolution.htm

Harnad, S., Carr, L. & Brody, T. (2001) How and Why To Free All Refereed Research From Access- and Impact-Barriers Online, Now. http://www.cogsci.soton.ac.uk/~harnad/Tp/science.htm

Harnad, S., Varian, H. & Parks, R. (2000) Academic publishing in the online era: What Will Be For-Fee And What Will Be For-Free? Culture Machine 2 (Online Journal) http://www.cogsci.soton.ac.uk/~harnad/Temp/Varian/new1.htm http://culturemachine.tees.ac.uk/frm_f1.htm

Odlyzko, A.M. (1998) The economics of electronic journals. In: Ekman R. and Quandt, R. (Eds) Technology and Scholarly Communication. Univ. Calif. Press, 1998. http://www.research.att.com/~amo/doc/complete.html

Odlyzko, A.M. (1999a) Competition and cooperation: Libraries and publishers in the transition to electronic scholarly journals. Journal of Electronic Publishing 4(4) (June 1999) and in J. Scholarly Publishing 30(4) (July 1999), pp. 163-185. The definitive version to appear in The Transition from Paper: A Vision of Scientific Communication in 2020, S. Berry and A. Moffat, eds., Springer, 2000. http://www.press.umich.edu/jep/04-04/odlyzko0404.html

Odlyzko, A.M. (1999b) The rapid evolution of scholarly communication. To appear in the proceedings of the 1999 PEAK conference. http://www.research.att.com/~amo/doc/rapid.evolution.pdf