Open Citation (OPCIT) Project

INTEGRATING AND NAVIGATING EPRINT ARCHIVES THROUGH CITATION-LINKING: THE OPEN CITATION (OpCit) LINKING PROJECT

NSF / JISC - eLib Collaborative Project:
International Digital Libraries Research Programme

U.S. Partners: Paul Ginsparg (Los Alamos National Laboratory) Joe Halpern (Cornell) Carl Lagoze (Cornell)
U.K. Partners: Stevan Harnad (Southampton) Wendy Hall (Southampton) Les Carr (Southampton)
Associated Organizations: Association for Computing Machinery (ACM) British Computer Society (BCS)


 
PROJECT SUMMARY: The Los Alamos Eprint Archive (LANL) is a remarkable public repository for a substantial and growing proportion of the current research literature in Physics. It is rapidly becoming the primary way that the world physics community is accessing its literature. At this time, not only does there exist a very natural means of making this rich resource much more powerful and useful for its current physicist users (at least 35,000 worldwide daily), but its capabilities stand ready to be extended and universalized, so as to be able to render the same service for all the rest of the disciplines, whether within the LANL Archive itself, or in other archives designed along the same lines.
The key to this enhancement of LANL's present functionality, and to its extension to the rest of science and scholarship, is citation-linking. The World Wide Web is predicated on hypertext connections between documents, but for the scientific/scholarly world the link par excellence is the formal citation of one paper by another. This is the way researchers have naturally been interconnecting their writings all along, but until now it has only been possible to follow those connections off-line, piecewise, with a great deal of real footwork in between. Now the entire corpus can be navigated via citations on-line.
Commercial journal publishers, along with secondary indexing/abstracting services, are exploring ways of interconnecting the on-line journal literature, but those initiatives are intrinsically and severely limited by the fact that that literature is criss-crossed with financial firewalls that prevent free navigation via full texts and their citations until and unless the access fee for each full-text "hit" is first paid through subscription, site-license or pay-per-view. (To allow the full texts to be browsed for free would be equivalent to giving away the literature for free in the on-line medium.)
The Los Alamos Archive does not have this constraint; hence the citation linking can be done almost immediately, yielding seamless public access worldwide to the entire corpus. The OpCit project accordingly brings to bear the prior expertise and experience of the Open Journal and CogPrints team at Southampton UK, who have successfully developed (on a much smaller but interdisciplinary database) the citation linking tools that can now be applied and further developed to completely intralink LANL. To benefit from the citation linking, both the User and the Author interfaces to LANL have to be redesigned so as to adapt them to this advanced form of navigation and to universalize them for all disciplines. It is the Cornell team, with their track record of success in solving the associated interoperability and metadata problems with NCSTRL and CoRR, who will be applying their expertise and experience here. And of course the unique success of the LANL team in having designed the Archive and its robust software, rendering it the indispensable resource it is, makes it the critical core partner in this collaboration. LANL is in many ways a microcosm for the future direction of the research literature on the Web as a whole. The OpCit project is also being undertaken in association with the Association for Computing Machinery in the US and the British Computer Society in the UK.
It is hoped that this project, if successful, will both focus and accelerate progress in a direction that will be beneficial to the world scholarly/scientific community.
 

1.0 The Target

It is easy to say what would be the ideal online resource for scholars and scientists: all papers* in all fields, systematically interconnected, effortlessly accessible and rationally navigable from any researcher's desk worldwide (Harnad 1995h, 1997a; Cameron 1997).

To implement this immediately, the entire preprint and reprint literature would need to be available online in a usable, unified form. It is not, yet. But a sufficiently large and representative subset of it is: <http://xxx.lanl.gov/cgi-bin/show_monthly_submissions>.
So the work can start on that subset now, and successful results based on it will not only generalize to the rest of the literature once it is all online, but will also serve to draw it online more quickly.
 

2.0 The LANL Archive

The subset in question is the NSF/DOE-supported Los Alamos National Laboratory (LANL) Eprint Archive <http://xxx.lanl.gov>, which already contains over half the current physics journal literature and is growing at the rate of 25,000 papers annually, with over 35,000 users daily, and 15 mirror sites around the world. LANL also contains the Computing Research Repository (CoRR), which can be accessed directly through LANL or through the more generalized and integrated interface of the Networked Computer Science Technical Reference Library (NCSTRL) (Davis & Lagoze 1999). LANL (Paul Ginsparg) and CoRR/NCSTRL (Carl Lagoze, Joe Halpern) are partners in OpCit, in association with ACM (Association for Computing Machinery; William Arms).

The LANL Archive represents a substantial body of literature in Physics, Mathematics and Computer Science, but the full texts are archived in a variety of forms, from HTML to TeX to PDF to PS, and the first problem that needs to be solved is designing a way to integrate and navigate them seamlessly.

One especially important feature of full texts -- their reference list -- is arguably the most natural and powerful way of interconnecting and navigating this literature. The "links" are already provided by the authors themselves, and users already have a long, skilled tradition of navigating with them "offline" (looking up the references on paper).

In the recently completed, JISC-funded Open Journal and CogPrints Projects, the UK partners (Wendy Hall, Stevan Harnad, Les Carr) have successfully used citation linking to interconnect a small but interdisciplinary "seed" database of full texts in the Cognitive Sciences with a much larger 10-year set of abstracts and their reference lists from a subset of the ISI (Institute for Scientific Information http://www.isinet.com/prodserv/citation/citsci.html ) journal citation database in the Cognitive Sciences (Psychology, Neurobiology, Computer Science, Linguistics, Philosophy). This work has already gone some way toward solving the problem of automatically recognizing and linking (within and between texts) the finite but noisy set of existing citation formats (Hitchcock et al. 1997a-c, 1998a,b; Giles et al. 1998; Bollacker et al. 1998). Users were exhilarated by citation-based navigation, but frustrated at being able to access only abstracts. The obvious conclusion to be drawn was that the real power of citation linking can only be realized with full-text linking. That is what the LANL Archive makes possible.
 

3.0 Citation Analysis

Citation-linking offers further benefits over and above natural navigability for users and a natural unifying constraint on interoperability and metadata processing in interconnecting and integrating the literature. It also offers new ways of analyzing and understanding how the literature is used. Author-end citation analysis (Garfield 1955) already reveals the offline lineages and dynamics in the growth of research knowledge (Chen & Carr 1999); reader-end citation analysis can now be used to analyse online usage patterns in a way that mere "hit" rates cannot. (The fact that you got from A to B by the series of citation links k, l, m... can be informative whether or not you actually stopped to read the full text of each link along the way.)

The LANL Archive has an additional interest from the standpoint of usage and citation analysis: It consists of both unpublished preprints and refereed reprints (the reprint often replacing the preprint as soon as it is available). Directly because of the impact of the LANL Archive in physics, a new form of citation has lately appeared (in both the paper and the online literature): citing the LANL preprint number (Youngen 1998). This is almost certain to become a standard practice and must hence be covered as a special case of citation linking, so that the link can be dynamically updated and aliased as soon as the reprint takes the place of the preprint. But the emerging patterns of preprint vs. reprint citation and use are also a natural object of analysis by OpCit; they are a revealing microcosm of the overall transitional process that is taking place as this new medium evolves its niche in scholarly and scientific research practice (Harnad 1990).
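
The preprint-to-reprint aliasing described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the record fields, the identifiers such as "example/0001", and the resolve function are hypothetical and do not describe LANL's actual software or identifier scheme.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Eprint:
    """One archive record; field names are illustrative, not LANL's schema."""
    eprint_id: str                        # e.g. a preprint number cited in a paper
    title: str
    superseded_by: Optional[str] = None   # id of the refereed reprint, once deposited


def resolve(eprint_id: str, archive: Dict[str, Eprint]) -> Eprint:
    """Follow the alias chain from a cited preprint to its latest version."""
    seen = set()
    record = archive[eprint_id]
    while record.superseded_by is not None:
        if record.eprint_id in seen:      # guard against a cyclic alias chain
            raise ValueError(f"alias cycle at {record.eprint_id}")
        seen.add(record.eprint_id)
        record = archive[record.superseded_by]
    return record


# Hypothetical identifiers, invented for this example.
archive = {"example/0001": Eprint("example/0001", "Draft title")}

# A citation of the preprint number resolves to the preprint itself ...
print(resolve("example/0001", archive).title)

# ... until the reprint is deposited and the alias is recorded.
archive["example/0001-reprint"] = Eprint("example/0001-reprint", "Published title")
archive["example/0001"].superseded_by = "example/0001-reprint"
print(resolve("example/0001", archive).title)
```

Because the citing paper stores only the preprint number, nothing in the citing text has to change when the reprint arrives; only the archive-side alias is updated.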
 

4.0 Eight Components of Citation Linking Project

The citation linking project has 8 components which will be pursued partly in parallel, because some aspects of them are independent of one another, and partly serially, because some aspects of later components depend on the outputs from earlier ones. In addition, there is an open-ended 9th category of further enhancements:
 

4.1 (1) Redesigning and Universalizing the Author Deposit Procedure, Interface and Infrastructure.

The LANL Archive's present deposit system is a remarkably powerful, robust and successful one; it is what has made LANL the essential resource it has become. Authors in physics are comfortable with it, and it is clear that it is extremely effective. But now that the success of LANL's Physics sector has made it apparent that this unique resource should be generalized to other disciplines (Harnad 1992c), the more general needs and practices of those other disciplines need to be taken into account in designing a generalized deposit procedure, interface and infrastructure. In part, this has already been taking place at LANL anyway, with Paul Ginsparg and his collaborators always strengthening and extending the software infrastructure to keep pace both with new technical developments and with evolving practices among physicists.

But now the adaptation has to go beyond the physics community, which is already accustomed to the present LANL deposit interface and procedure, to other disciplines that are entirely unfamiliar with both LANL and Eprint archiving. The NCSTRL/CoRR/ACM experience and expertise in designing more generalized interfaces and in solving interoperability and metadata problems (Halpern & Lagoze 1999) will be a valuable partner in this deposit redesigning and universalizing component. In addition, the constraint of citation linking itself will provide a common skeletal structure for the rest of the redesigning and unifying efforts: The full range of variation in citation formats exists in all disciplines; hence this skeletal structure must be extractable from all texts, in a form that can then be used for hypertext linking. The adaptations -- in both the interface and the infrastructure for depositing texts -- that will be dictated by the need to extract and link citations in the texts, their reference lists, and the texts they cite, for all formats, will also constrain the texts as a whole, the formats in which authors are encouraged to submit them, and the way those formats are processed by the deposit software. In other words: whatever it takes to make all deposits interoperable specifically for citation extraction and linking will also help to make them interoperable in other respects, because citation linking is a representative microcosm of text linking in general.

To the extent that these citation-specific adaptations influence author practices, they should also help speed the standardization of formats and procedures that will eventually converge on the optimal universal resource for the learned research community.
 

4.2 (2) Redesigning the User Interface, its Capabilities and its Infrastructure.

The present LANL user interface is designed for retrieving papers as one would from a bibliographic database: top-down, using titles, author-names, or keywords (as well as less general classifiers such as year, subject-area, etc.). Once the paper is retrieved, the duties of the interface are done. Apart from any hyperlinks in the paper itself -- if it happens to be available in HTML (for the retrieved paper might instead be in a variety of other formats, including TeX or Postscript) -- the present interface and infrastructure offer no further navigational possibilities; the only way to retrieve another paper is by going back up for another top-down search.

The objective here is to redesign the interface, infrastructure and navigational software so that all papers are retrievable in a citation-linked format (currently HTML or PDF), with the result that once a user has retrieved an entry-level paper, navigation of the entire archive can continue via citation-links, with no need to launch another top-down search (although the top-down capabilities -- keyword, author, and even full-text search of the archive -- would continue to be available at the paper level). Algorithmic content classification, essentially by citation parsing and other methods, and navigation based on that, will also be incorporated. [See, for example http://www.columbia.edu/~fms5/astitl.html]
 

4.3 (3) Extracting Citation Data from all papers in the archive

4.4 (4) Generating Hypertext Links for all citations in the archive

4.5 (5) Automatic Addition of Hypertext Links for all papers in the archive

(3)-(5) constitute the core of the OpCit project. The author deposit procedure and formats can be redesigned (1) and the user interface can be rebuilt (2), but the lion's share of the work will be in ensuring that all papers can be converted into a citation-linkable form, and then designing the automatic tools that will actually be linking them. Many issues of document format conversion and bibliographic formatting need to be addressed in order to provide this automatic reference detection and linking capability. Some provisional solutions have already been devised as part of the Open Journal and CogPrints Projects, which worked on texts in PDF and HTML formats from various journal publishers and from the Psycoloquy, BBS, and CogPrints Archives at Southampton (plus a Cognitive Science Subset of the ISI abstracts/citations database).

One of the many advantages of now extending this work to this uniquely comprehensive and heavily used database (LANL) is that partial results can be used to hasten progress toward fuller results: There will immediately be a subset of the Archive that can be fully citation-linked using our current tools: the subset that has correctly specified, well-formed bibliographic citations typeset by software which maintains the textual contents of the page. This subset can then be fully linked to all papers it cites that are in the Archive (even to those that are not yet themselves further linkable; their titles, author-names, abstracts and keywords will be enough to find them).

Users will accordingly have a chance to experience and compare functionality under two conditions: when they retrieve papers whose full texts are also linked, and when they retrieve papers that are dead-ends, like the abstracts that frustrated the users of the ISI/Open Journal Cognitive Science database. [Trying to retrieve a dead-end paper can be made to trigger an automatic email to the author of that paper indicating that a user has attempted to "cite-visit" it, but this was not possible because the author had not yet provided a version from which linkable citation data could be derived. This could be accompanied by clear instructions on how to provide such a version now; authors could indicate whether they wanted to see such access-failure reports for their deposited work instantly, weekly, monthly, semi-annually, or never.]

This double inducement -- (i) from experience as a user, able to fully cite-navigate some papers but unable to do so with others because they had not been archived in a form that could be citation linked, and (ii) from experience as an author, learning of unsuccessful attempts to cite-navigate through one's work via citation links -- should help to accelerate and focus changes in author practices that will increase the ratio of usable documents, even as we are working directly on solving the problems of extracting still more from the present formats after the first partial quick-linking of the papers with the readily identifiable citation formats.
 

4.6 (6) Optimizing the Deposit Procedures and Formats of (1).

4.7 (7) Upgrading the Citation-Navigating Capabilities of (2).

The developments in (3),(4) and (5) will also feed back on (1), the author deposit procedure, which can be continuously upgraded and optimized in accordance with what proves to be best for the success of (3)-(5). Again, the extraordinarily high level of use of the LANL archive will help to accelerate convergence on practices and standards that are optimal for the user community. Similarly, developments in (5) will lead to upgrading of the user interface and its capabilities (2).
 

4.8 (8) Bibliometric analysis of citation and usage.

The citation-linked database will make several kinds of analyses possible that will help us to better understand, predict and direct developments in this new medium:

Author-end citation patterns will be analyzed to determine the scope of the Archive: What proportion of citations point to current papers that are in the Archive? what proportion to current papers that are not in the archive? or to papers that pre-date the Archive? to books? to papers in their unrefereed preprint form? to papers in their final published form? how do these patterns change as the Archive's holdings grow, as its user-base grows, as its years of coverage grow?
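
As a rough illustration of the author-end questions above, the sketch below sorts already-parsed citations into coverage categories and reports the proportion in each. Everything in it is an assumption made for the example: the Citation fields, the category names, the sample data and the cut-off year are hypothetical, and a real analysis would need further categories (preprint vs. published form, for instance).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass(frozen=True)
class Citation:
    """A parsed reference; all field names here are hypothetical."""
    year: int
    kind: str = "paper"               # "paper" or "book"
    eprint_id: Optional[str] = None   # set when the cited paper has an archive id


def classify(c: Citation, archive_ids: Set[str], archive_start: int) -> str:
    """Place one citation in a coverage category."""
    if c.kind == "book":
        return "book"
    if c.eprint_id in archive_ids:
        return "in archive"
    if c.year < archive_start:
        return "pre-dates archive"
    return "current, not in archive"


def coverage(citations: Iterable[Citation], archive_ids: Set[str],
             archive_start: int) -> Dict[str, float]:
    """Return the proportion of citations falling in each category."""
    counts = Counter(classify(c, archive_ids, archive_start) for c in citations)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()} if total else {}


# Invented sample data: one citation per category.
sample = [
    Citation(1998, eprint_id="example/0001"),
    Citation(1997),
    Citation(1985),
    Citation(1990, kind="book"),
]
print(coverage(sample, archive_ids={"example/0001"}, archive_start=1991))
```

Running the same calculation on successive snapshots of the Archive would show how these proportions shift as its holdings and years of coverage grow.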

Reader-end citation-based navigation patterns will be analyzed to determine how the Archive is used: This is entirely new bibliometric territory, because citation searching has only been possible offline until now, so there was no systematic way to study it. The data will be used both as feedback for optimizing the features of the user interface (2), and to evaluate the overall success of OpCit.
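
One simple way to picture reader-end analysis is to count which citation links readers actually follow, as opposed to which papers they merely "hit". The sketch below assumes a hypothetical session log in which each session is the ordered list of papers a reader reached by following citation links; the log format and paper identifiers are invented for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def link_traversals(sessions: Iterable[List[str]]) -> Counter:
    """Count how often readers followed the citation link from one paper to the next."""
    hops: Counter = Counter()
    for path in sessions:
        hops.update(zip(path, path[1:]))   # consecutive (from, to) pairs
    return hops


def hit_counts(sessions: Iterable[List[str]]) -> Counter:
    """Plain per-paper hit counts, for comparison."""
    hits: Counter = Counter()
    for path in sessions:
        hits.update(path)
    return hits


# Hypothetical sessions: paper ids in the order they were visited.
sessions = [["A", "B", "C"], ["A", "B"], ["D"]]
print(link_traversals(sessions))
print(hit_counts(sessions))
```

In this toy log, papers A and B have the same hit count, but only the traversal counts show that B was reached from A each time, which is the kind of information a raw hit rate cannot supply.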
 

5.0 Some Metadata Considerations

Although the database for OpCit is an Archive of unprecedented scope, covering a substantial specific technical literature (LANL), our primary objective is not to create an ultimate hypertext software solution, but rather to develop a family of generic tools based on current proposals in the metadata area (Lassila & Swick 1999) which can be applied to this remarkable archive right now (as well as to other WWW sites -- research, academic or commercial), so as to benefit from this immediate functionality. The effect should be (i) to enhance substantially the power, scope and utility of the LANL Archive itself, (ii) to help solve online archiving's prima facie problems of scale, compatibility and universality, (iii) to demonstrate, shape and focus user practices, and (iv) to hasten the growth and development of this unique and powerful new way of accessing and using the scholarly/scientific literature.

The general applicability of these techniques to interoperable digital library architectures (Lagoze & Payette 1998; Leiner 1998) will be pursued. Steps will be taken toward establishing a set of standards for low-level interoperability, i.e., a means of communicating metadata and meta-information not only between the existing mirror servers within the current archive network, but also between the archive network and other resources.

In particular, the current situation in which citations are simply associated once and for all with destination URLs must be addressed. For practical flexibility, the recognition and analysis of citation information must be kept separate from document format convention or locus (e.g., the LANL Archive, the primary/secondary publisher archive, or the aggregation agent to which the citations may eventually be linked). The use of the emerging scholarly publishing standard, SLinkS (Hellman 1998), will also be investigated toward this end. It is also currently impossible to convert mathematical markup to HTML. MathML (an application of XML) is on the horizon and promises to improve this situation.
 

6.0 Citation Tools Developed By Others

CiteSeer (Giles et al. 1998) is a prototype of an "automatic citation indexing" system which is used to build a database of citations. Similar in function to the automatic citation linking of the Open Journal project, it has concentrated on linking in an unconstrained WWW environment, using a WWW crawler to gather PostScript (and latterly PDF) files from the public Web pages of research institutions. By contrast, the Open Journal project's focus was on specific digital library resources, publishers' archives and existing citation databases which provide quasi-total coverage for chosen subject areas. That approach quite naturally generalizes to the LANL Archive. (See also Van de Sompel & Hochstenbach 1999 and http://ups.cs.odu.edu/ .)

Some commercial publishers have now started to provide citation links from articles which they own into selected online bibliographic services. [It is noteworthy that it has been partners in the Open Journal project, namely, BioMedNet (Hitchcock et al. 1998a) and ISI (Hitchcock et al. 1998b), who have been among the first to do so, along with The Institute of Physics, which developed its HyperCite service (IOP Publishing 1996).] The LANL Archive, however, does not have any of the commercial firewall and barrier problems that arise between proprietary databases, and in physics it has incomparably greater self-contained coverage of the current literature.
 

7.0 Input From the Open Journal Project

The aim of the Open Journal (OJ) project was to interconnect and integrate a body of literature by automatically adding hypertext link overlays to a collection of existing documents served on the WWW (Carr et al. 1996a; Carr & Hall 1998). This would allow navigating from paper to paper via citation links within or between archives, between documents with different text and reference styles and formats. The OJ project demonstrated this capability for archives consisting of both PDF and HTML documents, representing display-based and document-based file formats (Probets et al. 1998).

The Distributed Link Service (DLS), which applied these links, was a WWW implementation of the hypertext techniques that had previously been demonstrated in Southampton University's Microcosm research environment (Carr et al. 1995, 1996a, 1998a). It made use of a WWW proxy environment to add links to HTML or PDF documents while they were delivered from a digital library (in plain, unlinked form) to a user's browser (with links integrated into them). The DLS software used various modules, called "agents" (because they have an "expertise" at automatically recognising particular kinds of information in the document) (Carr et al. 1998b):

1. The keyword-agent is very simple and uses various databases of stand-alone links (which can be key words or other text strings) and attaches them to the papers whenever those strings appear.

2. The name-agent looks for different appearances of a name [e.g. "Leslie Carr", "Carr, L." or "Carr et al."], possibly in a specified context.

3. The citation-agent recognises occurrences of citations in academic papers in a large (and extendable) variety of formats and analyses their contents to determine author, year, publisher, page range and the like. It uses this information from each citation to perform a lookup in a bibliographic database and to add a link to either the online full text of the cited article (if the database shows that it is available) or to the bibliographic record consisting of abstract and citations (from the ISI database).
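
The behaviour described for the citation-agent can be sketched in miniature as follows. This is not the DLS code: it recognises only one reference-list layout ("Authors (Year) rest"), whereas the real agent handled a large and extendable variety of formats, and the lookup table, its keys and the example.org URLs are hypothetical stand-ins for a bibliographic database.

```python
import re
from typing import Dict, Optional, Tuple

# One reference-list layout only: "Authors (1998a) Title and the rest ..."
REFERENCE = re.compile(
    r"^(?P<authors>.+?)\s*\((?P<year>\d{4})(?P<suffix>[a-z]?)\)\s*(?P<rest>.+)$"
)

Record = Dict[str, str]


def parse_reference(entry: str) -> Optional[Dict[str, str]]:
    """Split a reference into authors, year, suffix and rest; None if unrecognised."""
    match = REFERENCE.match(entry.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["first_author"] = fields["authors"].split(",")[0].strip()
    return fields


def link_for(entry: str, database: Dict[Tuple[str, str], Record]) -> Optional[str]:
    """Prefer the online full text; fall back to the bibliographic record."""
    parsed = parse_reference(entry)
    if parsed is None:
        return None
    key = (parsed["first_author"].lower(), parsed["year"] + parsed["suffix"])
    record = database.get(key)
    if record is None:
        return None
    return record.get("fulltext_url") or record["record_url"]


# Hypothetical database entries and URLs, invented for illustration.
database = {
    ("carr", "1998"): {
        "record_url": "http://example.org/records/carr1998",
        "fulltext_url": "http://example.org/fulltext/carr1998",
    },
    ("garfield", "1955"): {"record_url": "http://example.org/records/garfield1955"},
}

print(link_for("Carr, L. and Hall, W. (1998) Linking as Applied Coherence.", database))
print(link_for("Garfield, E. (1955) Citation Indexes for Science.", database))
```

The first lookup returns the full-text link because one is recorded; the second falls back to the bibliographic record, mirroring the two outcomes described for the citation-agent.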

8.0 Application and Further Development of Open Journal Software

The OJ project yielded a set of generalizable tools that are immediately applicable to the LANL Preprint Archive. The following tools work on PDF or HTML documents:

Link Harvester: A stand-alone program that extracts pre-existing links from a document and adds them into a database. A new link-free version of the document is also generated.

Link Interpolater: A stand-alone program that inserts links from a database into a document. If different sets of databases are selected, the same document can be linked into different navigation strategies (e.g., citation, keyword, overlaid subject index).

Citation Harvester: A stand-alone program that extracts citations from a paper's reference lists for storage in a database.

Citation Interpolater: A stand-alone program that inserts links into a document based on the contents of a citation database. Links can be added to other documents in the same archive, to documents in other archives, or to generic bibliographic citation databases (such as ISI's Web of Science).
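
The harvest-then-interpolate idea behind these tools can be illustrated with a deliberately small sketch for HTML. It is an assumption-laden toy, not the OJ software: it handles only simple anchors of the form <a href="...">text</a>, treats the link database as a plain phrase-to-URL dictionary, and assumes the phrases do not occur inside other markup. The function names and example.org URLs are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Matches only simple anchors of the form <a href="...">text</a>.
ANCHOR = re.compile(r'<a\s+href="(?P<href>[^"]*)"\s*>(?P<text>.*?)</a>',
                    re.IGNORECASE | re.DOTALL)


def harvest_links(html: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Extract (text, href) pairs and return a link-free copy of the document."""
    links = [(m.group("text"), m.group("href")) for m in ANCHOR.finditer(html)]
    link_free = ANCHOR.sub(lambda m: m.group("text"), html)
    return link_free, links


def interpolate_links(text: str, linkbase: Dict[str, str]) -> str:
    """Wrap every occurrence of a linkbase phrase in an anchor, in a single pass."""
    if not linkbase:
        return text
    # Longest phrases first, so a longer phrase wins over a shorter prefix of it.
    phrases = sorted(linkbase, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in phrases))
    return pattern.sub(
        lambda m: f'<a href="{linkbase[m.group(0)]}">{m.group(0)}</a>', text)


original = 'See <a href="http://example.org/dls">DLS</a> and Microcosm.'
link_free, links = harvest_links(original)
print(link_free)
print(links)

# Re-inserting the harvested links restores the original document ...
print(interpolate_links(link_free, dict(links)))
# ... while a different link database yields a different navigation overlay.
print(interpolate_links(link_free, {"Microcosm": "http://example.org/microcosm"}))
```

Keeping the links in a separate database, rather than in the document, is what allows the same link-free text to be served with different overlays.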

[To see a fragment of the first page of an ACM DL library article with both keyword and person links added wherever interesting people and systems are mentioned:  http://www.staff.ecs.soton.ac.uk/~lac/somewords/image4.gif To see another fragment from an ACM DL article that has been automatically populated with links to the ACM library from any citation of CACM or an ACM Hypertext Conference:  http://www.staff.ecs.soton.ac.uk/~lac/somewords/image5.gif ]
 

9.0 (9) Further Enhancements:

(1)-(8) are expected to occupy most of the three-year OpCit project; however, there is an open-ended list of further enhancements that will be explored as time and opportunity allow:

9a. Other Kinds of Links: Papers can be automatically provided with other kinds of links using distributed link overlays as demonstrated in the Microcosm and DLS technologies (developed at Southampton by two of the partners, Wendy Hall and Les Carr) (Carr et al. 1996b; DeRoure et al. 1996).  http://www.mmrg.ecs.soton.ac.uk/publications.html These overlays can include links based on keywords, author-names (pointing to papers other than the explicitly cited ones), glossaries/indices, and even an inverted index for the corpus as a whole.

9b. Revision/Update Linking: There is no reason a research report should remain in a "frozen" state after it is published. The published version, suitably tagged, is a permanent formal milestone, especially for citation purposes, but an interlinked Archive also allows authors to deposit updated and revised drafts. The automatic linking system could be adapted to accommodate this, providing automatic forward and backward linking between versions.

9c. Commentary Links: For the same reason that links from unpublished preprints to refereed reprints to revised drafts of papers are of value, so are links to comments on papers, and authors' replies to comments. (One of the partners, Stevan Harnad, has special expertise in this area, being the founder and editor of two peer commentary journals, a paper journal of over 20 years' standing (Behavioral and Brain Sciences (BBS) http://www.princeton.edu/~harnad/bbs.html ) and an online journal of almost 10 years' standing (Psycoloquy http://www.princeton.edu/~harnad/psyc.html ) (Harnad 1984d, 1998a).)

9d. Journal Links: There are several ways in which LANL can be useful to the journals in which its papers are published. LANL can provide links to the version of a paper in the journal's own official online archive. It can also provide links to cited papers in the journal's online archive that do not appear in LANL. Authors might wish to have arrangements for official links with the published version in order to provide an authenticated draft, or one in which the paper page images can be viewed or cited by page and line.

9e. Peer Review: Another useful service to the journals in which its papers are published is one that LANL is already beginning to provide: Authors submitting papers to the American Physical Society (APS) journals can already do so by simply specifying the LANL version as their official submission. Referees can then be directed to that citation-enhanced draft in reviewing it. A password-controlled, non-public sector could also be created in LANL that would allow referee reports to be linked just as commentaries are in 9c above, but under the control of each journal. This would effectively be the implementation of online peer review for journals, and might be a model for the future relationship between refereed journals and public archives like LANL. Journals could also upload their final drafts to LANL for distribution in their own formats with journal-specific identifying graphics, etc. (The official journal version would then be part of the paper's overall revision "history," which could continue with comments, responses and updates; Harnad 1990d, 1992c.)

(A partner, Stevan Harnad, has extensive experience with peer review (Harnad 1982d, 1985f), including online peer review (Harnad 1996a, 1998a,h), likewise in association with a JISC-supported Project, the Eprint Archive in the Cognitive Sciences, CogPrints http://cogprints.soton.ac.uk as well as http://opcit.eprints.org .)

9f. Links to Proprietary Databases: Citation links leading out of the Archive could also go to proprietary databases that charge for their services. These could include journal archives, archives of the scanned contents of journal back issues, electronic books, and secondary publisher databases, such as INSPEC, MEDLINE or ISI. There are, however, strategic questions about whether LANL should implement links that entail charges to the user (Harnad 1998e).

9g. Links to Other Public Archives: Provisions could be made for citation links to papers in public archives other than LANL, but it may be more useful to merge other public archives with LANL (as they are not competing in any sense, and only stand to benefit from economies of scale and shared resources and development), perhaps through interfaces such as NCSTRL, into one seamless interconnected Archive; this too would provide constraints to help guide convergence into a unified global archive (Hitchcock et al. 1997c).

9h. Links to Authors' Home Server Archives: Apart from mirroring, one useful form of redundancy that LANL might encourage is that all its authors should also archive their papers on their home servers, to which the LANL version would also be linked. Links to the author's email address and URL should also be standard components of the LANL version (Harnad 1995h).
 

10.0 Overview

The OpCit project is a very timely and natural one. This is the moment to consolidate and universalize the unique LANL archive by utilizing its own unexploited richness to enhance both its usability and usefulness through citation linking and navigation and to extend its remarkable success, beginning with computer science, backed internationally by the ACM and BCS, through the multidisciplinary cognitive sciences, to the rest of the scientific and scholarly spectrum.

Both the resources and the expertise of the team assembled for OpCit are arguably the best in the world for this undertaking: The uniqueness of the LANL archive and the contributions of its creator, Paul Ginsparg, are beyond dispute. NCSTRL/CoRR and the Cornell group are the leaders in universalized gatewaying and interoperability of online resources and the Open-Journal/CogPrints team at Southampton are the leaders in journal and archive integration and linking. The huge, international usership of LANL, extended still further by the NCSTRL universal gateway, guarantees that the proposed enhancements will not only be widely tested, but that, if successful, they will strongly influence the evolution of online archiving of the learned literature as a whole. There is no question that radical changes in scholarly/scientific publishing and communication practices are poised to take place. OpCit will help to guide and hasten them in the right direction.

FOOTNOTE

*"Papers" may be too specific; we would want books connected too, and databases, audio, video, multimedia; but OpCit will only be concerned directly with papers (and their associated graphics), although the results should be readily generalizable to books too. "Papers" is also quite general, for it includes not only the refereed journal literature, but also the unpublished preprint literature, which blends into it continuously. Indeed, the preprint literature will be our starting point, as there are reasons to expect that it may grow to subsume the rest.
 
 

REFERENCES

Bachrach, S., Berry, S.R., Blume, M., von Foerster, T., Fowler, A., Ginsparg, P., Heller, S., Kestner, N., Odlyzko, A., Okerson, A., Wigington, R., & Moffat, A. (1998) Intellectual Property: Who Should Own Scientific Papers? Science 281 (5382): 1459-1460. September 4 1998.  http://www.sciencemag.org/cgi/content/full/281/5382/1459

Bollacker, K.D., Lawrence, S., Giles, C.L. (1998) CiteSeer: An Autonomous Web Agent for Automatic Retrieval and Identification of Interesting Publications. Agents. 116-123  http://www.neci.nj.nec.com/homepages/lawrence/citeseer.html

Cameron, R.D. (1997) A Universal Citation Database as a Catalyst for Reform in Scholarly Communication. First Monday 2(4) April 27. http://firstmonday.org/issues/issue2_4/cameron/index.html

Carr, L., Davis, H., Hall, W., Hey, J. (1996a) Turning the Web into a Library. In Proceedings of ELVIRA: The UK Digital Libraries Conference, De Montfort University, UK.  http://www.mmrg.ecs.soton.ac.uk/publications/archive/carr1996c/

Carr, L. and Hall, W. (1998) Linking as Applied Coherence, Presentation at the First International Workshop on the Use of the WWW for the Public Understanding of Science, CERN November 1998.

Carr, L., De Roure, D., Hall, W., & Hill, G. (1998a) Implementing an Open Link Service for the World-Wide Web, WWW Journal, 1(2), Baltzer.  http://www.mmrg.ecs.soton.ac.uk/publications/archive/carr1998b/

Carr, L., De Roure, D., Hall, W., Hill, G. (1995) The Distributed Link Service: A Tool for Publishers, Authors and Readers, World Wide Web Journal 1(1), 647-656, O'Reilly & Associates.  http://www.mmrg.ecs.soton.ac.uk/publications/archive/carr1995/

Carr, L., H. Davis, D. De Roure, W. Hall, G. Hill (1996b) Open Information Services, Computer Networks and ISDN Systems, 28 (7/11), 1027-1036, Elsevier.  http://www.mmrg.ecs.soton.ac.uk/publications/archive/carr1996b/

Carr, L., Hall, W., Hitchcock, S. (1998b) Link Services or Link Agents? In Proceedings of the Ninth ACM Conference on Hypertext, Pittsburgh. June 1998  http://www.mmrg.ecs.soton.ac.uk/publications/archive/carr1998a/

Carr, L., Hitchcock, S. (1995) The Open Journal Project  http://journals.ecs.soton.ac.uk

Chen, C. and Carr, L. (1999) Trailblazing the literature of hypertext: author co-citation analysis (1989-1998). In Proceedings of the Tenth ACM Conference on Hypertext, Darmstadt. February 1999

Davis, H., Hall, W., Heath, I., Hill, G. & Wilkins, R. (1992) "Towards an Integrated Information Environment with Open Hypermedia Systems" in the Proceedings of the ACM Conference on Hypertext (ECHT'92), Milan, November 1992, ACM Press, pp 181-190  http://www.mmrg.ecs.soton.ac.uk/publications/archive/davis1992/

Davis, J. R. and Lagoze, C. (1999) "NCSTRL: Design and Deployment of a Globally Distributed Digital Library," to appear in Journal of the American Society for Information Science (JASIS)  http://www2.cs.cornell.edu/lagoze/papers/NCSTRL-IEEE3.doc

De Roure, D.C., Carr, L.A., Hall, W. and Hill, G.J. (1996) A Distributed Hypermedia Link Service. In Proceedings of the Third International Workshop on Services in Distributed and Networked Environments (SDNE96), Macau, June 3-4, 1996, IEEE Computer Society Press, pp. 156-161.  http://www.mmrg.ecs.soton.ac.uk/publications/archive/deroure1996a/

Garfield, E. (1955) Citation Indexes for Science: A New Dimension in Documentation through Association of Ideas. Science 122: 108-111  http://www.garfield.library.upenn.edu/papers/science_v122(3159)p108y1955.html

Giles, C.L., Bollacker, K. and Lawrence, S. (1998) CiteSeer: An Automatic Citation Indexing System, The Third ACM Conference on Digital Libraries, ACM Press, 89--98.  http://www.neci.nj.nec.com/homepages/lawrence/citeseer.html

Giles, C.L., Bollacker, K.D., Lawrence, S. (1998) CiteSeer: An Automatic Citation Indexing System. ACM DL. 89-98  http://www.neci.nj.nec.com/homepages/lawrence/papers/cs-aa98/

Ginsparg, P. (1994) First Steps Towards Electronic Research Communication. Computers in Physics. (August, American Institute of Physics). 8(4): 390-396.  http://xxx.lanl.gov/blurb/

Ginsparg, P. (1996) Winners and Losers in the Global Research Village. Invited contribution, UNESCO Conference HQ, Paris, 19-23 Feb 1996  http://xxx.lanl.gov/blurb/pg96unesco.html

Hall, W., Davis, H.C. and Hutchings, G.A. (1996) Rethinking Hypermedia: the Microcosm Approach. Boston USA, Kluwer Academic Press 195pp.

Halpern, J. Y. and Lagoze, C. (1999) "The Computing Research Repository: Promoting the Rapid Dissemination and Archiving of Computer Science Research," (submitted to) Digital Libraries '99, The Fourth ACM Conference on Digital Libraries, Berkeley, CA.  http://www2.cs.cornell.edu/lagoze/papers/dl99.pdf

Harnad, S. & Hemus, M. (1997) All Or None: No Stable Hybrid or Half-Way Solutions for Launching the Learned Periodical Literature into the PostGutenberg Galaxy. In Butterworth, I. (Ed.) The Impact of Electronic Publishing on the Academic Community. London: Portland Press. Pp. 18-27.  http://www.cogsci.soton.ac.uk/~harnad/Papers/Harnad/harnad97.hybrid.pub.html

Harnad, S. (1979) Creative disagreement. The Sciences 19: 18 - 20.

Harnad, S. (ed.) (1982d) Peer commentary on peer review: A case study in scientific quality control. New York: Cambridge University Press.

Harnad, S. (1984d) Commentaries, opinions and the growth of scientific knowledge. American Psychologist 39: 1497 - 1498.

Harnad, S. (1985f) Rational disagreement in peer review. Science, Technology and Human Values 10: 55 - 62.

Harnad, S. (1986) Policing the Paper Chase. (Review of S. Lock, A difficult balance: Peer review in biomedical publication.) Nature 322: 24 - 5.

Harnad, S. (1990d) Scholarly Skywriting and the Prepublication Continuum of Scientific Inquiry. Psychological Science 1: 342 - 343 (reprinted in Current Contents 45: 9-13, November 11 1991).  http://www.cogsci.soton.ac.uk/~harnad/Papers/Harnad/harnad90.skywriting.html

Harnad, S. (1991b) Post-Gutenberg Galaxy: The Fourth Revolution in the Means of Production of Knowledge. Public-Access Computer Systems Review 2 (1): 39 - 53 (also reprinted in PACS Annual Review Volume 2 1992; and in R. D. Mason (ed.) Computer Conferencing: The Last Word. Beach Holme Publishers, 1992; and in: M. Strangelove & D. Kovacs: Directory of Electronic Journals, Newsletters, and Academic Discussion Lists (A. Okerson, ed), 2nd edition. Washington, DC, Association of Research Libraries, Office of Scientific & Academic Publishing, 1992); and in Hungarian translation in REPLIKA 1994; and in Japanese in Research and Development of Scholarly Information Dissemination Systems 1994-1995.  http://www.cogsci.soton.ac.uk/~harnad/Papers/Harnad/harnad91.postgutenberg.html

Harnad, S. (1992c) Interactive Publication: Extending the American Physical Society's Discipline-Specific Model for Electronic Publishing. Serials Review, Special Issue on Economics Models for Electronic Publishing, pp. 58 - 61.  http://www.cogsci.soton.ac.uk/~harnad/Papers/Harnad/harnad92.interactivpub.html

Harnad, S. (1995c) Electronic Scholarly Publication: Quo Vadis? Serials Review 21(1) 78-80 (Reprinted in Managing Information 2(3) 31-33 1995)  http://www.cogsci.soton.ac.uk/~harnad/Papers/Harnad/harnad95.quo.vadis.html

Harnad, S. (1995f) The PostGutenberg Galaxy: How To Get There From Here. Information Society 11(4) 285-292. Also appeared in: Times Higher Education Supplement. Multimedia. P. vi. May 12 1995.  http://www.cogsci.soton.ac.uk/~harnad/THES/thes.html

Harnad, S. (1995g) Sorting the Esoterica from the Exoterica: There's Plenty of Room in Cyberspace: Response to Fuller. Information Society 11(4) 305-324. Also appeared in: Times Higher Education Supplement. Multimedia. P. vi. June 9 1995.  http://www.cogsci.soton.ac.uk/~harnad/THES/harful1.html

Harnad, S. (1995h) Universal FTP Archives for Esoteric Science and Scholarship: A Subversive Proposal. In: Ann Okerson & James O'Donnell (Eds.) Scholarly Journals at the Crossroads; A Subversive Proposal for Electronic Publishing. Washington, DC., Association of Research Libraries, June 1995.  http://www.arl.org/scomm/subversive/toc.html

Harnad, S. (1996a) Implementing Peer Review on the Net: Scientific Quality Control in Scholarly Electronic Journals. In: Peek, R. & Newby, G. (Eds.) Scholarly Publishing: The Electronic Frontier. Cambridge MA: MIT Press. Pp. 103-118.  http://www.cogsci.soton.ac.uk/~harnad/Papers/Harnad/harnad96.peer.review.html

Harnad, S. (1997a) How to Fast-Forward Serials to the Inevitable and the Optimal for Scholars and Scientists. Serials Librarian 30: 73-81. [Reprinted in C. Christiansen & C. Leatham, Eds. Pioneering New Serials Frontiers: From Petroglyphs to CyberSerials. NY: Haworth Press, and in French as Comment Accelerer l'Ineluctable Evolution des Revues Erudites vers la Solution Optimale pour les Chercheurs et la Recherche.]  http://www.cogsci.soton.ac.uk/~harnad/Papers/Harnad/harnad97.learned.serials.html

Harnad, S. (1997c) The Paper House of Cards (And Why It Is Taking So Long To Collapse). Ariadne 8: 6-7.  http://www.ariadne.ac.uk/issue8/harnad/

Harnad, S. (1998) Learned Inquiry and the Net: The Role of Peer Review, Peer Commentary and Copyright. Learned Publishing 11(4): 283-292. Shorter version in 1997: Antiquity 71: 1042-1048. Excerpts also appeared in the University of Toronto Bulletin: 51(6) P. 12.  http://citd.scar.utoronto.ca/EPub/talks/Harnad_Snider.html

Harnad, S. (1998d) For Whom the Gate Tolls? Free the Online-Only Refereed Literature. American Scientist Forum. September 1998  http://www.cogsci.soton.ac.uk/~harnad/amlet.html

Harnad, S. (1998e) On-Line Journals and Financial Fire-Walls. Nature 395(6698): 127-128.  http://www.cogsci.soton.ac.uk/~harnad/nature.html

Harnad, S. (1998h) The invisible hand of peer review. Nature [online] (c. 5 Nov. 1998)  http://helix.nature.com/webmatters/invisible/invisible.html

Hellman, E. (1998) Scholarly Link Specification Framework  http://www.openly.com/SLinkS/

Hitchcock, S., L. Carr, W. Hall (1997a) Web journals publishing: a UK perspective, Serials, Vol. 10, no. 3, pp. 285-299. (ISSN 0953-0460)  http://www.mmrg.ecs.soton.ac.uk/publications/archive/hitchcock1997/

Hitchcock, S., Carr, L., Harris, S., Hey, J. & Hall, W. (1997b) "Citation linking: improving access to online journals". Proceedings of the Second ACM Conference on Digital Libraries, Philadelphia, pp. 115-122.  http://www.mmrg.ecs.soton.ac.uk/publications/archive/hitchcock1997b/

Hitchcock, S., Quek, F., Carr, L., Hall, W., Witbrock, A., and Tarr, I. (1997c) Linking Everything to Everything: Journal Publishing Myth or Reality? ICCC/IFIP conference on Electronic Publishing 97: New Models and Opportunities, Canterbury, UK, April.  http://journals.ecs.soton.ac.uk/IFIP-ICCC97.html

Hitchcock, S., Carr, L., Harris, S., Probets, S., Evans, D., Hall, W. and Brailsford, D. (1998a) Linking electronic journals: lessons from the Open Journal project, D-Lib Magazine, Dec 1998

Hitchcock, S., F. Quek, L. Carr, W. Hall, A. Witbrock and I. Tarr (1998b) Towards Universal Linking in Electronic Journals. Serials Review 24(1), 21-33.

Lagoze, C. and Payette, S. (1998) "An Infrastructure for Open Architecture Digital Libraries," Cornell University Computer Science, Technical Report TR98-1690, June 1998  http://ncstrl.cs.cornell.edu/Dienst/UI/1.0/Display/ncstrl.cornell/TR98-1690

Lassila, O., and Swick, R. (eds) (1999) Resource Description Framework (RDF) Model and Syntax Specification. W3C Proposed Recommendation (January 1999).  http://www.w3.org/TR/PR-rdf-syntax/

Leiner, B.M. (1998) "The NCSTRL Approach to Open Architecture for the Confederated Digital Library," D-Lib Magazine, December 1998

Lock, S. (1985) A difficult balance: editorial peer review in medicine. London: Nuffield Provincial Hospitals Trust.

Odlyzko, A.M. (1995) Tragic loss or good riddance? The impending demise of traditional scholarly journals, International Journal of Human-Computer Studies, 42 (1995), 71-122.  http://www.research.att.com/~amo/doc/tragic.loss.txt

Odlyzko, A.M. (1997) The slow evolution of electronic publishing. In Electronic Publishing - New Models and Opportunities, A. J. Meadows and F. Rowland, eds., ICCC Press, 1997.  http://www.research.att.com/~amo/doc/slow.evolution.txt

Odlyzko, A.M. (1998) The economics of electronic journals. In: Ekman R. and Quandt, R. (Eds) Technology and Scholarly Communication Univ. Calif. Press, 1998.  http://www.research.att.com/~amo/doc/economics.journals.txt

Okerson A. & O'Donnell, J. (Eds.) (1995) Scholarly Journals at the Crossroads; A Subversive Proposal for Electronic Publishing. Washington, DC., Association of Research Libraries, June 1995.  http://www.arl.org/scomm/subversive/toc.html

Pope, S. & Miller, L. (1998) Using the web for peer review and publication of scientific journals. Conservation ecology [online], (c. 5 Nov. 1998)  http://www.consecol.org/Journal/consortium.html

Probets, S., D. F. Brailsford, L. Carr and W. Hall (1998) Dynamic Link Inclusion in Online PDF Journals. In Proceedings of Seventh International Conference on Electronic Publishing, Document Manipulation and Typography. Springer-Verlag (Lecture Notes in Computer Science Series). April 1998.  http://www.mmrg.ecs.soton.ac.uk/publications/archive/probets1998/

Van de Sompel, H., & Hochstenbach, P. (1999) Reference Linking in a Hybrid Library Environment. D-Lib Magazine Volume 5 Issue 4 http://www.dlib.org/dlib/april99/van_de_sompel/04van_de_sompel-pt1.html

Walker, T.J. (1998) Free Internet Access to Traditional Journals. American Scientist 86(5)  http://www.amsci.org/amsci/articles/98articles/walker.html

Youngen, G. (1998) Citation patterns of the physics preprint literature with special emphasis on the preprints available electronically. UIUC Physics and Astronomy library [online] (c. 5 Nov. 1998)  http://www.physics.uiuc.edu/Physics/library/preprint.html