Extending journal-based research impact assessment to book-based disciplines

(Research Proposal)

L. Carr (ECS, Southampton)
S. Hitchcock (ECS, Southampton)
C. Oppenheim (Information Science, Loughborough)
J.W. McDonald (Social Statistics, Southampton)
T. Champion (Archaeology, Southampton)
S. Harnad (ECS, Southampton)

Summary: The ‘impact’ of academic research is typically measured by how much it is read, used and cited, and by how much new work it influences. Services that measure impact work well for journal-based disciplines. Book-based disciplines can now benefit from online tools and methods of impact analysis too. These analyses also predict fruitful directions for future research, and so can inform research assessment and funding. This project will extend tools for online bibliometric data collection of publications and their citations with the aim of testing and evaluating new Web metrics to assist research assessment in book-based disciplines.

A method has been proposed that could give online research assessment far richer, more sensitive and more predictive measures of research productivity and impact, for far less cost and effort (Harnad et al. 2003). This method builds on and extends well-established, citation-based analysis of research impact, which is typically applied to journal articles; we will now test whether it can be generalized to disciplines that depend on other types of publication, such as books.
Many Arts and Humanities subjects are primarily book-based rather than article-based. The aim of this proposal is to:

·         engage book authors in these subject areas in self-archiving the metadata of their books (author, title, date, publisher, keywords, abstract) plus each book's reference list (bibliography).
·         apply the scientometric resources that are currently available only for journal articles to book-based disciplines.
·         compute book-to-book citation-counts for books as well as for book-authors from the resulting database of self-archived book metadata+references.
All of this will help extend and integrate the scientometric analysis of citations and usage across disciplines, and between the domains of books and journals. It should improve predictive capability for assessment as well as extending our understanding of the embryology and evolutionary trajectory of knowledge.
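
As a minimal sketch of how book-to-book citation counts for books and for book-authors might be derived from self-archived records, the following Python fragment assumes a simplified record structure. The `BookRecord` class, its field names and the sample entries are hypothetical illustrations, not the Eprints or Citebase data model, and reference lists are assumed to have been resolved to record identifiers already.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class BookRecord:
    """Hypothetical self-archived record: metadata plus the reference list."""
    book_id: str
    authors: List[str]
    title: str
    year: int
    references: List[str] = field(default_factory=list)  # ids of cited books


def citation_counts(records):
    """Return (per-book, per-author) counts of book-to-book citations."""
    by_id = {r.book_id: r for r in records}
    book_counts = Counter()
    author_counts = Counter()
    for citing in records:
        # Use a set so that a book listing the same work twice counts once.
        for cited_id in set(citing.references):
            if cited_id == citing.book_id or cited_id not in by_id:
                continue  # skip self-references and works not in the database
            book_counts[cited_id] += 1
            for author in by_id[cited_id].authors:
                author_counts[author] += 1
    return book_counts, author_counts


# Invented sample data, for illustration only.
records = [
    BookRecord("b1", ["Author A"], "Title One", 1995),
    BookRecord("b2", ["Author B"], "Title Two", 1999, ["b1"]),
    BookRecord("b3", ["Author A", "Author C"], "Title Three", 2002,
               ["b1", "b2", "b2"]),
]
book_counts, author_counts = citation_counts(records)
```

In this toy dataset `book_counts["b1"]` is 2 and Author A is credited with 2 citations. Citations by an author to his or her own earlier books are not filtered out here; whether to exclude them is one of the design choices the metrics would need to test.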

The objectives of this work are to:

·         Produce a Web-based citation database with records for as many cited books across arts and humanities disciplines as can be gathered (using archaeology as the initial focus case)
·         Develop and test a series of Web metrics for measuring research impact based on these publications and their citations
·         Evaluate user acceptance of these metrics and offer selected metrics for ongoing use

Calls for researchers to self-archive their published research papers in openly accessible institutional archives, thereby increasing the visibility and impact of their work, would have little effect on arts and humanities subjects, it is widely claimed, because

(1) their research impact is book-based and not journal-based
(2) authors are more concerned about finding a prestigious publisher and obtaining a few good book reviews
(3) their journal-articles are considered “waystations” on the road to the book

If this is indeed an accurate description of the status quo in much of Arts and Humanities research insofar as practices and beliefs are concerned, it is anything but optimal. The online medium is poised to change things radically!

The project will gather bibliographic data from arts and humanities authors' books and articles, and will citation-link and citation-analyse them to create a book citation-impact factor. This has not been done systematically before, and it will not only be informative about parallels between the impact of books and the impact of articles, but it will have policy implications. The new measures should help do two things:

(1)     provide new ways of assessing the impact of book-based research
(2)     enhance impact by making the research more visible online.

It cannot do the latter for books on the scale that is possible for articles, because book-texts usually cannot be made openly accessible to all would-be users online, but it can still increase book visibility and might encourage some authors to make their books openly accessible too.

The following are examples of the kinds of comparisons that could help answer the research questions posed here:

(1) Compare the book-based impact magnitudes and rankings with their journal-based counterparts for every author where both measures are available.

(2) Compare the book-based impact magnitudes and rankings with the traditional criterial rankings (publisher prestige, number and favourability of review articles) and with rankings by peer or expert judgment.

(3) Compare with the "gray" book impact in ISI's Web of Science, which ISI does not calculate or use but which could be calculated from the UK ISI database. (ISI has already indicated that it will approve our software agents as part of the UK national site license.)
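
Comparisons of rankings such as these could be made with a rank correlation. The sketch below computes Spearman's coefficient in plain Python as the Pearson correlation of average ranks; the choice of Spearman's coefficient is an assumption made for illustration, and the citation figures are invented.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented per-author citation counts, for illustration only.
book_based = [12, 3, 7, 0, 25]
journal_based = [40, 5, 22, 9, 31]
rho = spearman(book_based, journal_based)
```

A coefficient near 1 would indicate that authors are ranked similarly by the two measures; a coefficient near 0 would suggest that book-based and journal-based impact capture different things.
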
Research context

Scientometric methods have long proved remarkably powerful for measuring the impact of published works, and while research assessment appears to use impact data only indirectly, it is inevitable that there will be some correlation. Recent studies have quantified the extent of the correlation, thus demonstrating the predictive power of scientometric methods. For example, statistical correlational analyses on the numerical outcome of the RAE using average citation frequencies have been shown to predict departmental outcome ratings remarkably closely. Smith and Eysenck (2002) found a correlation as high as .91 in Psychology. Oppenheim (1998) and Holmes and Oppenheim (2001) found correlations of .80 and higher in other disciplines.

These scientometric methods have traditionally been applied only to select parts of the research literature, notably journal articles. Availability of online data makes it possible to enhance the quality and scope of scientometric tools, in particular to extend traditional citation-based methods to books, and supplement this with, for example, usage measures, to produce new metrics that can be used for continuous online assessment of research productivity and to assist the prediction of fruitful areas of new research for funding.

Being the only country with a national research assessment exercise, the UK is in a unique position to exploit these methods, which would increase the uptake and impact of UK research output, and set an example for the rest of the world that will almost certainly be emulated.


The project will be developed by a team that has built tools to manage online publication data for scientometric analysis, backed by researchers with experience and expertise in scientometric analysis and Webmetric techniques and leaders in some of the key fields to be analysed.

The primary method of the project will be to increase the number of analytic techniques we are already developing to measure the uptake and visibility of cited books. This will begin with collecting and supplementing book data using tools developed in the successful JISC-NSF funded Open Citation Project (opcit.eprints.org) for citation analysis for Open Archives (Hitchcock et al. 2002). The project proposed here would modify and extend these tools:

·         Eprints.org software (software.eprints.org), for managing institutional archives
·         Citebase (citebase.eprints.org), a discovery service with usage and citation-based ranking
·         Paracite (paracite.eprints.org), an online reference finder

The user interface to Eprints software will be optimised for author input of metadata and reference data for arts and humanities books. Reference data will be automatically processed by Citebase and added to its growing citation database of books and papers. Online records of referenced items, where available, will be located by Paracite to enable reference linking. Based on the collected data, we will be able to build a number of analytic techniques to measure the uptake and visibility of cited works. The following are examples of the analyses that would become feasible as a result of this project:

·         Books and authors can be credited with their citation counts from both their book-to-book citations and their article-to-book citations (and added to the citation counts of article-to-article citations).
·         Early-stage predictors, such as usage-counts ("hits") for the self-archived book metadata (partly analogous to the article preprint hits that have proved predictive of later citation counts in the case of articles).
·         The correlation between book citation-counts and the publisher imprimatur (the book publisher's "impact factor") will be computable.
·         The correlation between book citation-counts and (1) the number of book-reviews, (2) the impact factor of the book-review journal or magazine, and (3) the citation-count for the reviews -- all will be computable.
·         The time-course, predictivity, and pattern of inter-correlations will no doubt reveal a good deal more about the true impact of books, and it may even change publication and evaluation practices.
·         "Hubs and authorities" can be derived from this data, again for predictiveness and evaluation through recursive adjustment of citation weights.
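
The "hubs and authorities" analysis in the last item can be illustrated with one common formulation, Kleinberg's HITS iteration, in which scores are adjusted recursively: a work is a good authority if it is cited by good hubs, and a good hub if it cites good authorities. That this is the formulation the project would adopt is an assumption; the function name, the iteration count and the sample citation graph below are hypothetical.

```python
def _normalise(scores):
    """Scale scores to unit Euclidean length (all zeros stay zero)."""
    norm = sum(v * v for v in scores.values()) ** 0.5
    return {n: (v / norm if norm else 0.0) for n, v in scores.items()}


def hits(citations, iterations=50):
    """Compute hub and authority scores.

    `citations` maps each citing work to the list of works it cites.
    """
    nodes = set(citations)
    for cited_list in citations.values():
        nodes.update(cited_list)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the works citing this one.
        auth = {n: 0.0 for n in nodes}
        for citing, cited_list in citations.items():
            for cited in cited_list:
                auth[cited] += hub[citing]
        auth = _normalise(auth)
        # Hub score: sum of the authority scores of the works this one cites.
        hub = {n: sum(auth[c] for c in citations.get(n, [])) for n in nodes}
        hub = _normalise(hub)
    return hub, auth


# Invented citation graph, for illustration only.
citations = {
    "b2": ["b1"],
    "b3": ["b1", "b2"],
    "b4": ["b1", "b2"],
}
hub, auth = hits(citations)
top_authority = max(auth, key=auth.get)
```

Here `b1`, cited by all three other books, emerges as the top authority, while `b3` and `b4`, which cite both of the cited books, score highest as hubs. Unlike a raw citation count, the authority score weights each citation by the standing of the citing work.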

Intermediate and final-stage evaluations will be performed on all user interfaces and computed outputs to monitor user reactions and ensure usability and fitness-for-purpose. Evaluations will involve observing small user groups, supplemented with Web forms-based feedback, as applied in the Open Citation Project (Hitchcock et al. 2003).

To direct and guide the project, monthly meetings (electronically moderated as needed) will be held with project co-applicants, and bi-annually with a project advisory board that is to include UK experts on bibliometrics and experts in the key subject areas to be covered.

The first requirement is to establish methods of data collection. A workable service that can continue beyond the project will need to cultivate authors to input data on their book publications, and this will be a primary aim of project advocacy to authors, to be directed through national and international research agencies. The initial aim is for the database to include records for at least as many books as submitted in the 2001 RAE. A critical mass of data will need to be built quickly in at least one focussed subject area to demonstrate the types of services that will encourage others. We would initially focus on archaeology, as outlined in the work plan below, where we will attempt to solicit bibliographies from the entire discipline, UK and international, supplemented by scanning in the bibliographies of books in the past 10 years. The rest of the sample will be based on solicitations in other disciplines. In the specific test case, the UK focus is for RAE implications; the Archaeology focus is for a complete test-case.

·         Findings on webmetric tools to analyse citation data will be disseminated through strategic presentations to universities and research-funders, and with Web demonstrators to support continuous use. These will also be linked to corresponding journal-based databases in the same subject matter.
·         A Web citation database for book-based publications consisting of their metadata and their cited references will be created, and will be accessible online to all (cf. Citebase, citebase.eprints.org). The database will be used by both researchers and research assessors and funders to search, navigate and rank publications on the basis of existing and new measures of impact and usage.
·         The results will be publicised and promoted in journals such as Journal of Information Science, Journal of the American Society for Information Science and Technology, Scientometrics, Research Evaluation, Research Policy, D-Lib Magazine, First Monday, and at conferences and symposia, as well as through talks at UK universities and worldwide. A major conference in this area is the biennial International Conference on Scientometrics and Informetrics, due in 2005.
·         Presentations and reports for organisations involved in funding and research assessment decisions, including the Higher Education Funding Councils, funding agencies such as ESRC, EPSRC and AHRB, and major charities that fund research

·         Harnad, S. (2006) Future UK Research Assessment Exercise (RAE) to be Metrics-Based. http://openaccess.eprints.org/index.php?/archives/75-guid.html
·         Harnad, S. et al. (2003) “Mandated online RAE CVs linked to university eprint archives: Enhancing UK research impact and assessment”. Ariadne, issue 35, April 30 http://www.ariadne.ac.uk/issue35/harnad/
·         Hitchcock, S., et al. (2003) “Evaluating Citebase, an Open Access Web-based Citation-Ranked Search and Impact Discovery Service”.  http://opcit.eprints.org/evaluation/Citebase-evaluation/evaluation-report.html
·         Hitchcock, S., et al. (2002) “Open Citation Linking: the Way Forward”. D-Lib Magazine, Vol. 8, No. 10, October 2002 http://www.dlib.org/dlib/october02/hitchcock/10hitchcock.html
·         Holmes, Alison and Oppenheim, Charles (2001) “Use of citation analysis to predict the outcome of the 2001 Research Assessment Exercise for Unit of Assessment (UoA) 61: Library and Information Management”. Information Research, Vol. 6, No. 2, January http://www.shef.ac.uk/~is/publications/infres/paper103.html
·         Oppenheim, Charles (1998) “The correlation between citation counts and the 1992 research assessment exercise ratings for British research in genetics, anatomy and archaeology”. Journal of Documentation, 53:477-87 http://dois.mimas.ac.uk/DoIS/data/Articles/julkokltny:1998:v:54:i:5:p:477-487.html
·         Smith, Andrew and Eysenck, Michael (2002) “The correlation between RAE ratings and citation counts in psychology”, June http://psyserver.pc.rhbnc.ac.uk/citations.pdf

Work plan

·         Technical requirements analysis; estimate extent of database
·         Design manual and automated data collection methods
·         Adapt and develop software for data collection, data finding, citation analysis
·         Data collection
o        Design Eprints interface for author deposit of book metadata
o        Focus on Archaeology to obtain a comprehensive test dataset
§         Approach authors and publishers
§         Scan and key-in data from library holdings, as necessary
o        Populate database to agreed targets and assure data quality
·         Data processing
o        Adapt software to reference styles to automatically process input data for citation database
o        Design user interface to citation database
·         Data finding
o        Tailor Paracite to seek online sources/records of referenced works
·         Monitor and adapt methods of data collection
·         Promote data deposit to all relevant book-based disciplines and extend data collection in critical areas
·         Investigate new Web metrics
·         Design initial metrics: demonstrate Web-based results
·         Develop tools to analyse and display Web metrics
·         Evaluate outputs of metrics
·         Optimise selected metrics for wider use
·         Report results

Principal RA (Dr Steve Hitchcock) responsibilities

·         Lead researcher
·         Project management
·         Liaise and work with technical RA
·         Advocacy for author deposit of book metadata throughout arts and humanities subjects, and uptake of Web metrics
·         Maintain information for project team
·         Build and coordinate advisory board
·         Maintain project Web site
·         Manage experimental services produced by project
·         Dissemination: promotion, communication, writing papers and presentations
·         Evaluation and user testing
·         Writing project reports