Enhance UK research impact and assessment 

by making the RAE webmetric

Stevan Harnad

When the proud mother of a brand-new PhD whose first paper has just been published asks: "So how much were you paid for it?" it takes quite a few words to explain to her that that is not exactly the way it works in scholarly and scientific research. Unlike journalists or book-authors, researchers do not receive royalties or fees for their writings. That's not why they write. They write for "research impact," which is the sum of all the effects of their own research on other research and researchers, and on the society that funds it all: How much is my research read, used, cited, and built-upon, in further research and applications? How much of a difference does it make that I did it at all?

It is for the sake of its impact that research is supported by tax-payers, for impact that researchers are salaried and promoted by their universities and funded to do their research, for impact that prizes and honours are awarded to scholars and scientists, for impact that their universities receive overhead funding and prestige. So how is research impact measured? One natural way resembles the way that Google, that remarkable search engine, measures the relevance of a website. Google counts links. When you search for something, it rank-orders your search results by how many other websites link to each of them. The assumption is that more important websites will have more websites linking to them. This works amazingly well, but it is far too crude for measuring research impact, which is not a matter of commercial success or popularity but of uptake. The question is: how much is this research paper being used by other researchers? But there is a cousin -- actually the mother -- of weblinks that researchers have been using for decades as a measure of impact, namely citations, and it works in much the same way.

It may occasionally happen that one paper cites another paper in order to criticize it or say it's wrong. But the vast majority of citations are made in the process of making a positive contribution to knowledge, with the citing paper referencing the building blocks that it is using in doing so. So the more often a paper has been used as a building block, the higher its research impact. Citation counts are such powerful measures of impact that Eysenck and Smith have shown that in Psychology they predict the outcome of the UK Research Assessment Exercise (RAE) with an accuracy of over 80% (and Oppenheim has shown similar effects in other fields too). The RAE is unique to the UK and involves ranking all the departments in all the universities by their research impact (from 1, low, to 5 or 5* at the top) and then funding them according to those rankings. The idea is to both reward and encourage UK research impact in this way. Yet the RAE, in making its assessment and doing its rankings, does not actually use citation counts. It requires universities to spend vast amounts of time and energy -- time and energy that could have been used to do research! -- to compile, every four years, a massive paper dossier of all sorts of performance indicators. Then still more time and effort is expended by teams of assessors assessing all the dossiers and ranking them.
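To make the idea of a citation count concrete, here is a minimal sketch in Python. It assumes a hypothetical list of (citing paper, cited paper) pairs; the paper labels and the function name are invented for illustration and do not come from the RAE or from any real citation database. Duplicate references and self-citations are dropped so that each citing paper counts once.

```python
from collections import Counter

def citation_counts(citations):
    """Count how many distinct papers cite each paper.

    `citations` is an iterable of (citing, cited) pairs.
    Self-citations and duplicate pairs are ignored.
    """
    unique_pairs = {(citing, cited) for citing, cited in citations
                    if citing != cited}
    return Counter(cited for _, cited in unique_pairs)

# Hypothetical data: each pair means "first paper cites second paper".
citations = [
    ("B", "A"),
    ("C", "A"),
    ("C", "B"),
    ("D", "A"),
    ("D", "A"),  # duplicate reference, counted once
    ("A", "A"),  # self-citation, ignored
]

counts = citation_counts(citations)
ranking = counts.most_common()  # papers ordered from most to least cited
print(ranking)
```

A count like this is only the crudest indicator; field, age of the paper and who is doing the citing all affect how it should be read.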

We now know that in many cases, citation counts alone would have saved at least 80% of all that time and effort. But the Google-like idea also suggests ways to do even better, enriching citation counts for papers and authors by another measure of impact: How often is the paper read? Brody and Carr have counted paper downloads and found that they predict citations that will come later. To be used and cited, a paper first has to be accessed and read. Just as barometric pressure is a predictor of rain to come, downloads are a predictor of citations to come. And downloads are also usage (hence impact) measures in their own right. Other new measures of research impact also have their counterparts in Google: Google uses "hubs" and "authorities" to weight link-counts. Not all links are equal. It means more to be linked to by a high-link site than a low-link site. This is exactly equivalent to co-citation analysis, in which it matters more if you are cited by a Nobel laureate than by a fresh PhD. Besides co-citations, there is also the vast world of co-text analysis, in which combinations of certain key words or concepts may turn out to be predictive of other measures of research impact, bringing together creative and influential combinations of ideas.
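The point that not all citations are equal can be sketched in a few lines of Python. The weighting below is an assumption chosen for simplicity: each citation is worth 1 plus the number of citations the citing paper has itself received. It is a one-step illustration only, not Google's actual ranking method and not a full hubs-and-authorities computation, and the data and names are hypothetical.

```python
from collections import Counter

def weighted_citation_scores(citations):
    """Score each cited paper, giving more weight to citations that
    come from papers which are themselves well cited.

    Assumed weighting (illustrative only): a citation is worth
    1 + (raw citation count of the citing paper).
    """
    raw = Counter(cited for _, cited in citations)
    scores = Counter()
    for citing, cited in citations:
        scores[cited] += 1 + raw[citing]  # raw[...] is 0 for uncited papers
    return scores

# Hypothetical data: each pair means "first paper cites second paper".
citations = [
    ("B", "A"),  # B is itself cited twice, so this citation weighs more
    ("C", "B"),
    ("D", "B"),
    ("E", "C"),
    ("F", "G"),  # F is never cited, so this citation has the minimum weight
]

scores = weighted_citation_scores(citations)
print(scores["A"], scores["G"])
```

In this toy data, papers A and G each receive exactly one citation, yet A scores higher because the paper citing it is itself more cited.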

But to mine this rich new world of webmetric impact indicators, and to use it to encourage and reward research, researchers and their institutions, what is required is not a 4-year exercise in paperwork like the present RAE. What is needed is that all university research output should be accessible, and hence assessable, online -- and not only the references cited but the full text. Then software agents (automatic computer programs) can be used to analyze it and derive a whole spectrum of impact indicators, adjustable to accommodate differences between disciplines. (Some, like physics, will have earlier forms of impact, signalled by downloads; others, like history, will have longer timelines before signs of impact are detectable. Some, like the sciences, will report most of their research in journal articles; others, like the humanities, will report it in books, of which the bibliography, if not the full text, can still be put online. Many disciplines will want to self-archive and credit the use of their accompanying data too.)

Nor are time-saving, efficiency, and the power and richness of these webmetric impact indicators the only benefits, or even the principal ones, for Steve Lawrence has shown in a 2001 paper in Nature that the citation counts of papers whose full texts are freely accessible on the web are over 300% higher than those of papers that are only accessible on paper, or on toll-access websites. So all of UK research stands to increase its impact dramatically by being put online. All that is needed is that every UK researcher have a standardised online CV, continuously updated with all the performance indicators the RAE wishes to count, with every journal paper listed in that CV linked to its full text in that university's online "eprint" archive (front-matter and bibliography only for books). Webmetric assessment engines can do all the rest, harvesting and analyzing the data. At Southampton we have already designed (free) software for creating the RAE CVs and eprint archives, along with citebase, a webmetric engine that analyses citations and downloads. The only thing still needed is a UK university policy of self-archiving all research output to maximize its impact (we have a draft model for that too), encouraged by a UK national policy of self-archiving all research output to assess its impact.

We hope that these suggestions for enhancing both UK research impact and its assessment will have an impact on the plans to restructure the RAE: