Re: Future UK RAEs to be Metrics-Based

From: Stevan Harnad <>
Date: Wed, 29 Mar 2006 12:43:25 +0100

Both of Yorick's witty tongue-in-cheek suggestions are more apt
than you might think (though I am sure Yorick is quite aware of it):

(1) There's no need to keep the metric criteria secret, because
deliberate abuses of them (spurious, endogamous citation patterns,
self-inflated downloads) are all detectable in an open-access corpus,
and the abusers can be named and shamed -- indeed, the risk of being
named and shamed is likely to offset the temptation to abuse
pre-emptively in most cases.

(2) Co-text analysis (including Tom Landauer's latent semantic analysis)
is among the very natural first choices to try out in devising new
metrics. And, as usual, the metric equation must have multiple components
precisely so that each component can be calibrated and weighted against
the others, to control for and offset nonsense effects (like random
buzz-word salad) as well as deliberate abuse and deviations.
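To make the idea of a multi-component metric equation concrete, here is a
minimal sketch in which raw signals (citations, downloads) are discounted
by abuse indicators (self-citation rate, download burstiness) before being
combined. All field names, weights, and numbers are invented for
illustration; no actual RAE formula is implied.

```python
# Hypothetical multi-component metric: each raw component is corrected
# by an abuse indicator, then the corrected components are combined
# with calibrated weights. Weights and fields are illustrative only.

def composite_score(metrics, weights):
    """Weighted sum of abuse-corrected components."""
    # Discount citations by the share that are self-citations.
    citations = metrics["citations"] * (1.0 - metrics["self_cite_rate"])
    # Discount downloads that arrive in suspicious bursts.
    downloads = metrics["downloads"] * (1.0 - metrics["burst_fraction"])
    return (weights["citations"] * citations
            + weights["downloads"] * downloads)

paper = {"citations": 40, "self_cite_rate": 0.25,
         "downloads": 1000, "burst_fraction": 0.10}
w = {"citations": 1.0, "downloads": 0.05}
print(composite_score(paper, w))  # 30*1.0 + 900*0.05 = 75.0
```

In practice the weights would be calibrated (e.g. by regression against
panel judgments), which is exactly why the equation needs multiple
components in the first place.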

The Open Access Digital Database is optimal and inevitable for research;
we can delay it, we can mock it, we can fear it, but sooner or later we
will face it, and use it, and work out ways to control and correct its
imperfections, just as we will do with viruses and spam. (I can already
hear the ironic rebuttals this will now elicit! -- but it's true, it's
likely to be a constant cycle of protective upgrades to counter upgrades
in abuse; there will be invasions that are temporarily successful,
but then driven out by a combination of collective ingenuity and the
all-important openness of the whole medium. At least that is the way
we have to play it, if we are not to be Luddites renouncing the obvious
good because of the equally obvious potential for bad.)


On Wed, 29 Mar 2006, Yorick Wilks wrote:

> It's remarkable how exiguous these so-called metrics are, and that
> the first ones thought of now turn out to be questionable for a range
> of reasons: the fallibility of regressions (Anscombe), and the
> possible undesirable consequences if the publication of the measure
> affects behaviour on a large scale, such as swamping the Councils
> with poor research proposals (Bundy).
> Just for the sake of variety I have two not-wholly-serious suggestions:
> 1) the RAE subpanel, or its replacement, determine a set of features
> that can be measured automatically, but do not declare them until
> after the census date. They are of course bound by an oath of state
> secrecy.
> 2) the issue has been tackled in the UK AI literature. In the AISB
> Quarterly around 1999 Father Hacker, a semi-resident correspondent,
> suggested that some current language processing technologies could do
> the whole RAE for about 250, a considerable saving. I cannot find
> the file now, but the core of the suggestion was to adapt the well-
> known scoring algorithms based on text similarity to training sets of
> different qualities; in this case sets of 5*, 5, 4, etc. journal papers
> of sufficient size from previous RAEs. All output submissions would
> then be automatically categorised against these sets. The best-known
> such method (patented and sold in the USA) is Landauer's Latent
> Semantic Analysis, which is widely used to classify student essays
> without reading them -- the correlation with teachers' grades is
> remarkably close. The only odd feature of this algorithm (there are
> many others) is that it takes the text's words without regard to
> order, which, if it were successful when applied to research papers,
> would be a bit of a blow to the authors' egos.
> I do not believe HEFCE contacted Father Hacker after publication,
> but times have now changed, and AISBQ is not a high-impact-factor
> publication.
> Yorick Wilks
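Father Hacker's scheme, as described above, can be sketched in a few lines:
score an unseen submission by its bag-of-words similarity to centroids of
graded training sets. True LSA would first reduce the term-document matrix
with an SVD; that step is omitted here for brevity, and all texts and
grade labels are invented for illustration.

```python
# Toy text-similarity grader: classify a submission by cosine
# similarity of its (order-free) bag of words to the centroid of
# each graded training set. A simplification of LSA (no SVD step).
from collections import Counter
from math import sqrt

def bow(text):
    """Order-free bag of words, as in LSA."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = (sqrt(sum(v * v for v in a.values()))
            * sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def centroid(texts):
    """Summed word counts of a training set."""
    total = Counter()
    for t in texts:
        total.update(bow(t))
    return total

def grade(submission, training_sets):
    """Return the grade whose training centroid is most similar."""
    vec = bow(submission)
    return max(training_sets,
               key=lambda g: cosine(vec, centroid(training_sets[g])))

training = {
    "5*": ["novel proof of semantic convergence theorem",
           "rigorous proof of convergence for latent models"],
    "4":  ["survey of existing text classification tools",
           "overview survey of classification software tools"],
}
print(grade("a new convergence proof for semantic models", training))
```

The example deliberately preserves the "odd feature" Yorick notes: word
order plays no role at all in the classification.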
Received on Sat Apr 08 2006 - 17:57:40 BST
