From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Tue, 19 Sep 2006 18:48:36 +0100

On Tue, 19 Sep 2006 l.hurtado_at_ED.AC.UK wrote:

> --Charles Oppenheim insists that he's able to prove a significant
> statistical correlation of some sort between RAE results in a variety
> of fields, including Humanities subjects. It would be good to verify
> this. I presume that online publication(s) are . . . available
> somewhere or will be?

Several are (all *should* be! Charles?). Look in OAIster and Google Scholar.

> --It will be interesting to see the specifics behind the claims. I'm
> not clear what is meant by this "correlation". Does it mean that those
> depts given a 5* also show up with . . . what, a much higher number of
> their publications getting cited in the selected venues, or the
> individuals in the dept being cited more frequently, or . . . whatever?

Correlation (positive) means high goes with high and low goes with low.
So the higher-ranked departments have a higher total number of citations
to the papers published by the submitted researchers, and the lower-ranked
departments have a lower number.

In a word, the papers of higher RAE-ranked departments are cited more.
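
To make "high goes with high" concrete, here is a minimal Python sketch of
the standard (Pearson) correlation coefficient. The six departments, their
grades and their citation totals are invented purely for illustration; they
are not RAE data. Coding grades as plain numbers (higher = better) is an
assumption: since RAE grades are ordered categories, a rank-based (Spearman)
correlation would be a reasonable alternative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length number sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance term: positive when above-average x goes with above-average y.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical, made-up figures (NOT real RAE results): a numeric grade per
# department (higher = better) and total citations to its submitted papers.
rae_grade = [1, 2, 3, 4, 5, 6]
citations = [40, 90, 70, 160, 210, 300]

r = pearson_r(rae_grade, citations)
print(f"r = {r:.2f}")  # close to +1: high grades go with high citation totals
```

A value near +1 means high goes with high; near -1 would mean high goes
with low; near 0, no linear relationship at all.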

> And what does "significant" correlation mean? (And please, I hope
> that any publications don't give me that "regression to the mean"
> technospeak, as I'm not a statistician, but I can follow logic. Give
> me in plain English what is being counted and compared and how.)

The statistical significance of the correlation (which is *not* the same as its
size or importance) is the probability that a correlation that large would turn
up by chance alone if there were really no relationship. If this probability is
less than 0.01 or 0.001, it is fairly safe to assume that the correlation is not
just a chance accident. And since the same correlation keeps being found across
fields, the likelihood that these are all happening by accident is negligible.

The size of the effect is another question. Correlation coefficients vary from -1
to +1; a correlation of 0 is no correlation at all. A simple way to think of the
size of a correlation is to square the correlation coefficient. That tells you
what percentage of the variation in the values of one variable is predictable from
the variation of the values in the other variable. For example, suppose height and
weight have a (statistically significant) correlation of 0.5, the average height
is 5 feet with a standard deviation of 6 inches, and the average weight is 150
pounds with a standard deviation of 50 pounds. If you tell me that someone is
6 feet tall (2 standard deviations above average), I can predict that he weighs
about 200 pounds (0.5 x 2 = 1 standard deviation above average), and that
prediction will account for about 25% of the variation in weight (because 0.5
squared is 0.25, or 25%). If the correlation were instead 0.9, I would predict
240 pounds, and the prediction would account for 81% of the variation.
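
The height and weight example can be worked through with the standard
least-squares prediction rule, sketched below in Python (the function name
`predict` is just for illustration). Under that rule, someone 2 standard
deviations above average height is predicted to be r x 2 standard deviations
above average weight, so the guess is pulled toward the mean: 200 pounds at
r = 0.5 and 240 pounds at r = 0.9 (a prediction of 250 pounds would
correspond to a perfect correlation of 1). The squared correlation is the
share of the variation in weight that the prediction accounts for.

```python
def predict(x, mean_x, sd_x, mean_y, sd_y, r):
    """Least-squares prediction of y from x, given both means, both SDs and r."""
    z_x = (x - mean_x) / sd_x        # how many SDs x lies from its mean
    return mean_y + r * z_x * sd_y   # predicted y lies r times as many SDs out


# Figures from the example, with heights in inches (5 ft = 60, 6 ft = 72).
for r in (0.5, 0.9):
    weight = predict(72, mean_x=60, sd_x=6, mean_y=150, sd_y=50, r=r)
    print(f"r = {r}: predict {weight:.0f} lb; r squared = {r ** 2:.0%} of variation")
```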

Another example is the correlation between barometric pressure and rain. If the
correlation is 0.5, then tell me the barometric pressure and I can predict how much
it will rain, accounting for 25% of the variation in rainfall; if the correlation
is 0.9, my prediction accounts for 81% of it. (Another way to express the accuracy
is to put a +/- error range around the predicted value: that range is broader
with lower correlations and narrower with higher correlations.)
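
The +/- range can be sketched too. Assuming a straight-line relationship and
roughly normally distributed errors (a large-sample approximation), the
typical prediction error is the standard deviation of the predicted variable
times the square root of (1 - r squared), and about 95% of cases fall within
2 of those on either side. The Python below uses the 50-pound weight
standard deviation from the height example; `prediction_spread` is an
illustrative name.

```python
import math


def prediction_spread(sd_y, r):
    """Standard error of estimate: typical size of the prediction error."""
    return sd_y * math.sqrt(1 - r ** 2)


for r in (0.0, 0.5, 0.9, 1.0):
    spread = prediction_spread(50, r)
    # About 95% of cases within +/- 2 spreads, if errors are roughly normal.
    print(f"r = {r}: predicted weight +/- {2 * spread:.0f} lb")
```

The range shrinks from about +/- 100 pounds with no correlation to zero with
a perfect one: broader with lower correlations, narrower with higher ones.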

And that's how it is with citations as predictors of RAE ranks.

> --I also note a somewhat different tone in Stevan's comments, which
> seem to me to admit more forthrightly that "we ain't there yet" when it
> comes to the wherewithal actually to conduct an across-the-board
> analysis of the kind being mooted. Charles seems to suggest that he's
> able to do this now. Or do I misunderstand things?

I think Charles and I are in 100% agreement:

(1) All evidence so far is that RAE ranks can be predicted from citation
counts in every discipline tested.

(2) There are still disciplines to be tested.

(3) Citation counts are not the only possible metrics.

(4) RAE panels should definitely be scrapped in favour of metrics in all
fields except those where no metric can be shown to correlate sufficiently
closely with the RAE rankings.

> In any case, I hope that all parties understand the importance of
> making sure that all of us affected by any metrics approach understand
> it and can see its superiority and full feasibility. I certainly ain't
> there yet on any of these matters, but let's see what rolls out. If
> Stevan and Charles can put it together and show the rest of us how it
> works well, I'll go for it. I just want to be shown (I originated from
> Missouri).

Y'all hold onto yir hats; ya ain't seed nothin' yet!

Stevan Harnad