Date: Tue, 28 Mar 2006 08:13:32 -0500

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>

To: ASIS&T Special Interest Group on Metrics <SIGMETRICS_at_LISTSERV.UTK.EDU>

Subject: Re: Future UK RAEs to be Metrics-Based

The UK has a "dual" funding system: (1) conventional direct research

grant applications, with peer review of competitive proposals (RCUK) and

(2) top-sliced funding accorded to departments (not individuals) based on

past departmental research performance (RAE). The RAE was a monstrously

expensive and time-consuming exercise, with paper collection and

submission of all kinds of performance markers, including 4 full-text

papers, for peer-re-review by RAE panels. It turned out that the RAE's

outcome -- each departmental RAE "rank" from 1 to 5*, with top-sliced

funding given according to the rank and number of researchers submitted

-- was highly correlated with total citation counts for the department's

submitted researchers (r = .7 to .9+) and even more highly correlated

with prior RCUK funding (.98).

So RAE rank correlates highly with prior RCUK (and European) funding and

almost as highly with citations (and with other metrics, such as number

of doctorates accorded, etc.). The RAE rank is based on the data received

and evaluated by the panel -- not through multiple regression, but

through some sort of subjective weighting, including a "peer-re-review"

of already published, already peer-reviewed articles (although I very

much doubt many of them are actually read, the panels not being specific

experts in their subject matter as the original journal peer-reviewers

were meant to be -- it is far more likely that their ranking of the

articles is based on the reputation of the journal in which they were

published, and there is definitely pressure in the departments to

preferentially submit articles that have been published in high-quality,

high-impact journals).

So what is counted explicitly is prior funding, doctorates, and a few

other explicit measures; in addition, there is the "peer-re-review" --

whatever that amounts to -- which is no doubt *implicitly* influenced by

journal reputations and impact factors. However, neither journal impact

factors nor article/author citations are actually counted *explicitly* --

indeed it is explicitly forbidden to count citations for the RAE. That

makes the high correlation of the RAE outcome with citation counts all

the more remarkable -- and less remarkable than the even higher

correlation with prior funding, which *is* counted explicitly.

The multiple regression ("metric") method is not yet in use at all. It will now be tried out in parallel with the next RAE (2008), which will be conducted in the usual way, with the metrics run alongside.

Prior funding counts are no doubt causal in the present RAE outcome

(since they are explicitly counted), but that is not the same as saying

that research funding is causal in generating research performance

quality! (Funding is no doubt causal in being a necessary precondition for research quality, because without funding one cannot do research; but to what extent prior funding levels in and of themselves are causes of research-quality variance, over and above being a Matthew Effect or a self-fulfilling prophecy, is an empirical question about how good a predictor individual research-proposal peer review is for allotting departmental top-sliced funding to reward and foster research performance.)

Hence the causality question is in a sense a question about the causal

efficacy of the UK's dual funding system itself, and the relative

independence of its two components. For if they are indeed measuring and

rewarding the very same thing, then the RAE and the dual system may as well be scrapped, and individual RCUK proposal funding simply scaled up proportionately with the redirected funds.

I am not at all convinced that the dual system itself should be scrapped,

however; just that the present costly and wasteful implementation of the

RAE component should be replaced by metrics. And those metrics should

certainly not be restricted to prior funding, even though it was so

highly correlated with RAE ranking. They should be enriched by many other

metric variables in a regression equation, composed and calibrated

according to each discipline's peculiar profile as well as its internal

and external validation results. And let us supplement conservative

metrics with the many richer and more diverse ones that will be afforded

by an online, open-access full-text corpus, citation-interlinked, tagged,

and usage-monitored.

Stevan Harnad

On 28-Mar-06, at 6:39 AM, Loet Leydesdorff wrote:

> > SH: To repeat: The RAE itself is a predictor, in want of
> > validation. Prior funding correlates 0.98 with this predictor
> > (in some fields, and is hence virtually identical with it),
> > but is itself in want of validation.
>
> Do you wish to say that both the RAE and the multivariate regression
> method correlate highly with prior funding? Is the latter perhaps causal
> for research quality, in your opinion?
>
> The policy conclusion would then be that both indicators are very
> conservative. Perhaps that is not a bad thing, but one may wish to
> state it straightforwardly.

Date: Tue, 28 Mar 2006 12:19:27 +0100 (BST)

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>

To: SIGMETRICS_at_LISTSERV.UTK.EDU, oaci-working-group_at_mailhost.soros.org

Subject: Re: Future UK RAEs to be Metrics-Based

This is an anonymised exchange from a non-public list concerning

scientometrics and the future of the UK Research Assessment Exercise.

I think it has important general scientometric implications.

By way of context: The RAE was an expensive, time-consuming

submission/peer-re-evaluation exercise, performed every 4 years. It turned

out a few simple metrics were highly correlated with its outcome. So it

was proposed to scrap the expensive method in favour of just using

the metrics. -- SH

---------- Forwarded message ----------

On Tue, 28 Mar 2006, [identity deleted] wrote:

> At 8:34 am -0500 27/3/06, Stevan Harnad wrote:
>
> > SH: Scrap the RAE make-work, by all means, but don't just rely on one
> > metric! The whole point of metrics is to have many independent
> > predictors, so as to account for as much as possible of the
> > criterion variance:
>
> This seems extremely naive to me. All the proposed metrics I have
> seen are *far* from independent - indeed they seem likely to be
> strongly positively associated.

That's fine. In multiple regression it is not necessary that each
predictor variable be orthogonal; they need only predict a significant
portion of the residual variance in the target (or "criterion") after
the correlated portion has been partialled out. If you are trying to
predict university performance and you have maths marks, English marks
and letters of recommendation (quantified), it is not necessary, indeed

not even desirable, that the correlation among the three predictors

should be zero. That they are correlated shows that they are partially

measuring the same thing. What is needed is that the three jointly, in a

multilinear equation, should predict university performance better than

any one of them alone. Their respective contributions to the variance

can then be given a weight.
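To make that concrete, here is a minimal simulated sketch (mine, not part of the original exchange; the variables and noise levels are all invented) in which three strongly intercorrelated predictors jointly predict a criterion better than any one of them alone:

```python
import numpy as np

# Simulate three correlated predictors (maths marks, English marks,
# quantified letters of recommendation) that all share one latent factor,
# plus a "university performance" criterion driven by the same factor.
rng = np.random.default_rng(0)
n = 500
ability = rng.normal(size=n)                 # shared latent factor
maths = ability + rng.normal(scale=0.8, size=n)
english = ability + rng.normal(scale=0.8, size=n)
letters = ability + rng.normal(scale=0.8, size=n)
performance = ability + rng.normal(scale=0.5, size=n)

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1 - (y - X @ beta).var() / y.var()

r2_single = max(r_squared(p, performance) for p in (maths, english, letters))
r2_joint = r_squared(np.column_stack([maths, english, letters]), performance)

# The predictors correlate with one another, yet the joint multilinear
# equation still outpredicts the best single predictor.
print(f"best single-predictor R^2: {r2_single:.2f}")
print(f"joint regression R^2:      {r2_joint:.2f}")
```

The fitted beta coefficients then supply exactly the per-predictor weights described above.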

The analogy is vectors, a linear combination of several of which may

yield another, target vector. It need not be a linear combination of

orthogonal vectors, just linearly independent ones.
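The vector analogy can be put in numbers (an illustrative example of my own; the vectors are arbitrary): the target is an exact linear combination of three mutually non-orthogonal but linearly independent vectors.

```python
import numpy as np

a = np.array([1.0, 0.0, 0.0])
b = np.array([1.0, 1.0, 0.0])   # not orthogonal to a (a . b = 1)
c = np.array([1.0, 1.0, 1.0])   # not orthogonal to a or b
target = np.array([2.0, 3.0, 4.0])

# Solve w0*a + w1*b + w2*c = target for the weights w; a solution
# exists because a, b, c are linearly independent.
w = np.linalg.solve(np.column_stack([a, b, c]), target)
print(w)  # -> [-1. -1.  4.]
```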

Three other points:

(1) The RAE ranking itself is just a predictor, not the criterion that is

being predicted and against which the predictor(s) need to be validated.

The criterion is research performance/quality. Only metrics with face

validity can be taken to be identical with the criterion, as opposed to

mere predictors of it, and the RAE outcome is certainly not face-valid.

(2) Given (1), it follows that the *extremely* high correlation between

prior funding and RAE rank (0.98 was mentioned) is *not* a desirable

thing. The predictive power of the RAE ranking needs to be increased, by

adding more (semi-independent but not necessarily orthogonal) predictor

metrics to a regression equation (such as funding, citations, downloads,

co-citations, completed PhDs, and many other potential metrics that will

emerge from an Open Access database and digital performance record-keeping

CVs, customised for each discipline) rather than being replaced by a

single one-dimensional predictor metric (prior funding) that happens to

co-vary almost identically with the prior RAE outcome in many disciplines.

(3) Validating predictor metrics against the target criterion is

notoriously difficult when the criterion itself has no direct

face-valid measure. (An example is the problem of validating IQ tests.)

The solution is partly internal validation (validating multiple

predictor metrics against one another) and partly calibration, which

is the adjustment of the weight and number of the predictor metrics

according to corrective feedback from their outcome: In the case of the

RAE multiple regression equation, this could be done partly on the basis

of the 4-year predictive power of metrics against their own later values,

and partly against subjective peer rankings of departmental performance

and quality as well as peer satisfaction ratings for the RAE outcomes

themselves. (There may well be other validating methods.)
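A sketch of the internal-validation step (mine, not from the exchange; the metric names, departments, and noise levels are all hypothetical): cross-correlate several candidate metrics to check that they co-vary, i.e. partially track a common underlying quality, without being identical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200                               # hypothetical departments
quality = rng.normal(size=n)          # unobserved criterion

# Each metric tracks quality with its own (invented) noise level.
metrics = {
    "funding":   quality + rng.normal(scale=0.5, size=n),
    "citations": quality + rng.normal(scale=0.6, size=n),
    "downloads": quality + rng.normal(scale=0.9, size=n),
    "phds":      quality + rng.normal(scale=1.2, size=n),
}

X = np.column_stack(list(metrics.values()))
corr = np.corrcoef(X, rowvar=False)   # pairwise metric correlations
print(np.round(corr, 2))

# Substantial but imperfect off-diagonal correlations are the desired
# pattern: the metrics partially measure the same thing, yet each
# retains some independent variance to contribute to the regression.
off_diag = corr[~np.eye(len(metrics), dtype=bool)]
```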

> This sounds perilously close to what I used to read in the software
> metrics literature, where attempts were made to capture 'complexity'
> in order to predict the success or failure of software projects.
> People there adopted a
> measure-everything-you-can-think-of-and-hope-something-useful-pops-up
> approach. The problem was that all the different metrics turned out
> to be variants of 'size', and even together they did not enable good
> prediction.

It is conceivable that all research performance predictor metrics will turn out to be measuring the same thing, with none of them contributing a separate independent component to the variance of the outcome; but I rather doubt it. At the risk of arousing other prejudices,

I would make an analogy with psychometrics: Tests of cognitive performance capacity (formerly called "IQ" tests: maths, spatial, verbal, motor, musical, reasoning, etc.) are constructed and validated

by devising test items and testing them first for reliability (i.e.,

how well they correlate with themselves on repeated administration)

and then cross-correlation and external validation. The (empirical)

result has been the emergence of one general or "G" factor for which

the weight or "load" of some tests is greater than others, so that no

single test measures it exactly, and hence a multiple regression battery,

with each test weighted according to the amount of variance it accounts

for, is preferable to relying on just a single test. And the outcome

is that there turns out to be the one large underlying G factor, with

a component in every one of the tests, plus a constellation of special

factors, associated with special abilities supplementing the G factor,

each adding a smaller but significant component to the variance too,

but varying by individual and field in their predictive power.
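A toy numerical version of that psychometric picture (simulated scores with arbitrary loadings, my own illustration): when several test scores share one general factor, the first principal component of their correlation matrix carries well over its "fair share" of the variance, with a different loading on each test.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
g = rng.normal(size=n)                        # the general factor
tests = np.column_stack([
    0.8 * g + rng.normal(scale=0.6, size=n),  # heavily g-loaded test
    0.7 * g + rng.normal(scale=0.7, size=n),
    0.4 * g + rng.normal(scale=0.9, size=n),  # weakly g-loaded test
])

# Principal components of the correlation matrix: eigh returns
# eigenvalues in ascending order, so the last one belongs to PC1.
corr = np.corrcoef(tests, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
share = eigvals[-1] / eigvals.sum()           # PC1's variance share
print(f"first component explains {share:.0%} of the variance")
```

With three uncorrelated tests the first component would explain about a third of the variance; the excess over that is the footprint of the shared factor.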

The controversy has been about whether the fact that the tests are

validated on the basis of positive correlations among the items is the

artifactual source of the positive manifold underlying G. I am not

a statistician or a psychometrician, but I think the more competent,

objective verdict (the one not driven by a priori ideological views)

has been that G is *not* an artifact of the selection for positive

correlations, but a genuine empirical finding about a single general

(indeed biological) factor underlying intelligence.

I am not saying there will be a "G" underlying research performance!

Just that the multilinear (and indeed nonlinear) regression method can

be used to tease out the variance and the predictivity from a rich and

diverse set of intercorrelated predictor metrics. (It can also sort

out the duds, that are either redundant or predict nothing of interest

at all.)

> > SH: Metrics are trying to measure and evaluate research performance,
>
> I think you mean 'predict' - not the same thing at all

They measure the predictor variable and try to predict the criterion

variable. As such, they are meant to provide an objective (but

validated) basis for evaluation.

> > SH: not just to 2nd-guess the present RAE outcome,
> > nor merely to ape existing funding levels. We need a rich multiple
> > regression equation, with many weighted predictors, not just one
> > redundant mirror image of existing funding!
>
> Well.... In fact 'existing funding' *may* actually be a good
> predictor of whatever it is we want to predict (see [deleted]'s recent
> posting)!

To repeat: The RAE itself is a predictor, in want of validation. Prior

funding correlates 0.98 with this predictor (in some fields, and is

hence virtually identical with it), but is itself in want of validation.

This high correlation with the actual RAE outcome is already rational

grounds for scrapping the time-wasting and expensive ritual that is the

present RAE, but it is certainly not grounds for scrapping other metrics

that can and should be weighted components in the metric equation that

replaces the current wasteful and redundant RAE. The metric predictors

can then be enriched, cross-tested, and calibrated. (It is my

understanding that RAE 2008 will consist of a double exercise: yet

another iteration of the current ergonomically profligate RAE ritual

plus a parallel metric exercise. I think they could safely scrap the

ritual already, but the parallel testing of a rich battery of actual

and potential metrics is an extremely good -- and economical -- idea.)

> We can only test such hypotheses when we are clear what it
> is we want to predict, and what we mean by 'accuracy' of prediction.

In the first instance, in the decision about whether or not to scrap the

expensive and inefficient current RAE ritual, it is sufficient to

predict the current RAE outcome with metrics.

In order to go on to test and strengthen the predictive power of

that battery of metrics, they need to be enriched and diversified,

internally validated and weighted against one another (and the prior

RAE), and externally validated against the kinds of measure I mentioned

(subjective peer evaluations, predictive power across time, perhaps

other outcome metrics, etc.).

> Even if we knew this, I'm not sure the right data is available. But
> in the absence of such a proper investigation, let's not pretend that
> the answer is obvious, as you seem to be doing.

The answer is obvious insofar as scrapping the prior RAE method is

concerned, given the strong correlations. The answer is also obvious

regarding the fact that multiple metrics are preferable to a single

one. Ways of strengthening the predictive power of objective measures

of research performance are practical and empirical matters we need to

be analysing and upgrading continuously.

Stevan Harnad

Received on Tue Mar 28 2006 - 12:26:16 BST

This archive was generated by hypermail 2.3.0 : Fri Dec 10 2010 - 19:48:17 GMT