Re: Does the arXiv lead to higher citations and reduced publisher downloads?

From: Stevan Harnad <>
Date: Wed, 22 Mar 2006 20:14:48 EST

On Tue, 21 Mar 2006, Peter Banks wrote:

> [Re: Kristin Antelman's findings] I... suspect that there is a
> small OA citation advantage, I am not convinced by these
> data... I doubt that most of the results reach statistical
> significance...

Based on past postings from Peter, I think there may be an
element of wishful thinking here (ex officio)! Peter, if you are
not convinced by KA's data alone, look at all the other data that
show the same thing. For example, see Figure 4 in:

     Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year
     Cross-Disciplinary Comparison of the Growth of Open Access and How
     it Increases Research Citation Impact. IEEE Data Engineering Bulletin
     28(4) pp. 39-47.

You will see that the ratio of the proportion of OA articles to
non-OA articles peaks in the 4-7 citation range, and falls off
for higher and lower citation (quality) ranges. But it is always
greater than one (i.e., an OA Advantage) except for articles with
zero citations (where the ratio reverses); that of course is also
the largest number of articles.
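
To make the ratio concrete, here is a minimal sketch of one way it could be computed, assuming it means: for each citation range, the share of all OA articles falling in that range divided by the share of all non-OA articles falling in that range. The counts below are invented purely for illustration (they are not the data behind Figure 4), and the function name is hypothetical.

```python
def oa_ratio_by_bin(oa_counts, non_oa_counts):
    """Return, per citation bin, (OA share in bin) / (non-OA share in bin).

    A value above 1 means OA articles are over-represented in that bin
    relative to non-OA articles; below 1 means under-represented.
    """
    oa_total = sum(oa_counts.values())
    non_oa_total = sum(non_oa_counts.values())
    ratios = {}
    for citation_bin, oa_n in oa_counts.items():
        oa_share = oa_n / oa_total
        non_oa_share = non_oa_counts[citation_bin] / non_oa_total
        ratios[citation_bin] = oa_share / non_oa_share
    return ratios


# Invented counts, for illustration only (NOT taken from any study).
oa_counts = {"0": 20, "1-3": 30, "4-7": 30, "8+": 20}
non_oa_counts = {"0": 450, "1-3": 250, "4-7": 150, "8+": 150}

ratios = oa_ratio_by_bin(oa_counts, non_oa_counts)
for citation_bin, ratio in ratios.items():
    print(citation_bin, round(ratio, 2))
```

With these made-up counts the ratio is below one for the zero-citation bin and largest in the 4-7 bin, which is the shape described above; real data would of course have to be substituted to say anything about the actual effect.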

But this effect is again just a correlation, and is just as
compatible with a Quality self-selection Bias (QB) as with a
Quality Advantage (QA). That said, it is hard to see why
self-selection QB should peak at the 4-7 range, whereas it is
perhaps less difficult to see how a QA could have an
inverted U-shape, absent for the duds and trivial for the gems.
But this awaits more confirmatory data and ways of testing
causality more directly.

> I also don't understand how these data exclude Phil's
> hypothesis. Since Kristin seems to define quality in terms of
> citations, then the logic seems self-referential: how would one
> detect a difference in citation due to intrinsic quality when
> one has defined quality as number of citations?

You're quite right, except that that argument cuts in both
directions: No data to date can decide directly between QA and

Stevan Harnad
American Scientist Open Access Forum

Chaire de recherche du Canada           Professor of Cognitive Science
Ctr. de neuroscience de la cognition    Dpt. Electronics & Computer Science
Université du Québec à Montréal         University of Southampton
Montréal, Québec                        Highfield, Southampton
Canada H3C 3P8                          SO17 1BJ United Kingdom
Received on Thu Mar 23 2006 - 02:01:29 GMT
