Re: Does the arXiv lead to higher citations and reduced publisher downloads?

From: Stevan Harnad
Date: Wed, 22 Mar 2006

On Tue, 21 Mar 2006, Phil Davis wrote:

> The data that Kristin illustrates do not show causation, only
> association.

The data Phil illustrates likewise do not show causation, only
association.

> What I am arguing, however, is that the likely (and primary)
> cause of the citation advantage is not increased access, but
> some quality differential, leading to better articles being
> deposited in the arXiv.

In other words, you are making a causal inference despite only
having data on correlation (association). Fair enough. Others are
making other causal inferences, likewise based on data on
correlation (association).

> In our manuscript, we argue that if OA-as-cause is present, its
> scope is severely limited to highly-cited articles. How can we
> say this?

What your data show is that the OA Advantage (which everyone
confirms) is stronger on the high end, and this could be either
because people tend to self-archive high-end articles more (QB),
or because the OA Advantage is stronger on the high end (QA).
Either way it's a quality effect. One way it's a Quality Bias
(QB), the other way it's a Quality Advantage (QA). You think
it's mostly QB, I think it's mostly QA. The data are compatible
with both. More fine-tuned causal tests are needed to decide.
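To make the point concrete, here is a toy simulation. It is only a sketch: the quality scale, citation counts and archiving probabilities are invented for illustration and are not taken from anyone's data, and the function name `simulate` is hypothetical. In the "QB" world, better articles are more likely to be self-archived, but archiving adds no citations. In the "QA" world, archiving is unrelated to quality, but it adds citations in proportion to quality. Both worlds produce a higher mean citation count for the archived articles.

```python
import random
import statistics


def simulate(world, n=10_000, seed=0):
    """Return (mean citations of archived articles, mean of non-archived).

    Hypothetical toy model, not fitted to any real data:
      world="QB": archiving probability rises with quality; archiving adds nothing.
      world="QA": archiving is a coin flip; archiving adds citations that grow with quality.
    """
    rng = random.Random(seed)
    archived_cites, other_cites = [], []
    for _ in range(n):
        quality = rng.random()              # latent article quality in [0, 1)
        base = 10 * quality                 # citations the article gets regardless of OA
        if world == "QB":
            archived = rng.random() < quality       # authors self-select their better work
            cites = base                            # no causal effect of archiving
        elif world == "QA":
            archived = rng.random() < 0.5           # archiving unrelated to quality
            cites = base + (5 * quality if archived else 0.0)  # OA boost larger at high end
        else:
            raise ValueError(f"unknown world: {world!r}")
        (archived_cites if archived else other_cites).append(cites)
    return statistics.mean(archived_cites), statistics.mean(other_cites)


for world in ("QB", "QA"):
    oa_mean, non_oa_mean = simulate(world)
    print(world, round(oa_mean / non_oa_mean, 2))
```

With these made-up parameters the expected archived/non-archived ratio is 2 in the QB world and 1.5 in the QA world. Since either world's parameters can be tuned to give any ratio, the comparison alone cannot reveal which mechanism is at work.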

> If increased access was the cause of increased citations in our
> data, we should see a significant and positive correlation
> between fulltext article downloads from the arXiv and the
> number of citations an article receives.

The OA Advantage (OAA) pertains to whether or not an article is self-archived,
not to how often it is downloaded. But there is also a
correlation between download counts and (later) citation counts,
as well as a correlation between whether or not an article is
self-archived and its download counts. Three correlations. Still
no causation. And compatible with QB, QA or both.

Now it appears that in these data the download/citation correlation
is present toward the high end, not the low end. That too is
correlation. It means that the correlation between downloads and
citations is not a straight linear one; there may be a threshold
effect or an acceleration at the high end. Still just
correlation, not causation. And compatible with QB, QA or both.

> The rationale is that article repositories increase
> readership, some of which leads to increased citations (this is
> the argument that SPARC, Harnad, Suber and others use to
> justify the use of archives). Now please take a look at Figure
> 3 in our paper. Notice
> that this positive association only applies to highly-cited
> articles (note: the inverse log of 2.5 is about 316 downloads).
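Assuming the download axis in Figure 3 is on a base-10 log scale, the conversion works out as:

```latex
\log_{10} d = 2.5 \;\Rightarrow\; d = 10^{2.5} = 10^{2}\cdot 10^{0.5} = 100\sqrt{10} \approx 100 \times 3.162 \approx 316
```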

To repeat: You have shown a high-end correlation. That is not a
demonstration that all or most of the cause is QB.

> In order to argue for causation one must be able to describe
> and measure the mechanism by which the cause takes place.
> Antelman (and others) demonstrate only the association between
> open access and citations, and infer that open access must be
> the cause. In our paper, we test the Open Access postulate,
> the Early View postulate, and a Quality Differential postulate.
> Of these three we feel that the Quality Differential is the
> strongest explanation for the data. We do not rule out Open
> Access completely, but the data do suggest that if access is
> responsible for increased citations, this effect may only take
> place for already highly-cited articles.

You test EA, QB and QA. You (unlike others, in other fields) find
no EA in maths. You do not and cannot differentiate QB from QA
with your data.

Stevan Harnad
Received on Thu Mar 23 2006

