Re: Increased citation of OA

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Wed, 2 Apr 2008 02:21:26 +0100

On Tue, 1 Apr 2008, Phil Davis wrote:

> [1] It is frankly amazing that Stevan can make such a qualified response
> about sample size, statistical power, and what we should measure without
> [2] having read the study. Should I be surprised that [3] his response is
> merely a response to take the front-self-promoting stage?
> BTW, [4] we do measure the effect of self-archiving (whoever put the
> article in a repository) on article downloads and citations.

(1) Phil himself gave us the bounds on his sample size: 11 physiology
journals, 11-14 months. The order-of-magnitude difference between that and
thousands of journals across a dozen disciplines, hundreds of thousands
of articles, and a 10-year span is transparent.

(2) It is rather hard to read a study when the study is not provided to
read.

(3) I like Phil's picturesque descriptor "front-self-promoting" for
my response to his posting. But then what would his sobriquet be for
publicly announcing -- not for the first time, but this time on the basis
of unpublished findings (not provided) -- conclusions to the effect
"that the 'citation advantage' so widely promoted in the literature is
an artifact of other explanatory variables"? (And is that public
announcement meant to be greeted with just stunned silence?)

(4) The point was about comparing imposed vs. self-selected self-archiving
on the same sample of 11 journals and the same 11-14-month time-span. If
there is no self-selected self-archiving effect in the same sample,
then the absence of an imposed self-archiving effect is meaningless:
It's the sound of one hand clapping.

If I have misunderstood what you posted, Phil, I'd be happy to hear/see
more.

Stevan

Davis, P. M. and Fromerth, M. J. (2007) Does the arXiv lead to higher
citations and reduced publisher downloads for mathematics articles?
Scientometrics, accepted for publication.
http://arxiv.org/abs/cs.DL/0603056

Responses (Sigmetrics list, March 2006ff)
http://listserv.utk.edu/cgi-bin/wa?A2=ind0603&L=sigmetrics&D=1&O=D&P=11683

> Stevan Harnad wrote:
> > On Tue, 1 Apr 2008, Philip Davis wrote:
> >
> > > We've been conducting a randomized controlled trial of open access
> > > publishing with 7 publishers in the multidisciplinary sciences,
> > > biology, medicine, social sciences, and humanities since January 2007.
> > >
> > > The type of methodology we're using (randomized controlled trial) is
> > > key here since previous observational studies simply assume that
> > > author-sponsored OA articles are qualitatively similar to
> > > subscription-based articles.
> >
> > Most prior studies simply compared articles within the same journals and
> > years that were and were not made OA by being self-archived by their
> > authors.
> >
> > The ideal study would be one that randomly *imposed* self-archiving on
> > articles from within the same journals and years, and compared it with
> > unimposed self-archiving for the same journals and years. This
> > forthcoming study seems to do only half of this.
> >
> > A potential problem with assessing the effects of self-archiving
> > on citations is, of course, the "self": Authors self-select to
> > self-archive (some authors -- c. 15% -- do it, most don't), and authors
> > can also self-select which of their papers they self-archive. Hence this
> > leaves open the possibility that self-archived papers (and authors)
> > are self-selected to be the better ones. And then the question is:
> > What proportion of the enhanced citations of self-archived papers occurs
> > because of OA and what proportion because of self-selection?
> >
> > A study that imposes the OA self-archiving randomly could help answer
> > this question.
> >
> > But a potential problem with this forthcoming study is its time-scale
> > and sample size.
> >
> > The published findings on the higher citations for OA self-archived
> > articles (e.g. Hajjem et al 2005) are based on hundreds of thousands of
> > articles, in thousands of journals, across a number of fields, across
> > a number of years. The effects are always the weakest in the first year
> > or two after publication (depending on field), before the citations have
> > had a chance to grow.
> >
> > During that early period, it is downloads rather than citations
> > that reflect the OA advantage -- and downloads have been shown to be
> > correlated with, and predictive of, later citations (Brody et al 2006):
> >
> > > Preliminary results from 11 journals published by the American
> > > Physiological Society indicate an increase in article downloads,
> > > although many of these downloads are attributable to indexing robots.
> > > The articles are currently between 11 and 14 months old and we see no
> > > citation advantage. In fact, the randomly selected OA articles
> > > received slightly fewer citations, although this result is
> > > non-significant.
> > >
> > > Our paper is currently in review and should be made public shortly.
> >
> > This profile (i.e., no difference) is perfectly compatible with the
> > conclusion that the sample was too small and the time-span was too
> > short to have picked up any effects at all. It is comparing apples and
> > oranges unless there is a control group, in the same journal sample and
> > year-span, consisting of self-selected, self-archived articles that *do*
> > show the citation increase whose causes are here being tested.
> >
> > If an equal-sized sample of self-selected, self-archived articles from
> > the same 11 journals, over the same 11-14 months, *did* show the
> > citation increase, whereas the control sample with the self-archiving
> > imposed did not, then we could make the inference that it is the
> > self-selection that causes the citation increase.
> >
> > But with a small sample and a small time-span, and no difference, the
> > most likely outcome is that neither group would yet show any citation
> > advantage.
> >
> > (Some comparisons might possibly be made with the Eysenbach (2006)
> > study, which was also based on a small sample -- a single very
> > high-profile journal (PNAS) and about 1500 articles -- and a small
> > time span. The OA/non-OA citation difference was found surprisingly
> > early. There were two kinds of "self-archiving": most were done by
> > PNAS on the (paying) authors' behalf, on the PNAS website; the other
> > kind was done by (nonpaying) authors, on their own websites (or IRs).
> > The lion's share of the early OA citation advantage was for the articles
> > made OA on the PNAS site. But of course both kinds of OA self-archiving
> > here were self-selected, rather than imposed. And the fact that the OA
> > advantage was much bigger for the articles "self-archived" on the PNAS
> > site suggests that the big early effect may have had something to do
> > with being freely accessible at the much-consulted websites of one of
> > the highest-citation journals of all.)
> >
> > > We conclude that the 'citation advantage' so widely promoted in the
> > > literature is an artifact of other explanatory variables.
> >
> > These are rather big conclusions to draw from what seems to be a rather
> > small study (that does not seem to control for the most important
> > explanatory variable of all, which is unimposed self-selection, in the
> > same sample and time-interval)!
> >
> > We are currently conducting a somewhat bigger study, comparing the size
> > of the citation difference between self-archived and non-self-archived
> > articles within the same journals and years for the four earliest of the
> > institutions that mandate self-archiving. A mandate is no guarantee
> > that all articles will be self-archived; and mandates have not been
> > around for that long either; but the prediction would be that if the
> > self-archiving citation increase were all or mostly due to
> > self-selection, then mandates should either substantially reduce or
> > eliminate the OA/non-OA difference, compared to the unmandated
> > OA/non-OA difference.
> >
> > Our study compares the size of the self-archived/non-self-archived
> > difference separately for mandated and unmandated self-archiving.
> >
> > Stay tuned.
> >
> > Brody, T., Harnad, S. and Carr, L. (2006) Earlier Web Usage Statistics
> > as Predictors of Later Citation Impact. Journal of the American
> > Society for Information Science and Technology (JASIST) 57(8)
> > pp. 1060-1072.
> > http://eprints.ecs.soton.ac.uk/10713/
> >
> > Eysenbach, G. (2006) Citation Advantage of Open Access Articles. PLoS
> > Biology 4(5): e157. DOI: 10.1371/journal.pbio.0040157
> > http://dx.doi.org/10.1371/journal.pbio.0040157
> >
> > Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year
> > Cross-Disciplinary Comparison of the Growth of Open Access and How it
> > Increases Research Citation Impact. IEEE Data Engineering Bulletin
> > 28(4) pp. 39-47.
> > http://eprints.ecs.soton.ac.uk/11688/
> >
> > Stevan Harnad
> >
> > > Philip Davis
> > > PhD student
> > > Cornell University, Dept. of Communication
>
Received on Wed Apr 02 2008 - 02:28:41 BST