UK scholarly journals: An evidence-based analysis (by RIN/EPS) from Stevan Harnad on 2006-10-10 (American-Scientist-Open-Access-Forum)

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Tue, 10 Oct 2006 01:06:30 +0100

    Critique of:
    UK scholarly journals: 2006 baseline report
    http://www.rin.ac.uk/data-scholarly-journals

    A hyperlinked version of this critique is available at:
    http://openaccess.eprints.org/index.php?/archives/142-guid.html

    ---------------------
    SUMMARY: The above Report on UK Scholarly Journals was commissioned by
    RIN, RCUK and DTI, and conducted by EPS, but its questions, answers
    and interpretations are clearly far more concerned with the interests
    of the publishing lobby than with those of the research community.
    The Report's two relevant overall findings are correct and stated
    very fairly in their summary form:

        [1] "Overall, [self-archiving] of articles in open access
        repositories seems to be associated with both a larger number of
        citations, and earlier citations for the items deposited....The
        reasons for this [association] have not been clearly established -
        there are many factors that influence citation rates... Consistent
        longitudinal data over a period of years... would fill this gap."

        [2] "There is no evidence as yet to demonstrate any relationship
        (or lack of relationship) between subscription cancellations and
        repositories... Proving or disproving a [causal] link between
        availability in self-archived repositories and cancellations
        will be difficult without long and rigorous research."

    The obvious empirical and practical conclusion to draw from the
    finding that (1) all the self-archiving evidence to date is positive
    for research and that (2) none of the self-archiving evidence to
    date is negative for publishing would have been that the research
    community should now apply and extend these findings -- by applying
    and extending self-archiving (through self-archiving mandates) to all
    UK research output, along with consistent, rigorous longitudinal
    studies over a period of years, to test (1) whether the positive
    effect on citations continues to be present (and why) and (2)
    whether the negative effect on subscriptions continues to be absent.

    But instead, the two overall findings are hedged with volumes of
    special pleading, based mostly on wishful thinking, to the effect that
    (1') the observed relationship between self-archiving and citations
    may not be causal, and that (2') there may exist an as-yet-unobserved
    causal relationship between self-archiving and cancellations after all.

    Even that would be all right, if this Report's conclusions were
    coupled with a clear endorsement of the proposed self-archiving
    mandates, so that the competing hypotheses can be put to a rigorous
    long-term test. But the only test the commissioners of this Report
    seem to be interested in conducting is "Open Option" publishing,
    i.e., authors paying publishers to make their article OA for them,
    instead of self-archiving it for themselves. This would certainly
    be a nice way to hold author self-archiving and institution/funder
    self-archiving mandates at bay for a few years more, while at the
    same time protecting publishers from undemonstrated risk of revenue
    loss. But it would also leave global unmandated self-archiving to
    continue to languish at the current spontaneous 15% rate that the
    self-archiving mandates had been meant to drive up to 100%. And it
    would leave research unprotected from its demonstrated risk of impact
    loss. The option of having to pay to provide OA is certainly not
    likely to enhance the unmandated rate of uptake by authors (though
    I'm sure publishers would have no quarrel with funder mandates to
    provide OA coupled with the funds to pay publishers' asking price
    for paid OA, as provided by the Wellcome Trust).

    The long-term test will nevertheless be conducted, because four out of
    eight UK Research Councils have already mandated self-archiving. Their
    citation rates and their cancellation rates can then be compared with
    those for the four that have not mandated self-archiving (and whose
    authors hence do it spontaneously by "self-selection"). Alas this
    will be mostly comparing apples and oranges (e.g. MRC vs AHRC), and
    it will needlessly be depriving the oranges of several more years of
    potential growth enhancement. My guess is that all the other councils
    -- except possibly the paradoxical EPSRC (which evidently thinks,
    with the publishing lobby, that there's still some sort of pertinent
    pretesting to be done for a few more years here) -- will come to
    their senses long before that, unpersuaded by Reports like this one.

    ----------------------------------------------------------------------
        UK scholarly journals: 2006 baseline report
        An evidence-based analysis of data concerning
        scholarly journal publishing.
        http://www.rin.ac.uk/data-scholarly-journals

        Prepared on behalf of the Research Information Network,
        Research Councils UK and the Department of Trade & Industry
        By Electronic Publishing Services Ltd http://www.epsltd.com
        In association with Professor Charles Oppenheim and LISU at
        Loughborough University Department of Information Science

This is a rather long and repetitious report, but it does contain
a few nuggets. It is obviously biassed, but biassed in a restrained
way, meaning it does not really try to conceal its biases, nor does it
overstate biassed conclusions. It also (reluctantly, but in most cases
candidly) acknowledges its own weaknesses.

(The Report was commissioned by RIN, RCUK and DTI, but it is glaringly
obvious that the questions, answers and interpretations have been slanted
toward the interests of the publishing lobby rather than those of the
research community -- possibly because the research community has no
lobby in this matter, apart from the OA movement itself! Nevertheless,
there has been considerable circumspectness, at least in the summary
and conclusion passages, with weak points and gaps usually pointed
out explicitly rather than denied or concealed, and with the overall
preoccupation with publishing interests rather than research interests
very open too.)

Some quotes and comments:

> Whilst some evidence does suggest that [self-archiving in] repositories
> [is] an important new factor in the journal cancellation decision
> process, and one which is growing in significance, there is no
> research reporting actual or even intended journal subscription
> cancellation as a consequence of the growth of OA self-archived
> repositories.

So far, this sounds fair and reasonable. (In fact, this is the gist of
the Report! The rest is mostly special pleading.)

> Subscriptions are reported to have been declining over a period
> of 10+ years, but for a number of reasons. Proving or disproving
> a link between availability in self-archived repositories and
> cancellations will be difficult without long and rigorous
> research. In this connection, the outcome of research recently
> announced by the Research Councils UK (RCUK) with the
> co-operation of Macmillan, Blackwell and Elsevier, will be
> eagerly awaited, even though a report is not due until late
> 2008.

With evidence of self-archiving's benefits to research mounting, and
zero evidence yet of any negative effect at all on publisher revenue,
publishers nevertheless seem quite willing to wait (and keep research
waiting too), trying to fend off self-archiving and its potential
benefits to research for a long time to come yet, in order to keep
trying to find some evidence of negative causal effects on publisher
revenue (or, failing that, to deny positive causal effects on research
impact).

Note that whereas a link between OA self-archiving and subscription
decline has not yet been "proved or disproved" (not for want of
looking!) -- and it is for that reason that we are hearing these calls
for "long and rigorous research" -- the vast preponderance of the evidence
we *do* have has already "proved" a "link" between OA self-archiving
and citation counts (a link that is almost certainly causal, despite the
wishful thinking of some who have a vested interest in its all turning
out to be merely a-causal self-selection and superstition on the part
of authors).

The question that the research community accordingly needs to ask itself
is whether self-archiving's evidence-based benefits to research should
be held in abeyance still longer, and meanwhile interpreted by default
as a-causal, in order to buy still more time to try to "prove/disprove"
hypothetical subscription declines for which there is no evidence
whatsoever to date, even in fields where self-archiving has been near 100%
for years.

(Researchers should also go on to ask themselves whether the research
benefits should be held in abeyance even if they *are* causally linked
to a subscription decline: Is research impact to be sacrificed in the
service of publisher revenue? Are we conducting and funding research in
order to generate -- or to safeguard -- publisher revenue?)

> There is no evidence as yet to demonstrate any relationship (or
> lack of relationship) between subscription cancellations and
> repositories. Work in this field would need sufficient,
> representative and balanced samples, and the collaboration of
> all stakeholders, including especially research institutions
> and publishers. Any such study will need to be maintained over
> a fairly extended period, with regular reports, since it seems
> likely that the position could change with time if the contents
> of self-archiving repositories become progressively more
> comprehensive.

This would be fine, if proposed as an extended research project to be
conducted *after* self-archiving mandates are in place, to analyze their
long-term effects on subscriptions.

But this would be an exceedingly self-serving suggestion on the part of
the publishing community (and a methodologically empty one) if meant
as a "pilot" study that must somehow be conducted *before* adopting
self-archiving mandates. (And it would be exceedingly self-defeating
of the research community to even consider accepting such a pre-emptive
suggestion as a precondition, before adopting self-archiving mandates.)

> There is some consistency in results that show more citations
> for articles self-archived in repositories as distinct from the
> same or similar articles available [only via journal] subscription
> (although there have also been a few contradictory results).
> Overall, deposit of articles in open access repositories seems
> to be associated with both a larger number of citations, and
> earlier citations for the items deposited.

This is a fair summary -- except that immediately after stating it, this
"association" is about to be deconstructed (much as the "association"
between cigarette-smoking and lung cancer was deconstructed for years
and years by the tobacco industry, claiming that only correlation had
been demonstrated, and not causation). Read on:

> The reasons for this [association] have not been clearly established
> - there are many factors that influence citation rates, including
> the reputation of the author, the subject-matter of the article,
> the self-citation rate, and, of course, how important or
> influential the repository is in its own right. The little
> existing evidence suggests that a possible [sic] reason for increased
> citation counts is not that the materials were free, or that
> they appeared more rapidly, but that authors put their best
> work into OA format. This research was limited to one discipline,
> however [astronomy], and more extensive evidence is required to
> validate this finding.

This (important) study by Kurtz et al in astronomy, however,

    http://cfa-www.harvard.edu/~kurtz/IPM-abstract.html

is not what the vast majority of the evidence (no longer little!) shows:

    http://opcit.eprints.org/oacitation-biblio.html

Moreover, as noted, this a-causal interpretation -- only one of the
possible interpretations of the astronomy evidence -- also happens to be
the interpretation that the publishing community prefers for *all* the
self-archiving evidence, in all fields. The alternative interpretation is
that the relationship is causal: that the OA advantage is not merely an
arbitrary whim on the part of the better authors to make their work OA,
to no causal effect at all (why on earth would they be doing it at all
then?): They do it because making their work more accessible increases
its accessibility, uptake, downloads, usage, applications, citations,
impact -- exactly as the correlational evidence shows, without exception,
in field after field.

(NB: The only methodologically unexceptionable way to demonstrate
causation here, by the way, is to select a large enough random sample
of articles, divide them in half randomly, mandate half of them to be
self-archived and half not, and then compare their respective citation
counts after a few years. No one is likely to do quite *that* study --
any more than it was likely that a large random sample of people would be
divided in half randomly, with half mandated to smoke and half not! But
we are in the process of doing an approximation to that causal study,
by comparing the citation counts of articles in the IRs of the (few)
institutions that have already mandated self-archiving with the average
for other articles in the same journals/years in which those articles
appeared, but that have not been self-archived; we will also compare
the size of the OA advantage for mandated and comparable non-mandated
self-archiving. [We do not believe for a moment that these data are
necessary to demonstrate causation, as causation is a virtual certainty
anyway, but we are ready to play the game, in order to try to cut
short the absurd delay in doing the obvious: mandating self-archiving
universally.])
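
As an illustration only (this is not code from any study cited here),
the idealized randomized design described above can be sketched as
follows. The function name, the article ids and the `citations_after`
callback are hypothetical stand-ins for real data collection.

```python
import random
from statistics import mean

def randomized_oa_comparison(articles, citations_after, seed=0):
    """Randomly split articles into two halves, treat one half as
    mandated-OA and the other as non-OA, and compare mean citations.

    articles:        list of article ids (hypothetical)
    citations_after: callable (article_id, is_oa) -> citation count
                     observed after the follow-up period (hypothetical)
    """
    rng = random.Random(seed)
    shuffled = list(articles)
    rng.shuffle(shuffled)          # random assignment removes self-selection
    half = len(shuffled) // 2
    oa_group, non_oa_group = shuffled[:half], shuffled[half:]
    oa_mean = mean(citations_after(a, True) for a in oa_group)
    non_oa_mean = mean(citations_after(a, False) for a in non_oa_group)
    return oa_mean, non_oa_mean
```

Because assignment is random, any systematic difference between the
two means cannot be attributed to authors choosing which articles to
self-archive. The mandate comparison is an approximation to this
design, not an instance of it.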

> Although quite a lot of evidence has been collected regarding
> the quantitative effect of OA on citation counts (whether in
> the form of OA journals or as self-archived articles), much of
> it is scattered, uses inconsistent methods and covers different
> subject areas.

Yet, despite this scatter, inconsistency and diversity, virtually all
of it keeps showing exactly the same consistent pattern: A citation
(and download) advantage for the OA articles. (No amount of special
pleading can make that stubborn pattern go away!)

> Consistent longitudinal data over a period of years to measure
> IF trends in a representative range of journals would fill this gap,

There is no gap! There is a growing body of studies, across all fields
and all journals, that keeps showing exactly the same thing: the OA
advantage (in article citations and article downloads: this is not about
journal impact factors, especially because comparing different journals
is comparing apples and oranges).

(There seems to be a confusion here between the existence of the
correlation itself, between self-archiving and citation counts
-- this is found consistently, over and over -- and the question of
the causal relation, which will not be answered by longitudinal data
(we have longitudinal data already!) but by comparing mandated and
unmandated self-archiving: if they both show the OA advantage, then the
effect is causal and self-selection bias is a minor component.)

> e.g., studying a range of journals that were toll-access and went OA
> (or vice versa). In the short-term, more data in different disciplines
> measuring the impact on citation counts of articles in hybrid journals
> or articles that are available in both forms versus articles that
> are only available in one of the forms will improve the evidence base.

No, the question about the reality and causality of the OA advantage
will not be settled by OA journal vs. non-OA journal comparisons; that
can always be dismissed as comparing apples with oranges, and, failing
that, can always be attributed to self-selection bias (i.e., choosing to
publish one's better work in an OA journal)!

And if we wait for the uptake of hybrid Open Choice -- i.e., paying the
journal to self-archive the published PDF for you -- these "longitudinal"
studies are likely to take till doomsday (and any positive outcome can
still be dismissed as self-selection bias in any case!).

What is needed is precisely the data already being gathered, on
huge samples, across all disciplines, comparing citation counts for
self-archived versus non-self-archived articles within the same journal
and year. The result has been a consistent, high OA Advantage (which
has elicited a lot of special pleading about causality).
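
A minimal sketch of such a within-journal/year comparison, assuming
(hypothetically) that each article is available as a tuple of
(journal, year, is_oa, citations). The record layout and the function
name are illustrative assumptions, not the format used by the studies
cited here.

```python
from collections import defaultdict
from statistics import mean

def oa_advantage_by_journal_year(records):
    """records: iterable of (journal, year, is_oa, citations) tuples.

    Returns {(journal, year): mean OA citations / mean non-OA citations}
    for every journal/year that contains both OA and non-OA articles
    and whose non-OA mean is nonzero.
    """
    groups = defaultdict(lambda: {True: [], False: []})
    for journal, year, is_oa, citations in records:
        groups[(journal, year)][bool(is_oa)].append(citations)

    ratios = {}
    for key, group in groups.items():
        if group[True] and group[False]:
            non_oa_mean = mean(group[False])
            if non_oa_mean > 0:
                ratios[key] = mean(group[True]) / non_oa_mean
    return ratios
```

Comparing only within the same journal and year keeps journal
prestige and article age equal across the two groups, which is the
point of the design. It does not, by itself, rule out self-selection.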

So we will look at the mandated subset of the self-archived papers,
to try to show that the OA advantage is not (only, or mostly) a
self-selection effect (Quality Bias [QB]).

(There is undoubtedly a non-zero self-selection [QB] component in the
OA advantage, but there are many other components as well, including a
Quality Advantage [QA], an Early Access Advantage [EA], a Competitive
Advantage [CA, which will, like QB, vanish once all articles are OA],
and a Usage (Download) Advantage [UA]. At 100% OA, there will no longer
be any QB or CA (or Arxiv Advantage [AA]), but EA, QA and UA will still
be going strong. EA and UA components have already been confirmed by
the Kurtz study in astronomy. QA is implied by the repeated finding of a
positive correlation between citation count and the proportion of those
articles with that citation count that are OA. The mandate study will
try to show that this correlation is causal, i.e., QA, not QB.)

  Harnad, S. (2005) OA Impact Advantage = EA + (AA) + (QB) + QA + (CA) + UA.
  http://eprints.ecs.soton.ac.uk/12085/

> The whole area of the relationship between citation counts and
> scholarly communication channels is confused because of problems
> associated with quality bias [QB] (e.g., if scholars tend to
> self-archive only their best work, as suggested by Kurtz et al.
> [in astronomy]; alternatively, it may be that only the best journals
> are OA). In other words, differences in citation counts and IFs may
> simply reflect the quality of the materials under study rather
> than having anything to do with the channel by which the material
> is made available.

First, the issue is article citation counts, not journal Impact Factors
(IFs).

Second, this is all special pleading. The biggest OA effects are based on
comparing articles within the same journal/year. The size of the
effect is indeed correlated with the quality of the article, because
no amount of accessibility will generate citations for bad articles,
whereas good articles benefit the most from a level playing field,
with all affordability/accessibility barriers removed: that is the
Quality Advantage [QA]. The idea that the Quality Advantage is merely a
Quality (Self-Selection) Bias [QB], i.e., that the advantage is merely
correlational, not causal, is of course a logical possibility, but it is
also highly improbable (and would imply that accessibility/affordability
barriers count for nothing in usage and citations, and that the better
work is being made OA by its authors for purely superstitious reasons,
because doing so has no effect at all!).

> Overall, we concur with Craig's introduction that "the problems
> with measuring and quantifying an Open Access advantage are
> significant. Articles cannot be OA and non-OA at the same time."

They need not be. It is sufficient if we take a large enough sample of
articles that are OA and non-OA from the same journals and years. Randomly
imposing the self-archiving would be the only way to equate them
completely (and our ongoing study on mandated self-archiving will
approximate this).

(The analysis by Craig, commissioned by Blackwell Publishing, has not,
so far as I know, been published.)

> "Further, the variation of citation counts between articles can
> be extremely high, so making controlled comparisons of OA vs.
> non-OA articles nigh on impossible" [Craig, Blackwell Publishing]

(The way Analysis of Variance works is to compare variation between and
within putatively different populations, to determine the probability
that they are in reality the same population. The published comparisons
show that the OA/non-OA differences are highly significant, despite the
high variance.)
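
For readers unfamiliar with the technique, here is a minimal sketch
of the one-way ANOVA F statistic, written with the standard library
only. The function name is an assumption, and this is not the
analysis code of the published comparisons. Real citation counts are
highly skewed, so a real analysis would probably transform the counts
or use a rank-based test.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples.

    F = (between-group mean square) / (within-group mean square).
    A large F means the group means differ by more than the
    within-group variation would lead one to expect by chance.
    """
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total number of observations
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

High variance within the groups inflates the denominator, so a
significant F obtained despite high variance indicates a large
between-group difference, a large sample, or both.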

It would of course be absurd to try to compare citation counts for OA and
non-OA articles having the same citation counts. But we can compare OA and
non-OA *article* counts among articles having the same citation counts,
in the same journals -- and what we find is a strong positive correlation
between the citation count and the proportion of articles that are OA
(just as Lawrence reported in 2001, but not only in computer science,
but across all 12 disciplines studied so far, and with much bigger
sample sizes):

    Source 4.8: Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year
    Cross-Disciplinary Comparison of the Growth of Open Access and How
    it Increases Research Citation Impact. IEEE Data Engineering Bulletin
    28(4) pp. 39-47. http://eprints.ecs.soton.ac.uk/11688/
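
The comparison described above (the proportion of articles that are
OA at each citation count) can be sketched as follows. The input
layout and the function name are hypothetical; this is an
illustration, not the code used in the study cited.

```python
from collections import defaultdict

def oa_proportion_by_citation_count(articles):
    """articles: iterable of (citations, is_oa) pairs.

    Returns a list of (citations, proportion of articles that are OA),
    sorted by citation count.
    """
    totals = defaultdict(int)
    oa_counts = defaultdict(int)
    for citations, is_oa in articles:
        totals[citations] += 1
        if is_oa:
            oa_counts[citations] += 1
    return [(c, oa_counts[c] / totals[c]) for c in sorted(totals)]
```

A positive correlation shows up as a proportion that rises with the
citation count.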

Note that the appendix to the Report under discussion here states, in
connection with the above study, which it cites:

> "Harnad is THE advocate of OA and, thus, whilst expert in the
> field, is inevitably biased."

http://www.rin.ac.uk/files/UK%20Scholarly%20Journals%202006%20Baseline%20Report%20-%20Appendix.pdf

There is a bit of irony in the fact that in connection with another of
the studies it cites:

    Source 4.9: Harnad, S, Brody, T, Oppenheim, C et al,
    Comparing the impact of open access versus non open access
    articles in the same journals, D-Lib Magazine, 10(6), 2004,
    http://www.dlib.org/dlib/june04/harnad/06harnad.html

the appendix of the Report goes on to say:

> "Harnad is THE exponent of OA, but, thus, potentially less
> objective."

Ironic (or, shall we say, conflicted, since this Report aspires to be a
neutral one as between the interests of the research community and the
publisher community), because the sole named collaborator on the Report
is also a co-author of the above-cited study!

Let us agree that we all have views on the underlying issues, but that
reliable data speak for themselves, qua data, and our data (and those of
others) keep showing the same consistent OA Advantage. The disagreement is
only on the interpretation: whether or not the consistent correlations
are causal. And here, allegiances are tugging on both sides: Those
favouring causality tend to come from the research community, those
favouring a-causality tend to come from the publishing community. (Let
us hope that the data from mandated self-archiving will soon settle the
matter objectively.)

> "[since] any Open Access advantage appears to be partly [sic]
> dependent on self-selection, the more articles that are
> {self-}archived... you'd expect to see any Open Access advantage
> reduce." [Craig, Blackwell Publishing]

Note that Craig carefully says "partly" -- and that we agree that
self-selection is one of the many potential contributors to the OA
advantage.

We also agree, of course, that once 100% OA is reached, the OA citation
advantage -- in the form of an advantage of OA over concurrent non-OA
articles -- will be reduced: indeed it will vanish! With all articles
OA, there can no longer be either a Competitive Advantage [CA] or a
Self-Selection Advantage (Quality Bias, QB) of OA over (non-existent)
non-OA.

But the Quality Advantage [QA] will remain. (Higher quality articles
will be used and cited more than they would have been if they had not
been OA: this is not a competitive advantage but an absolute one.) And
the Early Advantage [EA] as well as the Usage (Download) Advantage [UA]
will remain too (as already shown by Kurtz's findings in Astronomy).

> "Authors self-archiving in the expectant belief that each and every
> paper they archive will receive an Open Access advantage of several
> hundred percent are going to be sorely disappointed." [Craig,
> Blackwell Publishing]

This too is correct, but who on earth thought that OA would guarantee
that all work would be used, whether or not it was any good? OA levels the
playing field so merit can rise to the top, unconstrained by accessibility
or affordability handicaps. But bad remains bad, and let's hope that
researchers will continue to avoid trying to build on weak or invalid
findings, whether or not they are OA.

The OA advantage is an *average* effect, not an automatic bonus for each
and every OA article; moreover, the OA advantage is highly correlated
with quality: The higher the quality, the higher the advantage. It is
this effect that is open to the a-causal interpretation that the Quality
Advantage [QA] is merely a Quality Bias [QB] (Self-Selection). But,
equally (and, in my view, far more plausibly) it is open to the causal
interpretation that OA causes wider usage and citation precisely because
it removes all accessibility/affordability constraints that are currently
limiting uptake and usage. That does not mean *everything* will be used
more, regardless of quality ("usefulness"): But it will allow users
(who are quite capable of exercising self-selection too!) to access and
use the better work, selectively.

In addition, since the distribution of citations is not gaussian -- a
small percentage of articles receives most of the citations and more than
half of articles receive no citations at all -- it is almost axiomatic
that the OA advantage will be strongest in the high-quality range.

    http://www.crsc.uqam.ca/lab/chawki/classement_citations.htm

> Finally, it is worth noting that all researchers in the field
> are agreed that if the vast majority of scholarly publications
> become available in OA form, no citation advantage to OA will
> be measurable.

It is a tautology that with 100% OA, the OA/NOA ratio is undefined!
But EA will still be directly measurable, and it will be possible to
infer UA and QA indirectly (UA by comparing downloads for articles of the
same age, before and after OA for the same articles, and QA by doing the
same with citations; the Kurtz study used such methods in Astronomy.) But
by that time (100% OA), not many people will still have any interest in
the a-causal hypothesis.

> Thus, what OA advantage there is will prove to be temporary if
> OA does become the standard mode of publication.

This, however, is simply incorrect. At 100% OA, the Competitive Advantage
(CA) will be gone; the Self-Selection Advantage (Quality Bias, QB)
will be gone; the method of comparing citation counts for OA and non-OA
articles within the same journal and year will be gone. So much is true
by definition.

But (as Kurtz has shown in Astronomy), the Early Advantage and the Usage
Advantage will still be there. And the Quality Advantage will still be
there too; and that was what this was all about: Not just a horse-race
for who can make his articles OA first, so as to reap the competitive
advantage before 100% OA is reached (though that's not a bad idea!);
not a guarantee that, no matter how bad your work, you can increase your
citations by making it OA; but a guarantor that with access-barriers
removed, quality will have the best chance to have its full potential
impact, to the benefit of research productivity and progress itself,
as well as the authors, institutions and funders of the high quality
work.

(There is a bit of a [lurid] analogy here with saying that if only we can
get everyone to smoke, it will be clear that smoking has no differential
effects on human health! Perhaps the converse is a better way to look
at it: if only we could get everyone to stop smoking, smoking will no
longer have a differential effect on human health!)

(PS: OA is not a "mode of publication": OA *publication* is a mode of
publication. OA itself is a mode of access-provision, which can be done
in two ways, via OA publication or via OA self-archiving of non-OA
publications.)

> Self archived articles
>
> It is this area that has been most studied, with numerous key
> publications. Most of these are focussed on the citation advantage
> of self-archived articles rather than of OA journals. Craig,
> in an as yet unpublished review, provides an excellent overview
> of the evidence collected to date. Lawrence (Source 4.13) is
> significant because it was the first major paper that identified
> a citation advantage for OA self-archived articles, and it has
> been widely cited ever since. However, it was based on a too
> small-scale a study to support general conclusions. Harnad et
> al. (Source 4.9) provides a useful summary of the state of
> play of OA advantage studies, while Hajjem et al. (Source
> 4.8) is fairly typical of the many articles produced by Harnad
> claiming that self-archiving leads to higher citation counts.

Let us be clear: The many OA vs. non-OA studies, ours and everyone
else's, across more than a dozen different disciplines, many of them
based on large-scale samples, all show the very same consistent pattern
of positive correlation between OA and citation counts. Those are data,
and they are not under dispute. The only "claim" under dispute is that
that consistent correlation is causal...

> Antelman (Source 4.1) is arguably the most carefully constructed
> study of the question. Articles in four disciplines were
> evaluated, and in each case it was found that open access
> articles had greater citation counts than non-open access
> articles. http://eprints.rclis.org/archive/00002309/

One wonders why this particular small-scale study (of about 2000 articles
in 4 fields) was singled out, but in any event, it shows *exactly* the
same pattern as all the other studies (some of them based on hundreds
of thousands of articles instead of just a few thousand, in three times
as many fields).

> Eysenbach challenges the notion that OA "green" articles (i.e.,
> those in repositories) are more effective than OA "gold" (i.e.,
> those published in OA journals, such as those produced by Public
> Library of Science) in obtaining high citation counts. It is
> this part of his paper that produced a furious response from
> Harnad, much of it focused on particular details.

The issue was not about OA green (self-archived) articles producing
higher citation counts than OA gold (OA-journal)! No one had claimed
one form of OA was more effective than the other in generating the OA
Advantage before the Eysenbach study: It was Eysenbach who claimed to
have shown gold was more effective than green -- indeed that green was
only marginally effective at all!

And I think anyone reading the exchanges will see that all the fury is
on the Eysenbach side. All I do is point out (rather patiently) where
Eysenbach is overstating or misstating his case:

    PLoS, Pipe-Dreams and Peccadillos
    http://biology.plosjournals.org/perlserv/?request=read-response&doi=10.1371/journal.pbio.0040176#top
    http://openaccess.eprints.org/index.php?/archives/87-guid.html
    http://openaccess.eprints.org/index.php?/archives/88-guid.html
    http://openaccess.eprints.org/index.php?/archives/89-guid.html
    http://openaccess.eprints.org/index.php?/archives/90-guid.html

Eysenbach's study does find the OA advantage, as many others before
it did. It certainly doesn't show that the gold OA advantage is bigger
than the green OA advantage, in general. It simply shows that for the
1500-article sample in the one journal tested, Proceedings of the National
Academy of Sciences (PNAS), a very high impact journal, both paid OA
(gold) and green OA (free) increased citation counts over non-OA, but
gold increased them more than green. That result is undisputed. Its
extrapolation to other journals is not:

The likely explanation of the PNAS result is very simple: PNAS is not
a randomly chosen, representative journal: it is a very high-impact,
very high visibility, interdisciplinary journal, one of very few like it
(such as Nature and Science). Articles that pay for OA are immediately
accessible at PNAS's own high-visibility website -- a website that
probably has higher visibility than any single institution's IR today.
So PNAS articles freely accessible at PNAS's website get a bigger OA
advantage than PNAS articles made freely accessible by being self-archived
in the author's own IR.

The reason it definitely does not follow from this that the gold OA
advantage is bigger than the green OA advantage is very simple: Most
journals are not PNAS, and do not have the visibility or average impact
of PNAS! Hence Eysenbach's valid finding for one very high-impact journal
simply does not generalize to all, most, or even many journals. Hence it is not a
gold/green effect at all, but merely a very high-end special case.

Apart from the spurious gold/green advantage, Eysenbach did confirm, yet
again, (1) the OA advantage itself, and confirmed it (2) within a very
short time range. These are both very welcome results (but they did not
warrant being touted, as they were, by both the author and by
PLoS, as either the first "solid evidence" of the OA advantage -- they
certainly were not that -- or a demonstration that gold OA generates more
citations than green OA: the very same method has to be tried on middle
and low-ranking journals too, before drawing that conclusion!). (Nor are
the PLoS/PNAS results any more exempt from the methodological possibility
of self-selection bias [QB] than any of the many prior demonstrations
of the OA advantage, as authors self-choose to pay PNAS for gold OA as
surely as they self-choose to self-archive for green OA!)

The fury on Eysenbach's part came from my pointing out that his and PLoS's
claim to primacy for demonstrating the OA advantage (and their claim of
having demonstrated a general gold-over-green advantage) was unfounded
(and might have been due to both PLoS's and Eysenbach's zeal to promote
publication in gold journals: Eysenbach is the editor of one too, but
not a high-end one like PNAS or PLoS): Eysenbach's was just the latest
in a long (and welcome) series of confirmations of the OA advantage
(beginning with Lawrence 2001), the prior ones having been based on
far larger samples of articles, journals and fields (and there was no
demonstration at all of a general gold over green advantage: just the
one non-representative, hence non-generalizable special case of PNAS).

> Both authors believe that OA produces a citation advantage, but
> Eysenbach has presented evidence that casts doubt on Harnad's notion
> that the "green" route is the preferred route to getting that
> increased impact.

Green may not be the preferred route to OA for editors of gold journals,
but it is certainly the preferred route for the vast majority of authors,
who either have no suitable gold journal to publish in, or lack the funds
(or the desire) to pay the journal to do what they can do for free for
themselves. The only case in which paid gold OA may bring even more
citations than free green OA (even though both increase citations)
is in the very highest quality journals, such as PNAS, today -- but
that high-end reasoning certainly does not generalise to most journals,
by definition. (And it will vanish completely when OA self-archiving is
mandated, and the harvested IR contents become the locus classicus to
access the literature for those whose institutions are not subscribed
to the journal in which a particular article appeared -- whether or not
it is a high-end journal.)

(There is also a conflation of the (less interesting) question of
(1) whether green or gold generates a *greater OA citation advantage*
[answer, for high-end journals like PNAS, gold does, but in general
there is no difference] with the (far more important) question of (2)
whether green or gold can generate *more OA* [answer: green can generate
far more OA, far more quickly and easily, not just because it does not
cost the author/institution anything, but because it can be mandated
without needing either to find the extra funds to pay for it or to
constrain the author's choice of which journal to publish in].)

> However, despite the intuitive attractiveness of the hypothesis
> that OA will lead to increased citations because of easier
> availability, the one systematic study of the reasons for the
> increased citations - by Kurtz (Source 4.12) - showed that
> in the field of astronomy at least, the primary reason was not
> that the materials were free, or that they appeared more rapidly,
> but that authors put their best work into OA format, and this
> was the reason for increased citation counts.

Astronomy is an interesting but anomalous field: It differs from most
other fields in that:

(1) It consists of a small, closed circle of journals.

(2) Virtually all research-active astronomers (so I am told by the
author) have institutional access to all those journals.

(3) For a number of years now, that full institutional access has been
online access.

(4) So astronomy is effectively a 100% OA field.

(5) Hence the only room left for a directly measurable OA advantage in
astronomy is (5a) to self-archive the paper earlier (at the preprint
stage) [EA] or (5b) to self-archive it in Arxiv (which has evolved into a
common central port of call, so it generates more downloads and citations
-- mostly at the preprint stage, in astronomy).

(6) What Kurtz found was that under these conditions, higher quality
(higher citation-count) papers were more likely to be self-archived.

(7) This might be a quality self-selection effect (QB) (or it might not),
but it is clearly occurring under very special conditions, in a 100%
OA field.

(8) Kurtz did make another, surprising finding, which has bearing on the
question of how much of a citation advantage remains once a field has
reached 100% OA.

(9) By counting citations for comparable articles before and after the
transition to 100% OA, Kurtz found that the citations per article had
actually gone *down* (slightly) rather than up, with 100% OA.

(10) But a little reflection suggests a likely explanation: This slight
drop is probably a shift in balance with a level playing field:

(11) With 100% OA (i.e., equal access to everything), authors don't cite
more articles, they cite more *selectively*, able now to focus on the
best, most relevant work, and not just on the work their institutions
can afford to access.

(12) Higher quality articles get more citations, but lower quality
articles, of which there are far more (some perhaps previously cited by
default, because of accessibility constraints), are cited less.

(13) On balance, total citations are slightly down, on this level
playing field, in this special, small, closed-circle field (astronomy),
once it reaches 100% OA.

(14) It remains to be seen whether total and average citations go up or
down when other fields reach 100% OA.

(15) What Kurtz does report even in astronomy is that although total
citations are slightly down, downloads are doubled.

(16) Downloads are correlated with later citations, but perhaps at 100%
OA this is either no longer true, or true only for higher quality
articles.

> Similarly, more carefully conceived work on the
> impact of both OA journals and self-archiving on the quality
> of research communications, especially on the peer review system,
> will be required.

OA journals are peer-reviewed journals: What sort of impact are they
feared to have on peer review?

And why on earth would the self-archiving of peer-reviewed, published
postprints have any impact on the peer review system? The peers review
for free. (Could this be just a veiled repetition of the question about
the impact of self-archiving on journal revenues, yet again?)

> Recently, the results of a study undertaken by Ware for ALPSP,
> which were published in March 2006 (Source 1.16, in Area 1),
> have provided at least some initial data on the question of the
> possible linkage between the availability of self-archived
> articles in an OA repository and journal subscription cancellations
> by libraries...: availability of articles in repositories
> was cited as either a "very important" or an "important" possible
> factor in journal cancellation by 54 per cent of respondents,
> even though ranking fourth after (i) decline of faculty need,
> (ii) reduced usage, and (iii) price. When respondents were
> invited to think forward five years, availability in a repository
> was still fourth-ranking factor, but the relevant percentage
> had risen to 81. Whilst this is not evidence of actual or even
> intended cancellation as a consequence of the growth of OA
> self-archiving repositories, it strongly suggests that such
> repositories are an important new factor in the decision process,
> and growing in significance.

Summary: No evidence of cancellations, but speculations by librarians to
the effect that their currently fourth-ranking factor in cancellations
might possibly become more important in the next five years...

Sounds like sound grounds for fighting self-archiving mandates and trying
to deny research the benefit of maximized impact for yet another five
years -- if one's primary concern is the possible impact of mandated
self-archiving on publishers' revenue streams. But if one's primary
concern is the probable impact of mandated self-archiving on research
impact, this sort of far-fetched reasoning has surely earned the right
to be ignored by the research community as the self-serving interference
in research policy that it surely is.

Stevan Harnad
American Scientist Open Access Forum
http://amsci-forum.amsci.org/archives/American-Scientist-Open-Access-Forum.html
Received on Tue Oct 10 2006 - 02:55:45 BST
