From: Stevan Harnad
Date: Tue, 21 Jun 2005 16:19:11 +0100

> Okay, maybe you didn't say literally that nobody subscribes to all
> journals; I understood you as saying that lots of institutions don't
> subscribe to lots of journals.

The distinction is critical, and I chose my words (and meanings) quite
consciously: The relevant question is about *articles*, and the size of
the current potential readership/usership to which they are currently
inaccessible because their institutions can't afford access to the
journal in which they happen to be published.

> My point is that it is not possible to infer from this data that
> *every single paper* is inaccessible to many/most of its potential
> readers.

Although the inference does sound rather shocking and is probably
stronger than necessary, I think it *is*
possible to infer from the existing data that every single one of the
annual 2.5 million articles is inaccessible to *many* of its potential
users. (For that to be true, all you need is a few institutional
nonsubscribers with several relevant researchers each.) That is why I chose my
words as I did: What I said was "inaccessible to many or even most". (For
clarification, perhaps I should say "inaccessible to many, perhaps even most.")

Whether it is many or most will be indirectly revealed by our citation
data. We know that *most* published articles (c. 60%) are not cited at all,
and only 10% are cited more than 5 times. (There is substantial
self-archiving in every citation-bracket, though more for the higher-cited ones.)

We also know that self-archiving increases citations by 50% to over 300%.

We also know that downloads correlate significantly with -- hence
predict -- citations.

We also know that the number of readings per article averages under
(and sometimes well under) 1000, which means that -- if we take the
(conservative) upper limit of 5 citations per article -- 200 readings
generate one citation (no doubt varying by field: in astrophysics Michael
Kurtz reported a 17/1 reads/cites ratio).
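The arithmetic behind the 200/1 figure can be spelled out in a few lines.
This is only a sketch: the function name is hypothetical, and it assumes
that readings and citations are both simple per-article averages, using
the round figures quoted above (1000 readings, 5 citations).

```python
def reads_per_citation(reads_per_article, citations_per_article):
    """Return how many readings correspond to one citation."""
    if citations_per_article <= 0:
        raise ValueError("citations_per_article must be positive")
    return reads_per_article / citations_per_article


# Figures from the message: up to 1000 readings per article and the
# (conservative) upper limit of 5 citations per article.
general_ratio = reads_per_citation(1000, 5)
print(general_ratio)  # 200.0
```

Since 1000 is an upper bound on the average number of readings, the true
ratio for a given field may be a good deal lower, as the 17/1 figure for
astrophysics suggests.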

Although it is not yet possible to make direct comparisons between
download counts for OA vs. non-OA articles in the same journal issue
(as it is already possible to do with citations), unilateral download
counts for OA articles (along with logic), corroborated by the observed
download/citation correlation, suggest that OA substantially increases
downloads too (by at least the amount implied by the download to citation
ratio -- 200/1 to 17/1, take your pick) plus the a-posteriori evidence
that OA increases citations by 50-300%.

Kurtz (2004; "Restrictive access policies cut readership of electronic
research journal articles by a factor of two") estimates that open access
triples the number of downloads per article.

In short, I think we can safely infer that self-archiving increases
accessibility substantially. If it adds from 0.5 to over 3 citations per
article when most articles receive 0.0 citations, this does imply that
articles are today missing many (perhaps even most) of their potential
users if they are not being self-archived.
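To give a rough sense of scale, the added citations can be converted back
into implied extra readings using the two reads/cites ratios mentioned
earlier in this message (17/1 and 200/1). This is a back-of-envelope
sketch, not a measurement: it assumes the ratio stays constant and applies
linearly to the added citations, and the function and variable names are
hypothetical.

```python
def implied_extra_readings(added_citations, reads_per_cite):
    """Extra readings implied by extra citations at a fixed reads/cites ratio."""
    return added_citations * reads_per_cite


# Added citations per article quoted in the message: from 0.5 to over 3
# (3.0 is therefore a floor for the upper end, not a ceiling).
ADDED_CITATIONS = (0.5, 3.0)

# Reads/cites ratios quoted in the message.
READS_PER_CITE = {"astrophysics (Kurtz)": 17, "upper-limit estimate": 200}

estimates = {
    (label, added): implied_extra_readings(added, ratio)
    for label, ratio in READS_PER_CITE.items()
    for added in ADDED_CITATIONS
}

for (label, added), readings in estimates.items():
    print(f"{label}: +{added} citations -> about {readings} extra readings")
```

Under these assumptions the implied gain ranges from 8.5 to 600 extra
readings per article, depending on which ratio and which citation gain
one picks.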

> To see that, let's imagine an extremely obscure topic, with no
> connection to anything else, that is only studied in one place in the
> world. A journal on that topic that is subscribed to by that one
> institution would achieve 100% coverage!

Conceded. Now, how representative do you think that kind of hypothetical
special case is, for the 2.5 million articles published annually? And
please don't interpret the 60% of articles that receive zero citations
as prima facie evidence for your hypothesis that no one is interested in
them, as they are just as readily (and more optimistically!) interpretable
in exactly the opposite way: that articles have been losing users and
citers because they were inaccessible rather than because no one was
interested in using and citing them.

> A specialist academic journal (and many of the world's journals are
> very specialized!) doesn't have to be on such an obscure topic for a
> similar effect to be relevant.

It is a foregone conclusion that peer-reviewed research journal articles
will never be best-sellers! But the question that the OA advantage data
are answering is whether they have been maximizing their usage and
impact until now. And the answer is that they have not: They have
substantially more potential impact than they have actually exhibited to
date. And the most parsimonious interpretation is that this is because
they have been substantially less accessible than one might have hoped.

> I think there are other very important factors at work and
> maybe they even dominate when it comes to the behaviour of individual
> researchers. For instance:
> 1. Publisher X doesn't allow Google and other search engines to index its
> journals. So people typing keywords into Google won't see articles
> in Publisher X journals, even if they have access to them. They will
> see articles in open archives.
> 2. Publishers try to make their websites user-friendly but Google
> etc. are just so good that getting access to a paper via [a publisher's
> or aggregator's website] or whatever can be more work than typing stuff
> into Google, especially since each publisher lays out its website
> differently.
> 3. Of course, nobody physically goes to libraries anymore.

Excellent points, and they will need to be tested by comparing the
OA impact advantage for toll-access journals that (1) do and do not
have full-text indexing by Google and (2) do and do not have
online versions (though virtually all journals now do).

> Here's another factor, but I don't know how it affects anything:
> 4. I think many, perhaps most, citations are to papers that the
> authors haven't actually read, as background material. All an
> author needs to make a reference is an accurate citation that they
> can cut and paste, and maybe a skim through a few paragraphs from a
> preliminary version.

No doubt there is some of that (indeed there is some published evidence
for it, based on propagated typos, as you note), but how much? And did
unread citations not occur in on-paper days too? It will take a much more
sophisticated kind of text-analysis to partition citations into read and
unread ones, and then to compare the size of the OA advantage for each.
I suspect 100% OA self-archiving will have prevailed before we can do
that; indeed we probably need the full text corpus as a database to do
that sort of analysis thoroughly in the first place!

Stevan Harnad

