Re: Need for systematic scientometric analyses of open-access data

From: Stevan Harnad
Date: Sat, 21 Dec 2002 12:51:42 +0000

Thanks to colleagues Thomas Krichel (below) and Helene Bosc (previous
posting) for pointing out (delicately) that I was mistaken to take at
face value Ebs Hilf's cheerful suggestion that my own prior estimate
-- that so far there are only about 200 open-access peer-reviewed
journals (out of 20,000 toll-access peer-reviewed journals in all)
-- may have been too pessimistic!

Perhaps it was not too pessimistic. The Regensburg list (although a
splendid model for how such resources might in the future be organized)
is somewhat illusory. Some of its entries are not peer-reviewed journals,
and many of those listed as in some sense "free" are not open-access (which
means free, complete online access to the full text).

But please recall the context of all this: There are two BOAI strategies
for achieving open access: BOAI-1 is the self-archiving of toll-access
publications by their authors, in their institutional Eprint Archives,
and BOAI-2 is the creation of new open-access journals (and the conversion
of existing toll-access journals to open access).

All BOAI proponents, including myself, are full supporters of both BOAI
strategies, which complement one another; but some of us devote our
personal efforts more to one strategy or the other. It is no secret that
my own efforts are devoted mostly to BOAI-1 (self-archiving), and I have
reasons for this: I believe the relation between the two strategies is
that self-archiving is immediately feasible, right now, and will prepare
the way for open-access journals by first making the literature openly
accessible (thereby solving the urgent immediate-access problem);
eventually the 20,000 toll-access journals will then convert to open
access by downsizing to become peer-review service providers instead of
journal-text providers.

This is merely a hypothesis, however, for although it correctly
describes what is possible and attainable immediately (and has
already been attained by the authors of millions of self-archived
papers) it -- like BOAI-2 -- depends on
second-guessing human nature, which one can never do with assurance! Will
researchers choose to free their own toll-access research by self-archiving
it today? Will they choose to publish in the open-access journals that are
available? Will new open-access journals be created?

Now the immediate occasion for this discussion thread was the recent $9
million grant to the Public Library of Science for the founding of new
open-access journals (i.e., BOAI-2).

This is excellent news for open access -- and a good time to take stock
of the relative progress of BOAI-1 and BOAI-2 to date: What proportion
of the peer-reviewed research literature is currently being made openly
accessible through self-archiving (BOAI-1) and through open-access
journals (BOAI-2), and how quickly are the two complementary strategies
growing?

The immediate metric for comparison is the individual peer-reviewed journal
article. There are about 2 million of those published per year (although
that too is just a very vague guess) in the planet's 20,000
peer-reviewed journals (also a guess). About 200,000 physics papers have been
self-archived since 1991 (but there might possibly be some double-counting
there, because the same paper may appear as a pre-refereeing preprint
and also a peer-reviewed postprint). ResearchIndex has harvested about
500,000 computer science papers from the Web (but how many of them are
peer-reviewed final drafts?); OAIster lists over a million records (but
some of them are double-counted from these other sources, and again the
proportion of them that are peer-reviewed is not yet analyzed). There are
probably other archives, and certainly many more self-archived papers,
on personal websites, not yet harvested and tallied, in all disciplines.

The corresponding figures for BOAI-2 are also uncertain. It was here
that Ebs suggested I was being too pessimistic. I had estimated that
of the total 20,000 peer-reviewed journals (a guess) about 200 were
open-access journals (also a guess). Ebs suggested mine was a gross
under-estimate, and it was here that he cited the Regensburg data as
counterevidence. I think a closer analysis of the Regensburg data (and
other data from the Web) will indeed show that the number of open-access
journals is higher than 200, perhaps considerably higher. (There may
also be more than 20,000 peer-reviewed journals worldwide.) But not as
high as Ebs has suggested!
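
As a rough illustration of the arithmetic behind these estimates, here is
a minimal Python sketch. It uses only the figures quoted above (which are
themselves guesses), plus one assumption that is not in the post: that
"since 1991" is treated as 12 years of self-archiving. It also compares
physics papers alone against article output in all disciplines, so it is
a crude order-of-magnitude check and nothing more.

```python
def share(part, whole):
    """Return part as a fraction of whole."""
    return part / whole

# Figures quoted in the post; the author describes them as guesses.
TOTAL_JOURNALS = 20_000           # peer-reviewed journals worldwide
OA_JOURNALS = 200                 # open-access journals
ARTICLES_PER_YEAR = 2_000_000     # peer-reviewed articles per year
PHYSICS_SELF_ARCHIVED = 200_000   # physics papers self-archived since 1991

# Assumption (not from the post): 1991 through 2002 counted as 12 years.
YEARS_ARCHIVING = 12

# Share of journals that are open-access (BOAI-2).
oa_journal_share = share(OA_JOURNALS, TOTAL_JOURNALS)

# Average yearly physics self-archiving as a share of annual output (BOAI-1).
physics_per_year = PHYSICS_SELF_ARCHIVED / YEARS_ARCHIVING
physics_share_of_annual = share(physics_per_year, ARTICLES_PER_YEAR)

print(f"Open-access journals: {oa_journal_share:.1%} of all journals")
print(f"Physics self-archiving: {physics_share_of_annual:.2%} of annual articles")
```

On these guesses both shares come out at about 1% or below, and the result
is no firmer than the guesses that go into it.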

The systematic comparison will be subtle, but, I think, very
instructive. Not only do estimates have to sort out the dates of the
open-access articles -- so we can get an estimate of the amount of growth
across time, especially in the last 3 years -- but they will have to
be careful not to double-count the open-access journal articles,
erroneously crediting them to self-archiving. What is needed is a 3-year
time series, showing the growth of the number of self-archived
peer-reviewed articles and the number of articles published in
open-access journals -- comparing them to one another (with
subcomparisons by fields) as well as to the estimated total number of
peer-reviewed articles annually, so we can estimate how soon universal
open-access will be achieved (and what route will complete it first).
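
The comparison described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the function names,
the article identifiers and the yearly counts are all hypothetical
placeholders, not real data, and constant compound growth is a
simplification that real growth need not follow.

```python
import math

def split_counts(self_archived_ids, oa_journal_ids):
    """Count each article once: an article published in an open-access
    journal is credited to the journal route, not to self-archiving."""
    oa = set(oa_journal_ids)
    self_archived_only = set(self_archived_ids) - oa
    return len(self_archived_only), len(oa)

def annual_growth(yearly_counts):
    """Compound annual growth rate across a series of yearly counts."""
    periods = len(yearly_counts) - 1
    return (yearly_counts[-1] / yearly_counts[0]) ** (1 / periods) - 1

def years_to_reach(current, target, growth):
    """Years until `current` reaches `target` at constant compound growth."""
    if current >= target:
        return 0.0
    if growth <= 0:
        return math.inf
    return math.log(target / current) / math.log(1 + growth)

# Placeholder identifiers (hypothetical): article "c" appears in both sources.
sa_count, oa_count = split_counts(["a", "b", "c"], ["c", "d"])

# Placeholder 3-year series (hypothetical counts, not measurements).
series = [100, 200, 400]
rate = annual_growth(series)
remaining = years_to_reach(series[-1], 3200, rate)
print(sa_count, oa_count, rate, remaining)
```

Running the same calculation separately for each route, and for each
field, would give the sub-comparisons mentioned above. Which articles
count as peer-reviewed would still have to be decided before any counting
is done.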

And (as noted by Helene, as well as myself) it will also be important to
ascertain the "level" at which the relative growth in open-access is
taking place. Estimates of the quality/impact level of both the
open-access journals and the self-archived articles will need to be
made, for whereas the Public Library of Science is explicitly aiming at
a top-down approach (capturing the highest-level research initially,
and allowing the effect to generalize downward as a result), some of
the initial spontaneous new and converted open-access journals may be
coming more from the lower, weaker levels of the current
20,000-journal quality hierarchy (and such bottom-up effects may be
slower to generalize than top-down effects). It will also be interesting
to know the correlation between an article's quality/impact and the
probability that it is self-archived (although here we already
know that there is a post-hoc causal connection too -- for
"free online access substantially increases a paper's impact").

Stevan Harnad

On Fri, 20 Dec 2002, Thomas Krichel wrote:

>sh> The excellent (truly remarkable!) Regensburg resource Ebs cites below:
>sh> lists 759 Physics journals, of which 103 (14%) are open
>sh> access. (Is this complete?)
> The list is a remarkable piece of work. It is unfortunate that
> you seem to misread their data. When they award the green mark,
> it means that the journal comes "with freely available fulltext articles".
> It does not mean "open access".
> I checked this out for the Wirtschaftswoche, marked green, a
> German economics magazine and to all intents and purposes not
> a scholarly journal. Some contents are short full texts,
> others are summaries of articles in the magazine, and
> some are short news items. But this is by no means
> the full contents of the magazine, I should think.
