Re: Central vs. Distributed Archives

From: Stevan Harnad
Date: Tue, 9 Sep 2003 03:18:52 +0100

On Mon, 8 Sep 2003, Hugo Fjelsted Alrøe wrote:

> I think it is still too early to write off any of the possible paths to
> open access within the field of self-archiving (not that you do that). I
> see a potentially very fruitful role for community-building archives
> that focus on certain research areas. These could be facilitated or
> mandated by some of the specialized public research institutions that,
> together with universities and private companies, inhabit the research
> landscape. I think of research institutions oriented towards applied
> research within for instance environmental research, agriculture, public
> health, education, community development, etc. Here, there is a clear
> two-sided research communication: towards the public and towards other
> researchers in the field. Open access thus serves two communicative
> purposes, improving scholarly communication and improving public access
> to research results, besides the complementary purpose of institutional
> self-promotion.

All true. And certainly a national research centre like France's CNRS or
INSERM or INRA (where Helene Bosc is so active) or Germany's Max-Planck
Institutes or Italy's CNR or NIH intramural research groups or even CERN's
distributed research community could each create (a kind of) central
archive consisting of its own research output. It is clear how an
institutional policy could mandate this, and how this would be in the
joint interests of the researchers and their institution -- whether a
university or a distributed national research centre. These national
research centres, after all, are the hosts of the research and the
sponsors of the research, sharing its costs and the credits.

But it is not clear to me how any other kind of central entity (apart
from a research funding agency) could mandate self-archiving: What would
be the shared carrots? And what would be the pertinent sticks? I
certainly can't imagine a Learned Society (other than a research funder
or a research publisher) being able to induce its members or
co-disciplinarians to self-archive in the way a university or national
research centre could induce its researchers to do so. (But maybe others
with better imaginations than mine can think of a credible causal scenario?)

> By "community-building", I mean that such archives can contribute to the
> creation or development of the identity of a scholarly community in
> research areas that go across the established disciplinary matrix of the
> university world.

It would be nice to see a new subdisciplinary or multidisciplinary field
consolidate its existence by self-archiving collectively. But wouldn't
founding their own journal or journals be the more likely way they would
go about it? Each researcher in the new sub- or multidisciplinary field
presumably has his own institution, hence potentially his own
institutional open-access archives, all linked by the glue of
OAI-interoperability. The new sub- or multidisciplinary name that unites
them simply amounts to another metadata tag in OAI subject-space. There
is no need for the papers to sit physically in the same place.
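
To make the "metadata tag" point concrete, here is a minimal sketch in
Python. It assumes the records have already been harvested from several
institutional archives into memory. The Record class, the archive names,
the titles and the subject strings are all hypothetical, made up for
illustration, and no real OAI harvesting library is used.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A harvested metadata record (hypothetical, simplified shape)."""
    archive: str                      # which institutional archive holds the paper
    title: str
    subjects: list = field(default_factory=list)  # subject tags on the record


def select_by_subject(records, tag):
    """Return the records carrying the given subject tag, case-insensitively."""
    wanted = tag.casefold()
    return [r for r in records
            if any(s.casefold() == wanted for s in r.subjects)]


# Hypothetical harvest from three separate institutional archives.
harvested = [
    Record("archive-a", "Soil fertility trials", ["Organic Agriculture", "Soil science"]),
    Record("archive-b", "Lattice gauge theory notes", ["Physics"]),
    Record("archive-c", "Consumer trust in food labels", ["organic agriculture", "Sociology"]),
]

# The "virtual" subject collection is just a filter over distributed records.
organic = select_by_subject(harvested, "organic agriculture")
for record in organic:
    print(record.archive, "-", record.title)
```

The papers stay in their home archives; only the tag is shared.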

But if it is more likely that these researchers will self-archive if they
have the new tag as the banner, and a dedicated archive as the locus,
more power to them!

> I have myself initiated an archive in research in
> organic agriculture, which we hope will become a
> centre for international communication and cooperation in this area.
> Scientific papers from research in organic agriculture are published in
> many different specialized disciplinary journals as well as in general
> scientific journals and journals focused at organic agriculture, and it
> is not easy for researchers to keep track of all that is being
> published.

As noted, a unique field-descriptor tag would unify all this distributed
work as surely as a dedicated archive would, but if there really is a
greater incentive to self-archive for the sake of the new subfield than
for the sake of the impact of the research of each researcher and his
institution, then this will prove to be an interesting historical fact for
those who write the history of the slow and belated rise of open-access,
as optimal and inevitable as it might have been!

> I know the same thing can in principle be done with OAI-compliant
> university archives and a "disciplinary hub" or "research area hub", and
> in ten years time, we may not be able to tell the difference. But today,
> it is still not quite the same thing.

I note that Organic Eprints, with 581 records, has over twice as many
records as the average archive (25,151 known records to date divided by
106 known archives = 237 records on average), most of those archives
being institutional (though there are some much bigger university
archives, such as Lund's, with 2143 records!). But alas both that number
and its competitors are still
far too small to draw any strong conclusions! Over 2,000,000 refereed
journal articles are published annually, across 24,000 journals
representing all disciplines, after all.
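
The arithmetic behind these figures can be checked in a few lines of
Python. All the numbers are the ones quoted above; the variable names
are mine. The last ratio is only a rough yardstick, since the archive
records are cumulative and need not all be refereed articles.

```python
# Figures quoted in the text above.
known_records = 25_151        # records across all known archives
known_archives = 106
organic_eprints = 581         # records in Organic Eprints
lund = 2_143                  # records in Lund's archive
annual_articles = 2_000_000   # refereed articles published per year
journals = 24_000

average_per_archive = known_records / known_archives
organic_vs_average = organic_eprints / average_per_archive
lund_vs_average = lund / average_per_archive
articles_per_journal = annual_articles / journals
share_of_one_year = known_records / annual_articles

print(f"average archive size: {average_per_archive:.0f} records")
print(f"Organic Eprints: {organic_vs_average:.2f} x the average")
print(f"Lund: {lund_vs_average:.2f} x the average")
print(f"articles per journal per year: {articles_per_journal:.0f}")
print(f"all archived records vs one year's output: {share_of_one_year:.2%}")
```

So every known archive record put together amounts to a little over 1%
of a single year's refereed output.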

> Contributing to the community
> would be detached from the usage of what is there, since the depositing
> of papers would take place somewhere outside the hub. This makes it
> dependent on the widespread existence of university archives. So if one
> wants to establish such an open-archive-based scholarly community hub,
> the way to do it is to make an eprint archive with the scope that one
> wants.

Or to introduce a unique field-descriptor metadata tag...

>sh> Having said that, it is still a historical fact that the first and
>sh> still-biggest open-access OAI archive is a central, discipline-based
>sh> one, the Physics Archive, founded in 1991. But
>sh> Arxiv's growth rate has been steadily linear since 1991, and
>sh> shows no sign of either accelerating or generalizing to all the
>sh> other disciplines. So clearly something else was needed to hasten
>sh> the open-access era, and my own hunch is that a concerted policy of
>sh> university-based archiving was what was needed.
> What's wrong with linear growth?

Nothing! It's infinitely better than sublinear growth, or no growth at
all! But the fact is that the physics Arxiv has been growing with a
slope of about 45 degrees nonstop since 1991, and today, 12 years later,
it is still many years short of capturing all the annual articles in
physics. (slide 25)
And most other fields are still far behind physics.

Whereas the fact is that 100% open access is already within easy
reach *today*, if all researchers would simply grasp it (in parallel!
not in a linear series!)!

(I'm hoping that a generalization of institutional "publish-or-perish"
policies to "publish with maximized impact" (hence open-access), together
with national research-funder policies mandating open access, plus more
and more empirical demonstrations of the impact-enhancing power of
open-access, will at last help researchers grasp what is in the best
interests of themselves, their research, their research institution,
their research funders, and the tax-paying public that supports it all.)

> It must be the SIZE of the growth rate
> that is important. And how long it will take to realize some satisfying
> level of open access with this growth rate.

The important thing is how much research output there is annually, and
how much longer the growth rate will take to approach 100%!

> When you are looking for
> exponential growth, I take it that you are looking for something that
> MIGHT turn out to have a higher maximum growth rate than, for instance,
> arXiv. And that is all well, but it might be exponential and still have
> a slower maximum growth than the linear growth we see in arXiv.

I am referring to the annual rate at which 100% is approached. Linear
growth at 45 degrees would have been fine for physics if the scale had
been different (and the ceiling closer). Taking another 10 years would be
appalling even if the growth had been exponential! But all physics needs
is a good upward turn and it can hit 100% within the year. The other
disciplines are not much further away: So near and yet so far!
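
A small Python sketch of the point at issue: what counts is the time it
takes for annual deposits to reach 100% of annual output, whatever the
shape of the curve. The two formulas are just the textbook linear and
exponential models. Every number below is illustrative only, not a
measurement of Arxiv or of any discipline.

```python
import math


def years_to_full_coverage_linear(annual_output, yearly_increase):
    """Years until annual deposits equal annual output, when deposits
    start at zero and rise by a fixed amount every year."""
    return annual_output / yearly_increase


def years_to_full_coverage_exponential(annual_output, initial_deposits, growth_factor):
    """Years until initial_deposits * growth_factor**t reaches annual_output."""
    return math.log(annual_output / initial_deposits) / math.log(growth_factor)


# Illustrative, made-up figures for one hypothetical discipline.
annual_output = 100_000   # papers published per year

linear_years = years_to_full_coverage_linear(annual_output, yearly_increase=4_500)
fast_exp_years = years_to_full_coverage_exponential(annual_output, 1_000, growth_factor=2.0)
slow_exp_years = years_to_full_coverage_exponential(annual_output, 1_000, growth_factor=1.1)

print(f"linear, +4,500 per year:        {linear_years:.1f} years")
print(f"exponential, doubling yearly:   {fast_exp_years:.1f} years")
print(f"exponential, +10% per year:     {slow_exp_years:.1f} years")
```

With these made-up figures the slow exponential takes longer than the
straight line, which is Hugo's point; and either one is far too slow
compared with everyone depositing in parallel, which is mine.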

> In the presentation that you refer to above, you write:
> "At that rate, it would still take a decade before we reach the first
> year that all physics papers for that year are openly accessible."
> I think that this is an impressive and very satisfying growth. And I
> don't think that a decade is too long - the great news is that physics
> is getting there!

A decade would perhaps have been all right if it had run from inception
in 1991 to 2001. But we are still waiting! I am not sure where you are
drawing your intuitions about what is and is not satisfying growth!
Certainly one could count it satisfying if only a few other disciplines
even *started* the climb that physics has been doing since 1991. But the
fact is that there's no reason for such slow rates and ant-like queues! It
is within every researcher's power to do it overnight, in parallel. *That*
is the visible, reachable criterion against which I am weighing our
glacially slow progress so far. And soon I will produce figures
illustrating (graphically!) the proportion of its total potential
research impact that the planet is losing daily, weekly, and yearly
because of this needless paralysis!

("Satisfying" indeed! Harumph!)

Stevan Harnad

