Re: Central vs. Distributed Archives

From: Eberhard R. Hilf <hilf_at_PHYSNET.PHYSIK.UNI-OLDENBURG.DE>
Date: Wed, 10 Sep 2003 15:40:51 +0200

Dear Stevan and the list members,
here are some arguments for two claims:
1. Not before the year 2050 will all physicists publish in the ArXiv,
although the ArXiv's size is growing quadratically, not linearly, with time.
Earlier estimates [St. Harnad,
slide 25] are to be revised.

2. Usage of repositories seems to be proportional to their size;
that is, usage per document is independent of absolute size.
The full text can be found at

   physicists will publish in the ArXiv not before the year 2050
Here are some more elaborate, but rather audacious and risky, estimates
(P. Ginsparg would know better).

The ArXiv is unique in that it makes its own usage and submission logs
available.

At present (after 146 months of service) there are 246,555 documents.
The monthly rate of incoming new documents is at present 3,500. It rises
linearly with time, see
Next month there will be 24 more papers handed in than this month.
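
The figures above can be cross-checked against each other. The following is
only a rough sketch in Python, using the numbers quoted in this message and
assuming (for illustration only) that the monthly intake rose linearly from
zero when the service started. Under that assumption the total is the
integral of the rate, which grows quadratically with time.

```python
# Cross-check: a linearly rising monthly intake implies quadratic total growth.
# The figures are the ones quoted in the message. Starting the rate at zero
# in month 0 is an assumption made only for this sketch.

months_in_service = 146        # months of service so far
current_monthly_rate = 3500    # new documents per month at present
reported_total = 246_555       # documents held at present

# Slope of the intake rate, assuming it grew linearly from zero.
slope = current_monthly_rate / months_in_service   # papers per month, per month

# Integrating rate(t) = slope * t from 0 to T gives slope * T**2 / 2.
predicted_total = 0.5 * slope * months_in_service ** 2

relative_gap = abs(predicted_total - reported_total) / reported_total

print(f"slope: {slope:.1f} extra papers per month, each month")
print(f"predicted total: {predicted_total:,.0f} vs reported {reported_total:,}")
print(f"relative gap: {relative_gap:.1%}")
```

With these inputs the slope comes out close to the 24 papers per month
mentioned above, and the integrated total lies within a few percent of the
reported document count.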

This allows one to integrate it and so estimate the future time at which
virtually all physicists would send their prime papers to the ArXiv.

Let us estimate the number of physicists worldwide to be 1,000,000,
of which 10 %
might be active as researchers, producing, say, 2 papers per year.
Then we have 200,000 prime physics papers per year.
Integrating this yields that all of them would be in the ArXiv 44 years and
six months from now, that is, in the year 2050.
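
As a rough illustration of this estimate, the sketch below solves for the
time at which a linearly rising monthly intake reaches the assumed worldwide
output. All inputs are the estimates quoted in this message. The variable
names and the simple "rate reaches target" criterion are assumptions of the
sketch, not necessarily the exact calculation behind the figure above.

```python
# Rough extrapolation: when does the monthly intake reach the assumed
# worldwide output of physics papers? All inputs are the message's estimates.

physicists_worldwide = 1_000_000
active_fraction = 0.10
papers_per_active_per_year = 2

current_monthly_rate = 3500    # papers per month in September 2003
monthly_rate_increase = 24     # additional papers per month, each month

target_per_year = physicists_worldwide * active_fraction * papers_per_active_per_year
target_per_month = target_per_year / 12

# Linear rate: rate(t) = current + increase * t. Solve rate(t) = target for t.
months_needed = (target_per_month - current_monthly_rate) / monthly_rate_increase
years_needed = months_needed / 12

start_year = 2003.7            # September 2003 as an approximate decimal year

print(f"target: {target_per_year:,.0f} papers per year")
print(f"years until the intake rate matches it: {years_needed:.1f}")
print(f"approximate calendar year: {start_year + years_needed:.0f}")
```

With these inputs the crossing falls between 45 and 46 years from now, in the
same range as the figure quoted above and close to the year 2050. Changing
the assumed number of active physicists or papers per year shifts the result
proportionally, which shows how sensitive the estimate is to those guesses.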

Clearly, by then we will have passed through more technical revolutions, so
such a steady-state extrapolation is not likely to hold.

Other new developments may spread much more steeply,
notably self-archiving by the authors, their institutes,
and their libraries, forming a distributed net of repositories.

The advantages are scalability, flexibility, the business model
(distributed funding by the institutions of the creators),
the retention of the author's rights, the possibility of updates,
and the spread of acceptance: convincing a large body such as a
learned community to set up a central service such as the ArXiv
is much harder than convincing a percentage of local, distributed
institutes (the chance of multiple small barriers versus one large one).

The challenges are to set up the needed international standards,
to allow intelligent search engines to serve retrieval,
and to stimulate discussion and communication between the authors,
who have been known in the past for being very conservative, for giving
little thought to their working habits and saying little about them, and for
being used to being taken care of while someone else pays.

At present, the ArXiv is still unique in providing an unconditional time
stamp and long-term readability.

     Is the usage proportional to the size of a repository?
Outreach to, and satisfaction of, the users of a repository may be estimated
by the number of pageviews per month
divided by the number of documents.

This ratio is astonishingly similar for different repositories, even
ones of widely different size, whether they contain documents or links.

For Marenet with its 1,595 links it is 1.9,
for MPIV with its 3,027 links it is 3.6,
for Physnet with its 5,759 links it is 4.2,
for VAB with its 2,655 links it is 10.4,
for ArXiv with its 245,056 docs it is 16.3.
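
To put the comparison in numbers, here is a minimal Python sketch that
contrasts how much the repository sizes differ with how much the ratios
differ. It uses only the values listed above. The implied monthly pageviews
are back-calculated as ratio times number of items, so they are derived
figures for illustration, not measurements given in this message.

```python
# Pageviews per month per item, as listed in the message.
# Each entry: (name, number of items, pageviews per month per item)
repositories = [
    ("Marenet", 1_595, 1.9),
    ("MPIV", 3_027, 3.6),
    ("Physnet", 5_759, 4.2),
    ("VAB", 2_655, 10.4),
    ("ArXiv", 245_056, 16.3),
]

sizes = [size for _, size, _ in repositories]
ratios = [ratio for _, _, ratio in repositories]

size_spread = max(sizes) / min(sizes)      # how much the sizes differ
ratio_spread = max(ratios) / min(ratios)   # how much usage per item differs

# Monthly pageviews implied by each ratio (ratio times number of items).
implied_pageviews = {name: size * ratio for name, size, ratio in repositories}

print(f"sizes differ by a factor of about {size_spread:.0f}")
print(f"usage per item differs by a factor of about {ratio_spread:.1f}")
for name, views in implied_pageviews.items():
    print(f"{name}: about {views:,.0f} pageviews per month")
```

The sizes span a factor of roughly 150, while the usage per item spans a
factor of less than 10, which is the sense in which the ratio is comparatively
insensitive to absolute size.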

All numbers are astonishingly low, as we know from library usage of
books.

Eberhard Hilf,
Institute for Science Networking Oldenburg GmbH
at the Carl von Ossietzky University

On Tue, 9 Sep 2003, Stevan Harnad wrote:

> On Mon, 8 Sep 2003, Eberhard R. Hilf wrote:
> > the physics ArXiv has a linear increase of the number of papers put in per
> > month, this gives a quadratic acceleration of the total content (growth
> > rate of Data base), not linear.
> Maybe so. But slide 25 of
> (slide 25)
> still looks pretty linear to me. And it looks as if 100% was not only
> *not* reached at this rate 10 years after self-archiving started in
> physics in 1991, but it won't be reached for another 10 years or so...
> > Total amount by now may be at 10-15 % of all papers in physics.
> I count that as appallingly low, considering what is so easily
> feasible (though stunningly higher than any other field!)...
> > Linear growth of input rate means the number of physicists and fields
> > using it rises, while in each field (and physicist) a saturation is
> > reached after a first exponential individual rise.
> Interesting, but the relevant target is 100% of physics (and all other
> disciplines) -- yesterday!
> > Never there will be a saturation such that all papers will go this way,
> > since in different fields culture and habits and requirements are
> > different. --
> I couldn't follow that: Never 100%? Even at this rate? I can't imagine
> why not.
> But the point is that it's far too slow -- relative to what is not only
> possible, but easily done, and immensely beneficial to research,
> researchers, etc.
> > [That is why it is e.g. best, to keep letter distribution by
> > horses at a remote island (Juist) alive since the medieval times].
> That I really couldn't follow! If you mean paper is still a useful back-up,
> sure. But we're not talking about back-up. We are talking about open
> online access, which has been reachable for at least a decade and a half
> now, and OAI-interoperably since 1999. What more is the research cavalry
> waiting for, before it will stoop to drink?
> Stevan Harnad
> NOTE: A complete archive of the ongoing discussion of providing open
> access to the peer-reviewed research literature online is available at
> the American Scientist September Forum (98 & 99 & 00 & 01 & 02 & 03):
> or
> Discussion can be posted to:

Received on Wed Sep 10 2003 - 14:40:51 BST

