Harvesting from the many OA servers that are not yet OAI-compliant

From: Eberhard R. Hilf <hilf_at_ISN-OLDENBURG.DE>
Date: Fri, 29 Apr 2005 08:42:07 +0200

Regarding Stevan's U-Haul Allegory, with 1,000,000 authors halted with
their trucks full of prime scientific information in front of the mostly
green traffic lights, which they mostly don't trust or can't read or are
too timid to act upon: The suggestion to wait there for their bosses to
intervene and give them a kick needs a kick of its own!

There is a bypass around this small-town traffic jam with its traffic
lights, a highway that authors drive through, with no traffic lights, just
go! First provide Open Access (OA, to the unpublished preprint) and then
negotiate with whomever! This express lane is also used by the authors who
know that the journal to which they plan to submit the preprint is one of
the 90%+ of journals that are green.

In Physics alone we already have 152,380 OA scientific documents
self-archived on 1,798 institutional servers distributed worldwide.

By the way: we would have a more accurate global "truck" count if the OA
Archives Registry http://archives.eprints.org/ were updated to include the
actual total number of OA papers served by OAI-PMH archives.

Over and above the 300 OAI-PMH archives monitored in the Registry, PhysNet
alone would add another 1,798 distributed servers from which the relevant
documents are gathered, analyzed for their metadata, and, if they pass
(or can be made to pass) the OAI-PMH hurdle, presented in the
OAI data-provider outlet of PhysNet.
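
For readers unfamiliar with the protocol, here is a minimal sketch of the
standard OAI-PMH harvesting step: build a ListRecords request asking for
unqualified Dublin Core (metadataPrefix=oai_dc), then pull title, creator
and identifier out of the response. The base URL and the sample record are
hypothetical, and the response is parsed from an inline string rather than
fetched; this illustrates the protocol only and is not PhysNet's actual
gatherer.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and the oai_dc / Dublin Core schemas.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL (HTTP GET form)."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query

def parse_records(xml_text):
    """Extract Dublin Core title/creator/identifier from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        dc = rec.find(".//oai_dc:dc", NS)
        if dc is None:  # deleted records carry a header but no metadata
            continue
        records.append({
            "title": [e.text for e in dc.findall("dc:title", NS)],
            "creator": [e.text for e in dc.findall("dc:creator", NS)],
            "identifier": [e.text for e in dc.findall("dc:identifier", NS)],
        })
    return records

# Hypothetical one-record response from a hypothetical archive.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample preprint</dc:title>
          <dc:creator>Doe, J.</dc:creator>
          <dc:identifier>http://example.org/preprints/1.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    print(parse_records(SAMPLE))
```

A real harvester must also follow the resumptionToken that OAI-PMH uses to
page through large result sets; that loop is omitted here. Servers that do
not speak OAI-PMH at all offer no such endpoint, which is why their
documents have to be gathered and described by other means.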


The separation of data providers and service providers in the OAI concept
helped greatly to produce clarity, but it also obscures the fact that
services such as PhysNet are in fact both: they bring together documents
that are not part of any existing data provider, thereby greatly increasing
the number of OA documents that are available.

We feel that there exist far more OA documents on local institutional
servers than the few that fit the strict OAI-PMH rules and are
self-archived with the still few official OAI data providers.

Let us do a little estimate for Physics alone:

There are an estimated 100,000 university staff in physics worldwide. Each
might on average have self-archived about 10 relevant manuscripts per year
(preprints, eprints, etc.) on their local institutional server. The growing
habit of doing this in physics started some 8 years ago. This would yield
(adjusting for the gradual change in habits) about 5 million OA documents
available on the web from about 2,000 universities worldwide. Counting only
those that come from an OAI-PMH-compliant data provider therefore produces
a gross underestimate.
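
The back-of-the-envelope estimate above can be checked in a few lines. The
naive product assumes everyone archived at the full rate for all 8 years;
the "implied uptake" is simply the ratio of the quoted 5 million to that
upper bound. No particular adoption curve is assumed here; the variable
names are illustrative.

```python
def naive_upper_bound(staff, papers_per_year, years):
    """Documents if every staff member archived at the full rate for the whole period."""
    return staff * papers_per_year * years

STAFF = 100_000         # estimated university physics staff worldwide
PAPERS_PER_YEAR = 10    # average self-archived manuscripts per person per year
YEARS = 8               # the habit started about 8 years ago
ESTIMATE = 5_000_000    # the figure quoted after adjusting for gradual uptake

upper = naive_upper_bound(STAFF, PAPERS_PER_YEAR, YEARS)
implied_uptake = ESTIMATE / upper  # average fraction of the full rate implied

if __name__ == "__main__":
    print(f"upper bound: {upper:,} documents")
    print(f"implied average uptake: {implied_uptake:.1%}")
```

So the 5 million figure corresponds to authors archiving, on average over
the period, at roughly five eighths of the steady-state rate, against an
upper bound of 8 million.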

An example: of the roughly 5 million OA physics documents individually
self-archived by scientists on their institutional servers, PhysNet finds
only 5% today (but it is being redesigned to include many more). Of those
documents, 8% have metadata that are more or less compliant with Dublin
Core. Yet of those 2,000 institutional servers, only about 300 are
detected by http://archives.eprints.org/.
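
Spelled out, the percentages above form a simple funnel. This sketch only
multiplies the stated figures (integer arithmetic, so the counts stay
exact); it adds no new data.

```python
TOTAL_DOCS = 5_000_000            # estimated OA physics documents on institutional servers
found = TOTAL_DOCS * 5 // 100     # share PhysNet currently finds (about 5%)
dc_like = found * 8 // 100        # of those, share with roughly Dublin Core metadata (8%)

SERVERS = 2_000                   # institutional servers
REGISTERED = 300                  # servers detected by the OA Archives Registry
registry_share_pct = REGISTERED * 100 // SERVERS

if __name__ == "__main__":
    print(f"found by PhysNet: {found:,}")
    print(f"with Dublin-Core-like metadata: {dc_like:,}")
    print(f"servers in the Registry: {registry_share_pct}%")
```

That is about 250,000 documents found, about 20,000 of them with usable
Dublin Core metadata, and about 15% of the servers visible in the Registry.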

Summary: Open Access is already a very widespread phenomenon, but the
official OA counts do not yet reveal this. The OAI services are
nevertheless a great help in finding documents and in identifying them as
having been considered relevant by the data provider.

Just by adding metadata to large numbers of OA documents harvested from
local research groups using http://www.isn-oldenburg.de/services/mmm/,
rather than waiting for each institution to make up its mind to adopt an
official OA self-archiving policy, we have generated an enormous positive
response from authors, gratified at being cited more, being found in
Google, being phoned and emailed by colleagues, etc.
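
As an illustration of what "adding metadata" to a plain web document can
mean, here is a minimal sketch using the DCMI convention of embedding Dublin
Core elements as HTML meta tags (a schema.DC link plus DC.title, DC.creator,
and so on). This is an assumption for illustration: it does not describe
what the mmm service above actually produces, and the record is
hypothetical.

```python
from html import escape

def dc_meta_tags(record):
    """Render a dict of Dublin Core elements (name -> list of values) as HTML head tags."""
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, values in record.items():
        for value in values:
            # Escape &, <, > and quotes so the value is safe inside the attribute.
            lines.append(f'<meta name="DC.{element}" content="{escape(value, quote=True)}">')
    return "\n".join(lines)

# Hypothetical record for a locally self-archived preprint.
record = {
    "title": ["Quantum dots & wires"],
    "creator": ["Doe, J."],
    "type": ["Text"],
}

if __name__ == "__main__":
    print(dc_meta_tags(record))
```

Tags like these sit in the document's head, where web crawlers and metadata
harvesters that understand the convention can pick them up without any
OAI-PMH endpoint on the server.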

We are thus following Stevan's motto: don't sit and wait; just go ahead
and do it.

 Eberhard R. Hilf, Dr. Prof.;
 Institute for Science Networking Oldenburg GmbH
 an der Carl von Ossietzky Universitaet
 Ammerlaender Heerstr.121; D-26129 Oldenburg
 ISN-home: http://www.isn-oldenburg.de/
 homepage: http://isn-oldenburg.de/~hilf
 email : hilf_at_isn-oldenburg.de
 tel : +49-441-798-2884
 fax : +49-441-798-5851
 Why not visit
 - Buendnis Urheberrecht fuer Bildung und Wissenschaft
 - Open Access www.zugang-zum-wissen.de
 - Physics Distributed Network: www.physnet.net
