OA Archives: Full-texts vs. metadata-only and other digital objects

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Thu, 9 Jun 2005 14:44:41 +0100

On Thu, 9 Jun 2005, Tim Gray, Library Assistant, Homerton College Library wrote:

> >....only 15% of OA's target content (the annual 2.5 million full-text
> >research articles published in the world's 24,000 journals) is as yet
> >being self-archived, worldwide [...]
>
> I was under the, obviously mistaken, impression that all items harvested by
> OAIster were open access. Open access, to me, means (amongst other things)
> full text. So what are these authors doing? Archiving their metadata but
> not the actual text? They might as well not archive anything, then, I would
> have thought.
>
> I know that 85% are not self-archiving yet, but I assumed that OAIster
> covered the 15% who *were*. But maybe I've missed something?

Your query is quite natural and very important to raise, and to understand:

It is extremely important to distinguish, and understand the relation between:

    (OAI) (1999) the Open Archives Initiative (OAI), with its metadata
    harvesting/interoperability protocol and
    http://www.openarchives.org/meetings/

    (OA) (2001) the (Budapest) Open Access (OA) Initiative (BOAI), with its
    objective of open access to the full texts of preprints and postprints
    of articles (and dissertations)
    http://www.soros.org/openaccess/

OAI began in 1999 with an OA focus -- to make OA archives interoperable. But it
soon became much more general: a metadata harvesting protocol for making all sorts
of digital archives -- not just OA archives -- interoperable.

OAIster harvests from all known OAI-compliant archives, not just OA archives.
http://oaister.umdl.umich.edu/o/oaister/

Moreover, even the OA archives are not necessarily 100% full-text! In other words,
it is not at the moment known what percentage and which of the current
5,475,850 records from 480 institutions harvested by OAIster correspond to
OA full-texts.

We can be fairly sure that OA full-texts are the minority in OAIster, based
on the estimates from the OA full-text crawlers at Oldenburg, Southampton
and Universite du Quebec a Montreal, which converge on 15% worldwide
and across disciplines for the past 10 years.

    http://citebase.eprints.org/isi_study/
    http://www.crsc.uqam.ca/lab/chawki/ch.htm

Probably OAIster's percentage of OA full-texts is higher than the 15%
that is the average for the literature as a whole, but it might not be
very much higher than that, not just because not all records link to
full texts, or OA texts, but especially because some of the "full-text"
records are not OA target-texts (i.e., preprints, postprints, theses)
but other kinds of digital objects: courseware, institutional records,
video, audio, software!
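
To make the distinction concrete, here is a rough sketch of how harvested
records might be sorted into those three groups. It is only an illustration,
not how OAIster or any crawler actually works: the sample XML is invented, the
archive identifiers are hypothetical, and the rule "no dc:format means
metadata-only" is an assumption that real archives will often violate. Only
the OAI-PMH and Dublin Core namespaces are taken from the published
specifications.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespaces from the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

DC_OPEN = ('<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
           'xmlns:dc="http://purl.org/dc/elements/1.1/">')

# Hypothetical ListRecords response, invented for illustration only.
SAMPLE = f"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:archive.example:1</identifier></header>
      <metadata>{DC_OPEN}
        <dc:type>Article</dc:type>
        <dc:format>application/pdf</dc:format>
      </oai_dc:dc></metadata>
    </record>
    <record>
      <header><identifier>oai:archive.example:2</identifier></header>
      <metadata>{DC_OPEN}
        <dc:type>Article</dc:type>
      </oai_dc:dc></metadata>
    </record>
    <record>
      <header><identifier>oai:archive.example:3</identifier></header>
      <metadata>{DC_OPEN}
        <dc:type>MovingImage</dc:type>
        <dc:format>video/mpeg</dc:format>
      </oai_dc:dc></metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

# Assumed vocabulary for OA target texts; real archives use varied dc:type values.
TEXT_TYPES = {"article", "preprint", "postprint", "thesis"}


def values(dc, tag):
    """Return the lower-cased text of every dc:<tag> element in one record."""
    return {(e.text or "").strip().lower() for e in dc.findall(f"dc:{tag}", NS)}


def classify(record):
    """Heuristically label one harvested record (assumption-laden, see above)."""
    dc = record.find("oai:metadata/oai_dc:dc", NS)
    if dc is None:
        return "no-metadata"          # e.g. a deleted record
    if not values(dc, "format"):
        return "metadata-only"        # assumption: no format means no attached file
    if values(dc, "type") & TEXT_TYPES:
        return "candidate-full-text"  # still unverified: may not be openly accessible
    return "other-object"             # video, audio, courseware, software, ...


root = ET.fromstring(SAMPLE)
counts = Counter(classify(r) for r in root.findall("oai:ListRecords/oai:record", NS))
print(dict(counts))
```

Even a "candidate-full-text" label here only says that the metadata mentions
a text-like type and a file format; whether the full text is really there,
and really open, can only be confirmed by following the link, which is why
the crawler estimates matter.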

Kat Hagedorn at OAIster can estimate the proportion of full texts offline
and has done so in the past:

    Re: DOAJ, OAIster and Romeo should chart growth, as EPrints does
    http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/3526.html

OAIster cannot yet accurately chart the full-text OA subset online,
but I hope it soon will (Kat?). I hope Kat will correct my errors
or omissions in summarizing OAIster!

Tim Brody's Institutional OA Archives Registry covers a somewhat more
OA-focussed subset of OAIster's Archives

    http://archives.eprints.org/eprints.php?action=browse

and is now in the process of adding powerful new features, but it too
does not yet distinguish full-texts from metadata-only. It charts
the growth of both the number of archives and the number of records
for 7 kinds of OAI-compliant Archives:

Institutional/Departmental Research Archives:

A: Number of Archives: 210
c: Number of Celestial-harvestable subset: 158
rc: Number of Records from celestial-harvestable: 452600
ac: average records per celestial-harvestable archive: 2865

Cross-Institutional Research Archives: A:55 c:43 rc:1429670 ac:33248

E-Thesis Archives: A:54 c:40 rc:155965 ac:3899

E-Journal/E-Publication Archives: A:39 c:30 rc:83631 ac:2788
Demonstration Archives: A:24 c:11 rc:5961 ac:542
Database Archives: A:8 c:4 rc:1958 ac:490

Other Kinds of Archives: A:44 c:26 rc:373058 ac:14348

But at the moment the record counts and averages (rc and ac) cannot
distinguish full-text records from metadata-only records, and the
latter are in the vast majority. The Archives Registry can also only track
archives that are harvestable by celestial: http://celestial.eprints.org/

(The administrators of Institutional Repositories/Archives could
help us a great deal if (1) those with OAI-compliant archives not
in the Registry could register them at
http://archives.eprints.org/eprints.php?action=add
and (2) those archives in the registry that are not celestial-harvestable
(122/434 = 28%) could provide the data to make them celestial-harvestable.)
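
For anyone who wants to check the Registry arithmetic, the figures quoted
above can be recomputed in a few lines. This is just a sketch: the dictionary
layout and variable names are my own, and the numbers are simply copied from
the listing above (A = archives, c = celestial-harvestable, rc = records).

```python
# (A, c, rc) per archive type, copied from the Registry figures quoted above.
registry = {
    "Institutional/Departmental Research": (210, 158, 452600),
    "Cross-Institutional Research": (55, 43, 1429670),
    "E-Thesis": (54, 40, 155965),
    "E-Journal/E-Publication": (39, 30, 83631),
    "Demonstration": (24, 11, 5961),
    "Database": (8, 4, 1958),
    "Other": (44, 26, 373058),
}

# ac: average records per celestial-harvestable archive (rc / c).
averages = {kind: rc / c for kind, (_a, c, rc) in registry.items()}

total_archives = sum(a for a, _c, _rc in registry.values())
total_harvestable = sum(c for _a, c, _rc in registry.values())
not_harvestable = total_archives - total_harvestable
share = not_harvestable / total_archives

for kind, ac in averages.items():
    print(f"{kind}: ac = {ac:.1f}")
print(f"not celestial-harvestable: {not_harvestable}/{total_archives} = {share:.0%}")
```

Note that these averages count records of every kind; they say nothing about
how many of those records are full texts.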

In conclusion: The 15% OA full-text estimate is probably right. So the
most important task is to increase that OA content from 15% to 100%. This
requires the adoption of institutional OA self-archiving policies.

    http://www.eprints.org/signup/fulllist.php

Sorting out the 15% full-texts from the metadata-only and other kinds of digital
objects in OAI-space today will help a little, but only institutional
self-archiving policy will get us over the top at last!

Stevan Harnad

AMERICAN SCIENTIST OPEN ACCESS FORUM:
A complete Hypermail archive of the ongoing discussion of providing
open access to the peer-reviewed research literature online (1998-2005)
is available at:
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/
        To join or leave the Forum or change your subscription address:
http://amsci-forum.amsci.org/archives/American-Scientist-Open-Access-Forum.html
        Post discussion to:
        american-scientist-open-access-forum_at_amsci.org

UNIVERSITIES: If you have adopted or plan to adopt an institutional
policy of providing Open Access to your own research article output,
please describe your policy at:
        http://www.eprints.org/signup/sign.php

UNIFIED DUAL OPEN-ACCESS-PROVISION POLICY:
    BOAI-1 ("green"): Publish your article in a suitable toll-access journal
            http://romeo.eprints.org/
OR
    BOAI-2 ("gold"): Publish your article in an open-access journal if/when
            a suitable one exists.
            http://www.doaj.org/
AND
    in BOTH cases self-archive a supplementary version of your article
            in your institutional repository.
            http://www.eprints.org/self-faq/
            http://archives.eprints.org/
Received on Thu Jun 09 2005 - 14:44:41 BST
