Beyond Romary & Armbruster On Institutional Repositories

From: Stevan Harnad <amsciforum_at_GMAIL.COM>
Date: Sun, 5 Jul 2009 19:31:38 -0400

      Fully hyperlinked version of this posting:
      http://openaccess.eprints.org/index.php?/archives/606-guid.html

      Critique of: Romary, L & Armbruster, C. (2009) Beyond
      Institutional Repositories.
      http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1425692

________________________________

      R&A: "The current system of so-called institutional
      repositories, even if it has been a sensible response at
      an earlier stage, may not answer the needs of the
      scholarly community, scientific communication and
      accompanied stakeholders in a sustainable way."


Almost all institutional repositories today are near-empty. Until and
unless they are successfully filled with their target content, talk
about their "answering needs" or being made "sustainable" is moot.

The primary target content of both the Open Access movement and the
Institutional (and Central) Repository movement is refereed research:
the 2.5 million articles per year published in the planet's 25,000
peer-reviewed journals. (That is why R&A speak, rather ambiguously,
about "Publication Repositories.")

Institutions are the universal providers of all that refereed
research output, funded and unfunded, in all scholarly and scientific
disciplines, worldwide.

Institutions have a fundamental interest in hosting, inventorying,
monitoring, managing, assessing, and showcasing their own research
output, as well as in maximizing its uptake, usage and impact.

Yet not only is most of the research output of most institutions
failing to be deposited in the institution's own repository: most of
it is not being deposited in any other repository either. (Please
keep this crucial fact in mind as you reflect on the critique below.)

      R&A: "[H]aving a robust repository infrastructure is
      essential to academic work."


A repository, be its "infrastructure" as "robust" as you like, is of
no use for academic work as long as it is near-empty.

      R&A: "[C]urrent institutional solutions, even when
      networked in a country or across Europe, have largely
      failed to deliver."


Largely empty repositories, "networked" to largely empty repositories,
remain doomed to deliver next to nothing.

      R&A: "Consequently, a new path for a more robust
      infrastructure and larger repositories is explored to
      create superior services that support the academy."


Making largely empty repositories "larger" (by "networking" them) is
as futile as "making their infrastructure more robust": What
repositories lack and need is their target content.

The reason most repositories are near-empty is that most researchers
are not depositing in them.

And the reasons most researchers are not depositing are multiple
(there are at least 34 of them), but they boil down to one basic
reason, and researchers have already indicated, clearly, in
international surveys, what that one basic reason is: Deposit has not
been mandated (by their institutions or their funders). 

Ninety-five percent of researchers surveyed across all disciplines,
worldwide, most of whom do not deposit, respond that they would
deposit if deposit were mandated, 14% of them reluctantly, and 81% of
them willingly. (Swan) 

And outcome studies have shown that researchers do what they said
they would do: When deposit is mandated, they do indeed deposit, in
high proportions, within two years of adoption of the deposit
mandate. (Sale) 

Hence what institutions need in order to induce their researchers to
deposit is not larger or more robust repositories, but deposit
mandates.

The number of mandates is growing, but there are as yet only 90 of
them worldwide.

Hence what is urgently needed to fill repositories so they can begin
providing "superior services" for the academy is more mandates, not
larger repositories or "more robust infrastructure."

      R&A: "[F]uture organisation of publication repositories
      is advocated that is based upon macroscopic academic
      settings providing a critical mass of interest as well as
      organisational coherence."


The only "critical mass" that repositories need is their missing
target OA content.

Researchers have an intrinsic interest in making their research
output OA. Institutions have an interest in making their research OA.
Funders have an interest in making their research output OA. And the
tax-paying public has an interest in making the research they fund
OA. 

In contrast, subscription/license publishers do not have an intrinsic
interest in making the research they publish OA unless they are
paid for it (via Gold OA publication fees). Publishers view Green
OA (via repository deposit) as putting their subscription and license
revenues at risk. They haven't much choice but to endorse deposit by
their authors, given the research benefits of OA, and particularly
when it is mandated by their authors' institutions and funders; but
publishers themselves certainly have no need or desire to do the
depositing on their authors' behalf, for free.

The way to see this clearly is to realize that Green OA amounts to
repository deposit by authors, for free, whereas Gold OA amounts to
"repository deposit" by publishers, for a fee.

Most publishers are not depositing today because they are not
being paid to do it. 

Most authors are not depositing today because they are not
being mandated to do it.

There is no solution in "amalgamating" these respective empty
repositories (unmandated Green and unpaid Gold). The solution is
either more mandates or more money. 

As subscriptions/licenses are covering the costs of publication
today, there is neither the need to pay for Gold OA, pre-emptively,
today nor the extra money to pay for it: The potential money is tied
up in paying the subscription/license fees that are already covering
the costs of publication.

Mandates do not depend on publishers but on institutions and funders;
nor do mandates bind publishers: they bind only authors. It is hence
incoherent to imagine macro-repositories fed by both authors and
publishers. Nor is it necessary, since institutional (and funder)
deposit mandates, along with institutional repositories, are jointly
necessary and sufficient to achieve 100% OA.

      R&A: "Such a macro-unit may be geographical (a coherent
      national scheme), institutional (a large research
      organisation or a consortium thereof) or thematic (a
      specific research field organising itself in the domain
      of publication repositories)."


"Macro" organisations -- whether institutional consortia, national
consortia or disciplinary consortia -- do not resolve this
fundamental contradiction between free access and any scheme to pay
for access.

(In principle, McDonald's and Burger King could give free access to
hamburgers if a global consortium of some sort were to agree to
bankroll it all up-front; however, that would hardly be free access:
it would simply be global acquiescence to a global oligopoly on the
sale of a product.)

So forget about counting on publishers to deposit articles in OA
repositories -- whether institutional or central -- unless they are
paid up-front to do so. And paying them to do so via licenses is not
"organisational coherence" but what biologists would call an
"evolutionarily unstable strategy," doomed to collapse because of its
own intrinsic instability.

It is the articles' authors who need to deposit, and it is that
deposit that their institutions and funders need to mandate.

      R&A: "The argument proceeds as follows: firstly, while
      institutional open access mandates have brought some
      content into open access, the important mandates are
      those of the funders"


This "argument" is demonstrably incorrect. 

Not all, or even most, research is funded, whereas all research
originates from institutions. Hence institutional mandates
cover all research, whereas funder mandates cover
only funded research.

The NIH, RCUK and ERC funder mandates were indeed important because
they set an example for other funders to follow (and many are indeed
following); but that still only covers funded research. Funder
mandates do not scale up to cover all research.

The Harvard, Stanford and MIT institutional mandates were hence far
more important, because they set an example for other institutions to
follow (and many are indeed following); and this does cover all
research output, because institutions are the universal providers of
all research output, whether funded or unfunded, across all
disciplines.

      R&A: "[Funder mandates] are best supported by a single
      infrastructure and large repositories, which incidentally
      enhances the value of the collection (while a transfer to
      institutional repositories would diminish the value)."


This is again profoundly incorrect. The only "value enhancement" that
empty collections need is their missing content. (Nor are we talking
about "transfer" yet, since the target contents are not being
deposited. We are talking about mandating deposit.)

Funder mandates can be fulfilled just as readily by depositing in
institutional repositories or central ones. Repository size and locus
of deposit are completely irrelevant. All OAI-compliant repositories
are interoperable. The OAI-PMH allows central harvesting from
distributed repositories. In addition, transfer protocols like SWORD
allow direct, automatic repository-to-repository transfer of
contents. 
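
The harvesting route can be made concrete. The following Python sketch, offered under stated assumptions rather than as any particular service's implementation, builds an OAI-PMH ListRecords request (the protocol's standard verb for bulk metadata harvesting, here asking for unqualified Dublin Core, "oai_dc") and extracts identifiers and titles from a response. The endpoint https://repository.example.edu/oai and the sample record are hypothetical placeholders; the response is canned so that the sketch runs without network access.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, from_date=None):
    """Build an OAI-PMH ListRecords request for Dublin Core metadata."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if from_date:
        # Selective (incremental) harvesting: only records changed since from_date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)

def extract_titles(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        ident = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        pairs.append((ident, title))
    return pairs

# Hypothetical response from a hypothetical institutional repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.edu:123</identifier>
        <datestamp>2009-07-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example refereed article</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://repository.example.edu/oai"))
print(extract_titles(SAMPLE))
```

A central service would issue the same request to each institutional endpoint in turn, using the "from" parameter for incremental harvesting and following resumptionTokens through large result sets (omitted here for brevity). The deposit itself stays local; only the metadata travels to the centre.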

Hence there is no functional advantage whatsoever to direct central
deposit, since central harvesting from institutional repositories
achieves exactly the same functional result. Instead, direct central
deposit mandates have the great disadvantage that they compete with
institutional mandates instead of facilitating them. 

Both the natural and the optimal locus of deposit is the
institutional repository, for both institutions and funders. That way
funder mandates and institutional mandates collaborate and converge,
covering all research output.

Summary:
(1) Repository size and "infrastructure" do not generate content. 
(2) Empty repositories are useless. 
(3) The only way to fill them is to mandate deposit. 
(4) Not all, or even most, research is funded.
(5) But all research originates from institutions.
(6) Institutions' interests are served by hosting and managing their
own research assets.
(7) Hence both institutional and funder mandates should converge on
institutional deposit.
(8) Any central collections can then be harvested from the global
distributed network of institutional repositories.

And now an important correction of a widespread misinterpretation of
the relative success of institutional and central repositories in
capturing their target content:

The Denominator Fallacy. With one prominent exception -- which has
absolutely nothing to do with the fact that the exceptional
repository in question, the physics Arxiv, happens to be central
rather than institutional -- unmandated central repositories (and
there are many) are no more successful in getting themselves filled
with their target content than unmandated institutional repositories.
The critical causal variable is the mandate, not the repository's
centrality or size. (HAL/CNRS http://bit.ly/l0YIa)

The way to arrive at a clear understanding of this fundamental fact
is to note that the denominator -- i.e., the total target content
relative to which we are trying to reckon, for a given repository,
what proportion of it is being deposited -- is far bigger for a
central disciplinary repository than for an institutional
repository. 

For an institutional repository, its denominator is the total number
of refereed journal articles, across all disciplines, produced by
that institution annually. 

For a central disciplinary repository, its denominator is the total
number of refereed journal articles, across all institutions
worldwide, published in that discipline annually (for a national
repository, like HAL, it is the total research output of all the
nation's institutions, across all disciplines). 

So it is no wonder that central repositories are "larger" than
institutional ones: Their total target content is much larger. But
this difference in absolute size is not only irrelevant but deeply
misleading. For the proportion of their total annual target content
that unmandated central repositories are actually capturing is every
bit as minuscule as the proportion that unmandated institutional
repositories are capturing. And whereas the total size of a mandated
institutional repository remains much smaller than an unmandated
central repository, the reality is that the mandated institutional
repositories are capturing (or near capturing) their total target
outputs, whereas the unmandated central repositories are far from
capturing theirs.
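
The denominator argument is simple arithmetic. A minimal sketch with purely hypothetical figures (chosen only for illustration, not measurements of any real repository) shows how one repository can be larger in absolute size yet capture a far smaller proportion of its target content than another:

```python
def capture_rate(deposited, target):
    """Proportion of a repository's annual target content actually deposited."""
    return deposited / target

# Hypothetical figures, chosen only to illustrate the argument; not real data.
repositories = {
    # name: (articles deposited this year, total target articles this year)
    "unmandated central (disciplinary)": (5_000, 100_000),
    "unmandated institutional": (50, 1_000),
    "mandated institutional": (900, 1_000),
}

for name, (deposited, target) in repositories.items():
    print(f"{name}: {deposited} deposits, {capture_rate(deposited, target):.0%} of target")
```

Ranked by absolute size (the numerator alone), the unmandated central repository comes first; ranked by the proportion of its target content captured, the mandated institutional repository comes first, and the two unmandated repositories tie.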

The reason Arxiv is a special case is not at all that it is a
central repository but that the physicists who immediately began
depositing in Arxiv way back in 1991, with no need whatsoever of a
mandate to impel them to do so, had already long been doing much the
same thing on paper (at the CERN and SLAC paper depositories), and
necessarily centrally, because in the paper medium there is no way
to send one's paper to "everyone," nor to get everyone to access
or "harvest" each new paper from each author's own institutional
depository (if there had been such a thing).

All of that is over now. And if physicists had made the transition
from paper preprint deposit to online preprint deposit directly today
rather than in 1991, in the OAI-PMH era of repository
interoperability and harvesting, there is no doubt that they would
have deposited in their own respective institutional repositories and
CERN and SLAC and Arxiv would simply have harvested the metadata
automatically from there (with the obvious computational alerting
mechanisms set up for harvesting, export and import).

But that longstanding cultural practice of preprint deposit among
physicists would be just as anomalous if physicists had begun it all
by depositing institutionally rather than centrally, for no other
(unmandated) central repository (or discipline) is capturing the high
proportion of its annual total target content that the physics Arxiv is
capturing (in certain preprint-sharing subfields of physics) and has
been capturing ever since 1991, in the absence of any deposit
mandate.

So the centrality, size and success of Arxiv is completely irrelevant
to the problem of how to fill all other unmandated repositories,
whether central or institutional, large or small, and regardless of
the "robustness" of their "infrastructure." Only the mandated
repositories are successfully capturing their target content, and
there is no longer any need to deposit directly in central
repositories: In the OAI-compliant OA era, central "repositories"
need only be collections, harvested from the distributed local
repositories of the universal research providers: the institutions.

      R&A: "Secondly, we compare and contrast a system based on
      central research publication repositories with the notion
      of a network of institutional repositories to illustrate
      that across central dimensions of any repository solution
      the institutional model is more cumbersome and less
      likely to achieve a high level of service."


The assumption is made here -- with absolutely no supporting
evidence, and with all existing evidence (other than the single
special case of Arxiv, discussed above) flatly contradicting it --
that researchers are more likely to deposit their refereed journal
articles in big central repositories than in their own institutional
repositories. 

All evidence is that researchers are equally unlikely to deposit in
either kind of repository unless deposit is mandated, in which case
it makes no difference whether the repository is institutional or
central -- except that if both funders and institutions mandate
institutional deposit then their mandates converge and reinforce one
another, whereas if funders mandate central deposit and institutions
mandate institutional deposit then their mandates diverge and compete
with one another. (And of course the natural direction for harvesting
is from local to central, not vice versa: We deposit on our
institutional websites and Google harvests from there; it would be
absurd to deposit in Google and then harvest back to our own
institutional website. The same is true for any central OAI
harvesting service.)

      R&A: "Next, three key functions of publication
      repositories are reconsidered, namely a) the fast and
      wide dissemination of results; b) the preservation of the
      record; and c) digital curation for dissemination and
      preservation."


Again, these functions in no way distinguish central and
institutional repositories (both can and do provide them) and have no
bearing whatsoever on the real problem, which is the absence of the
target content -- for which the remedy is to mandate deposit.

      R&A: "Fourth, repositories and their ecologies are
      explored with the overriding aim of enhancing content and
      enhancing usage."


You cannot enhance content if the content is not there. And you
cannot enhance the usage of absent content. Hence it is not
enhancements that are needed but deposit mandates.

      R&A: "Fifth, a target scheme is sketched, including some
      examples."


The target scheme includes a suggestion that publishers should do the
depositing of their own proprietary version of the refereed article.
This is perhaps the worst suggestion of all. Just when institutions
are at last realizing that they can host and manage their own
research output by mandating that their researchers deposit their
final refereed drafts in their own institutional repositories, Romary
& Armbruster instead suggest "consolidated" central "publication
repositories" in which publishers do the depositing. (The question to
contemplate is: If it requires a mandate to induce researchers to
deposit, what will it require to induce publishers to deposit --
other than paying them to do it? And if so, who will pay for what,
and why?)

Most of the rest of the suggestions are superfluous, and fail
completely to address the real problem: the absence of OA's target
content. You can't go "beyond" institutional repositories until you
first fill them.

Stevan Harnad
http://www.eprints.org/openaccess/
Received on Mon Jul 06 2009 - 00:35:01 BST
