Re: EPrints, DSpace or ESpace?

From: Stevan Harnad
Date: Tue, 17 Jun 2003 16:19:38 +0100

On Tue, 17 Jun 2003, Jan Velterop wrote:

> Probably of interest to readers of this list:

In that article in The Scientist, "UC to launch open-access journals,"
Catherine Zandonella writes:

> In a trend that could permanently alter the nature of scholarly
> publishing, several top research universities are setting up
> electronic superarchives to store and share their researchers'
> data. Some universities see these "institutional repositories"
> simply as a way to capture their intellectual output, but others
> aim to use their repositories as a means of launching open-access
> alternatives to conventional academic journals.

"Simply a way to capture their intellectual output"? Clearly the point
of self-archiving refereed research has been completely missed here!

Unfortunately, Zandonella's article simply propagates the growing
wave of nonspecific euphoria about university repositories, which seems
to be based on freely conflating distinct and not always compatible
potential uses for such repositories.

I suggested:

>sh> The 5 distinct aims for institutional repositories are:
>sh> I. (RES) self-archiving institutional research output (preprints,
>sh> postprints and theses)
>sh> II. (MAN) digital collection management (all kinds of digital
>sh> content)
>sh> III. (PRES) digital preservation (all kinds of digital content)
>sh> IV. (TEACH) online teaching materials
>sh> V. (EPUB) electronic publication (journals and books)
>sh> As long as we keep blurring or mixing these 5 distinct aims, the
>sh> first and by far the most pressing of them, RES -- the filling of
>sh> university eprint archives with all university research output,
>sh> pre- and post-peer-review, in order to maximize its impact
>sh> through open access -- will be needlessly delayed (and so will
>sh> any eventual relief from the university serials budget crisis).

UC seems to be another instance of conflating I. (RES) and V. (EPUB).
It is hard to discern whether this is just a case of (i) misunderstanding
the essential feature of peer review -- which is that it must be an
autonomous, outsourced, neutral-3rd-party service, otherwise it risks
just becoming a house organ or vanity press -- or else a case of (ii)
High (Wire Press) Hopes (Stanford Envy?): Universities seeking to make
a bigger inroad into electronic publishing.

> This fall, the University of California (UC) plans to unveil just
> such an option for its researchers: the ability to create and run
> an open-access, peer-reviewed journal within the framework of its
> eScholarship Repository.

But the question is this: Are more peer-reviewed journals really what the
planet needs today (it has 20,000 already, most of them toll-access)? And is
the best contribution universities can make with their "superarchives"
to create new journals? Or would it be more useful (to both themselves
and other universities) if they instead focused on making their *own*
peer-reviewed research publications openly accessible by self-archiving
them in their own eprint archives (RES)? Does it help either objective
(RES or EPUB) to conflate them under the one rubric of "superarchive"
(not UC's word, but a predictable reaction of the press, if we keep
freely admixing I. - V.)? Especially at a time when archive frenzy is
growing fast, but self-archiving itself is still growing too slowly!

> The repository, which is open to all users, will provide software
> tools to automate the process of sending out papers for peer review;
> the journal editors will determine the editorial policies and the
> publication schedule. "We are trying to provide the continuum
> of publishing alternatives," said Suzanne Samuel, eScholarship
> Program coordinator for the California Digital Library, which runs
> the repository for the UC system. (The eScholarship site already
> contains one open-access journal, Dermatology Online Journal, which
> was launched in 1995 and later moved to the UC site.)

As Gerry McKiernan's recent overview shows, there are *many* new
pieces of software being created to automate peer review and journal
publication, all designed to make journal publishing faster, cheaper,
and more efficient.
But what has this to do with any pressing problem facing the university
(such as research access, research impact, or the serials crisis)?

> The idea for institutional repositories arose out of the need to
> archive the increasing amount of data researchers now store on their
> hard drives or display on their web sites. The data in the repository
> are indexed with meta-tags that allow a variety of search strategies,
> and the repository software provides the framework for checking data
> in, storing it, and retrieving it via a web interface. A repository
> can also serve as a preprint server, where researchers can solicit
> comments on unpublished work.

But what does this research data-archiving -- an excellent idea and a
subset of RES -- have to do with EPUB? And why are unrefereed preprints
(an excellent and welcome bonus) singled out for self-archiving when it
is peer-reviewed, published postprints to which access is most urgently
needed?

> An important development in the creation of repositories came last
> fall with the launch of DSpace, a repository software platform
> developed at the Massachusetts Institute of Technology (MIT) in
> collaboration with Hewlett-Packard. The DSpace software can be
> downloaded for free, and about 3400 individuals and institutions
> have now done so.

And so can a lot of other software, as indicated earlier in this
discussion thread. But what universities need now is not more software
but a much clearer idea of what to do with it, and why!

> A consortium of universities, called the DSpace Federation,
> is beta-testing the software. The Federation includes Columbia
> University, Cornell University, Ohio State University, University
> of Rochester, University of Washington, University of Toronto,
> and Cambridge University.

Meanwhile, at least 72 universities are already running eprint archives,
some for as long as 2 years: So what? The
archives need filling. And to understand why they need filling, and with
what they need filling, I. - V. have to be separated and each dealt with
in its own right, on its own agenda. Conflating the five just keeps
everything at the beta-testing stage!

> The DSpace software contains no rules on who can enter data, what
> kinds of data can be accepted, or who can access them. Instead,
> the DSpace users set up "communities" and establish their own terms
> of use.

What the university community needs is a clear idea of what these
archives are for, and how to go about filling them. I may be wrong, but
at this moment the rationale and urgency for RES (I), the self-archiving
of research output, pre- and post-peer-review, seems to vastly outweigh
that for the other four. But, more important, RES is so distinct from the
other four that it would almost be better if we did not think of all
five as just different "superarchive" functions, but as independent
university functions in their own right. And I don't know about the
other four, but I am pretty sure that RES is better served
by having a lot of OAI-interoperable departmental archives rather than just
one university monster-archive (especially if the central superarchive
would conflate I - V!): Isn't that sort of integrable distribution of
the burden part of the rationale for OAI interoperability?
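
To make the interoperability point concrete, here is a minimal sketch (an
illustration only, not any particular archive's or harvester's actual code) of
how a service could treat several departmental archives as one virtual
collection. It assumes each archive answers the standard OAI-PMH ListRecords
request with Dublin Core (oai_dc) metadata; the function names and the example
base URL are hypothetical, and no network access is attempted here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Dublin Core element namespace used inside oai_dc metadata records.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for one archive.

    base_url is a hypothetical endpoint such as http://example.org/oai.
    """
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_titles(response_xml):
    """Return the Dublin Core titles found in one ListRecords response."""
    root = ET.fromstring(response_xml)
    return [title.text for title in root.iterfind(".//dc:title", NS)]


def merge_archives(responses):
    """Combine titles from many archives into one sorted, de-duplicated list.

    responses maps an archive name to the XML text that archive returned.
    """
    titles = set()
    for xml_text in responses.values():
        titles.update(extract_titles(xml_text))
    return sorted(titles)
```

Because every archive speaks the same protocol, the merging step does not need
to know or care whether the records came from one central archive or from many
departmental ones.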

> One federation member that plans to use DSpace to further its goal of
> providing free access to peer-reviewed content is Cornell University.
> Among the reasons for doing this is the feeling that the existing
> publishing model isn't serving universities well, said J. Robert
> Cooke, professor of agricultural and biological engineering and dean
> of the faculty at Cornell. "Long ago we outsourced publishing to
> [commercial] publishers," said Cooke. "Now we need to take it back."

So (to put it graphically): Is Cornell University planning to make its
Science and Nature publications open-access by self-archiving them (RES),
or is it planning to create Cornell House-Journals to publish them in
instead (EPUB), rather than "outsourcing" them to the established
peer-reviewed journals?

> Repositories can serve as a bargaining chip for universities in
> the debate over the future of scholarly publishing, believes Hal
> Abelson, MIT Class of 1922 professor of computer science. "We [the
> universities] have something to bring to the table," said Abelson.

Fine, but what, exactly, are we bargaining about? Open access to our own
peer-reviewed research output? But we can already have that by
self-archiving it in our eprint archives (RES)! What has this to do with
universities trying to get more involved in electronic publication (EPUB)?

Or does Hal Abelson mean universities should pressure publishers to
make sure they have updated their copyright agreements to formally
support self-archiving? That is a good idea, but there is considerable
momentum there already, with 55% of publishers already formally
supporting self-archiving, and most of the others agreeing if asked on an
individual basis.
By that token, the RES archives should be at least 55% full already!

But I agree that universities have leverage here -- although it has little
to do with EPUB: It is because *authors* want and need maximal research
impact that publishers have little choice but to support self-archiving,
not because universities threaten to become journal-publishers [EPUB].

> But Harold Varmus, president and chief executive officer of Memorial
> Sloan-Kettering Cancer Center in New York City and cofounder of the
> Public Library of Science -- which later this year plans to publish two
> new open-access biomedical journals -- is skeptical about the idea that
> repositories themselves will help to bring about change. He emphasized
> that journals, not repositories, are the primary record of science.
> "They [repositories] are not going to replace the idea of having an
> investigator write up results," said Varmus.

And Harold Varmus is of course right. Self-archived, unrefereed preprints
in one's university eprint archive are merely vanity-press until/unless
they are submitted for independent, expert peer-review by a peer-review
service-provider with established quality-standards that would-be users
of those findings can rely upon. Such a service has to be "outsourced"
and it happens to be performed at the moment by 20,000 peer-reviewed
journals, with their own established expertise, quality-standards and
known track-records.

The problem is not "repatriating" that peer-review service. It has to
continue to be an autonomous, 3rd-party service. The problem is access
to its *outcome*: the refereed final drafts. Self-archiving solves that
problem, not by providing a substitute for journals but by supplementing
access to their full-text contents (toll-free).

Harold Varmus himself conflated EPUB and RES somewhat in the original
version of his otherwise splendid and timely EBiomed proposal, but it
is clear that this has since been thought through and sorted out.

> Repositories won't make journals go away, agreed Rick Johnson,
> enterprise director at the Scholarly Publishing and Academic Resources
> Coalition (SPARC), a group that advocates an open model of scientific
> publishing. But, said Johnson, "They begin a process of change that
> will bring about emergence of different business models that support
> science communication."

Self-archiving (RES) provides open access, immediately. That's what's
urgently needed by the research community. New business models for
refereed journal publishing may follow, but what is needed *now* is
university self-archiving.

> Johnson thinks the availability of preprints, data sets, and images
> will spur communication and feedback among fellow scientists. "People
> will say, 'Gee, my research is hidden behind toll gates today. If
> it was not hidden, imagine what kind of impact it could have.'"

One can hardly disagree, now that SPARC is beginning to come round to
that sensible view! (It is not that long since SPARC's only visible goal
was lower journal prices!)

But it is not just, or even primarily, about (unrefereed) preprints,
data sets and images. It is about toll-free access to *refereed
research.* SPARC needs to be much clearer on that, otherwise they too are
contributing to the gridlock that comes from conflating I. - V.

> At the very least, these superarchives will draw universities into
> the ongoing debate over who should be the gatekeeper of scientific
> information. But Pieter Bolman, vice president and director of
> science, technology, and medical relations for Elsevier Science is
> bullish about the continuing importance of subscription journals. He
> said that although scientists may no longer need journals for
> peer-review -- as they can set up their own systems for reviewing
> papers -- they will continue to seek publication in the journals with
> the best reputation.

I will bet a good deal of money that Pieter Bolman did *not* say anything
as patently nonsensical as that! (Pieter?) This was a journalist's
own contribution to the confusion with which this simple domain is so
rife! Pieter is fully aware that "gate-keeping" has to be outsourced,
and that it is its track-record for gate-keeping that gives a journal
its reputation, not merely its name.

But this absurd picture of universities serving as their own
gate-keepers (EPUB?) along with the idea that this will
co-exist with journals subscribed to purely for their names is
just one facet of the incoherent chimera -- like a 5-dimensional
Escher-drawing -- that comes from conflating I. - V.: It's time
to de-conflate!

> One issue that the emergence of repositories brings to the fore is
> that of copyright. Most scholarly journals acquire copyright from the
> author and grant certain rights in return. The exact terms of this
> agreement vary widely, said Jane Ginsburg, an expert in copyright
> law at Columbia Law School in New York.

Indeed. But the only *relevant* term insofar as the refereed research
literature is concerned is whether or not they allow self-archiving --
and, regarding *that*, journals are quickly, sensibly, and responsibly
converging on the optimal and inevitable outcome:

> Many journals grant authors the right to post the article on a
> personal or university web site. However, "It is one thing if a
> bunch of individual professors put papers on their web sites, but
> it might be another matter if a university does it," said Ginsburg.

No, in the age of OAI-interoperability it does not matter in the
slightest whether it is individual professors or their universities who
self-archive their papers -- as long as the site is OAI-compliant. But
where universities and even governmental research-funding agencies can
help is in extending their existing "publish or perish" carrot/stick
to "publish and self-archive" (for maximal research impact):

> Mary Waltham, a former publisher of the Nature journals and
> now a consultant for the publishing industry, can see that
> happening. "Search tools are becoming better, and my own personal
> view is that at some point, one will be able to search the Internet
> and find copies of these articles in repositories," said Waltham.

Yes, but it is not search tools that will make that day come, but a
systematic institutional policy of self-archiving those articles in those
institutional repositories! To sort that out, functions II-V have to be
disentangled from the all-important function I (RES).


Stevan Harnad
