Re: Central vs. Distributed Archives

From: Greg Kuperberg <greg_at_MATH.UCDAVIS.EDU>
Date: Thu, 2 Nov 2000 11:47:45 -0800

On Thu, Nov 02, 2000 at 06:28:35PM +0000, Stevan Harnad wrote:
> (5) The immediate issue is hence not the PERMANENCE of the
> self-archived drafts but their EXISTENCE, free for all, now. The
> permanence will take care of itself.

That may be so in other disciplines, but it has not been true in
mathematics for several years. In 1997, the year before the universal
math arXiv was started, there were already some 10 or 20 thousand research
papers freely available on the web. Most of them were on personal
home pages, but thousands were in institutional and subject-based
preprint series. Nonetheless the vast majority of these papers were
still eventually sold as published papers.

So what were the publishers selling? Not peer review, because you
can learn from Math Reviews where a paper has been published without
subscribing to the journal. To a large extent the journal system was
selling, and is still selling, stability and permanence. So that has
been the fundamental question of open archival in mathematics for years.
That is why some of the recalcitrant math publishers say that the arXiv
is "just a preprint server" and not a "permanent e-print archive".
Of course I don't agree with them; I choose the arXiv over subscription
journals as the future route to permanent archival.

> But of course the likely, practical strategy is for the researchers'
> universities and research institutions (or, more specifically, their
> libraries) to create and administer their institutional Eprint Archives
> for all their researchers' refereed output, in all disciplines. (We can
> have at least as abiding a faith in the durability of the collections
> on universities' airwaves, then, as we now have in the durability of
> the collections on their shelves).

As a practical matter most of the institutional preprint series in
mathematics are at the department level. At every university at which
I have studied or held an appointment, interdepartmental computer
services (a) are often mediocre, and (b) are often a one-size-fits-all
straitjacket. I don't even like central campus e-mail. In my view the
strength of university research is rooted in departmental independence.

So should we mathematicians trust individual math departments to
permanently preserve their e-prints? I don't think so. Our own math
preprint series at UC Davis is an arXiv overlay - all articles are
automatically contributed to the math arXiv. One of my arguments for this
arrangement is that we can't promise to babysit these preprints forever.
We could easily forget our obligation.

> I am not a mathematician, but this "whole is greater than the sum of its
> parts" argument does not add up for me!

When we put together the universal math arXiv from its disparate parts,
submissions immediately jumped by 40% (as of December 1997).
Since then the math arXiv has grown more quickly than the subject-based
archives that were not pulled into the fold. Take a look at the
submission statistics at my front end for the math arXiv:

> 50%. What possible reason would there be not to encourage complementing it
> by institutional Eprint Archives immediately -- given that they will all be
> co-harvested (and mirrored, and cached, etc.) in global virtual archives
> anyway, thanks to interoperability?

As I said above, in math the institutional archives are there already (and
there are still a few separate subject-based archives). They distract
authors as much as they encourage them. In fact one of the serious
problems with the fragmented interoperability system is multiple
submissions. Many authors like to advertise themselves by putting
their papers in more than one archive. Or if a paper has four authors,
it could go to four archives because each one has a different favorite.

As for your vision of global virtual archives, that hasn't happened yet.
If you wait for that then you can't also assure us that the revolution
can take place immediately. If we do have something to wait for, why
wait for an integrated facade with a fragmented foundation instead of
the other way around?

> A priori worries about distributed archiving alas belong to that long
> litany of prima facie rationales for inaction...

Obviously I'm not a conservative offering rationales for inaction.
And my worry is not "a priori". NCSTRL and MPRESS are two long-standing
attempts at standards-based fragmented interoperability. Neither one
has as much readership as the younger, fully integrated math arXiv.

  /\  Greg Kuperberg (UC Davis)
 /  \
 \  / Visit the Math ArXiv Front at
  \/  * All the math that's fit to e-print *
Received on Mon Jan 24 2000 - 19:17:43 GMT

