Re: Financial Times Article on Self-Archiving: 23 July 2001

From: Tim Brody <tdb198_at_ECS.SOTON.AC.UK>
Date: Thu, 2 Aug 2001 21:55:59 +0100

On Tue, 31 Jul 2001, Albert Henderson wrote:

> on 31 Jul 2001 Stevan Harnad <> wrote:
> > On Tue, 31 Jul 2001, Albert Henderson wrote:
> >
> > > Why not produce hard evidence that Harnad's above claim
> > > is true:
> > >
> > > sh> But virtually all of the self-archived preprints in arxiv are
> > > sh> submitted to refereed journals, revised... [etc]
> > >
> > > and applies to the science literature generally???
> >
> > Here's some (already cited in reply several times):
> >
> >
> In his analysis of the papers on the LANL
> server, Tim Brody tells us:
> "The proportion of papers that have got
> Journal-ref entries is 36.87%." This
> would include those that are submitted
> after formal publication rather than
> being first submitted as preprints.
> Thank you for your help. It appears that the physics
> situation is much the same as the informal literature
> studied by Garvey and others.

Dear all,

As this thread has gravitated towards the analysis that I have done on
arXiv metadata, I would like to raise a few points:

Firstly, arXiv metadata is contributed by authors and augmented by
SLAC/SPIRES. Where SLAC/SPIRES supplies publication data (as a more
general index of High Energy Physics journals), the percentage of papers
with journal references is much higher. Where metadata is left to the
author, it appears to be less well maintained (whether this is because
arXiv holds only pre-prints and isn't updated with post-prints, or simply
because of a lack of metadata maintenance, is unknown).

It should also be noted that the journal-ref field can be filled with any
author-supplied reference. One would need to ask arXiv staff whether they
authenticate this particular information.

Perhaps a more useful metric is to look at this problem from the other
direction, namely how many published physics papers are also present in
arXiv. For HEP, nearly all (95%?) of the papers indexed by SLAC/SPIRES
are also present in arXiv (I can't provide a precise figure, although
given time it would be possible to use the SLAC engine to determine
one).
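
As a rough illustration of the difference between the two directions of
measurement, here is a minimal sketch. The record layout, function names
and sample identifiers are invented for the example; they are not real
arXiv or SLAC/SPIRES data or interfaces.

```python
def journal_ref_share(arxiv_records):
    """Share of archive records whose journal-ref field is non-empty."""
    if not arxiv_records:
        return 0.0
    with_ref = sum(
        1 for r in arxiv_records if (r.get("journal_ref") or "").strip()
    )
    return with_ref / len(arxiv_records)


def archived_share(published_ids, arxiv_ids):
    """Share of published papers that are also present in the archive."""
    published = set(published_ids)
    if not published:
        return 0.0
    return len(published & set(arxiv_ids)) / len(published)


# Hypothetical toy data: four archive records, one with a journal-ref.
arxiv_records = [
    {"id": "a1", "journal_ref": "J. Example 1 (2000) 1"},
    {"id": "a2", "journal_ref": ""},
    {"id": "a3", "journal_ref": ""},
    {"id": "a4"},
]
# Hypothetical index of published papers, all of which are archived.
published_ids = ["a1", "a2", "a3"]
arxiv_ids = [r["id"] for r in arxiv_records]

print(journal_ref_share(arxiv_records))          # 0.25
print(archived_share(published_ids, arxiv_ids))  # 1.0
```

In this toy case only a quarter of the archive records carry a
journal-ref, yet every published paper is present in the archive, so a
low journal-ref percentage need not mean that few archived papers are
published.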

I have no doubt that the great majority of high energy physicists
deposit their published papers in arXiv, and that the proportion is
growing for the other areas covered by arXiv (if Albert can supply a
counter-example for physics from the last 5 years I would be most
interested).

Away from arguing figures, I find it odd that research literature is
locked away, when the Internet offers an infinitely cheap way of
distributing that literature (and of significantly reducing the costs of
publishers). I also find it odd that a librarian, forgive me if I
misunderstand, should be so against freeing this information resource,
when libraries are among the parties with the most to gain (unless they
never want to see a student again).

All the best,
Tim Brody
Computer Science, University of Southampton
