Re: EPRINTS = PREPRINTS (unrefereed) + POSTPRINTS (refereed)

From: Declan Butler, Nature
Date: Thu, 18 May 2000

Might not systems like ResearchIndex be part of the solution to this?
The extract below is from (free access)

Searches for researchers
But what about search engines designed specifically for scientists? NEC's
prototype, called ResearchIndex, gathers fragmented scientific resources
from around the web, and automatically organizes them within a citation
index. And unlike most search engines, ResearchIndex retrieves PDF (portable
document format) and PostScript files, widely used by scientists to format
manuscripts. It starts by querying dozens of popular search engines for a
series of terms likely to be associated with scientific pages, such as
'PDF', or 'proceedings'. Hundreds of thousands of scientific papers can be
located quickly in this way, says Lawrence.

ResearchIndex uses simple rules based on the formatting of a document to
extract the title, abstract, author and references of any research paper it
finds. It recognizes the various forms of presenting bibliographies, and by
comparing these with its database of other articles can conduct automatic
citation analyses for all the papers it indexes. This information can also
be used to quickly identify articles related to any indexed paper.
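
As a rough illustration of the rule-based approach described above (not
NEC's actual code, which the extract does not show), the sketch below
splits a paper's reference section into entries and counts how many
papers cite each known title. The function names, the "[n]" entry
format and the sample titles are all assumptions made for the example.

```python
import re
from collections import Counter


def normalize_title(title):
    """Lowercase and drop punctuation so differently styled citations compare equal."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()


def split_references(text):
    """Return the entries that follow a 'References' or 'Bibliography' heading.

    Assumption: entries are numbered in the form [1], [2], ...
    """
    heading = re.search(
        r"^\s*(references|bibliography)\s*$", text, re.IGNORECASE | re.MULTILINE
    )
    if not heading:
        return []
    section = text[heading.end():].strip()
    entries = re.split(r"\n\s*\[\d+\]\s*", "\n" + section)
    return [" ".join(entry.split()) for entry in entries if entry.strip()]


def count_citations(papers, known_titles):
    """Count, for each known title, how many papers cite it at least once."""
    normalized = {normalize_title(title): title for title in known_titles}
    counts = Counter()
    for text in papers:
        cited = set()
        for entry in split_references(text):
            entry_norm = normalize_title(entry)
            for key, title in normalized.items():
                if key in entry_norm:  # naive substring match on the title
                    cited.add(title)
        counts.update(cited)
    return counts


# Hypothetical sample data, invented for this example.
papers = [
    "Paper A\n\nReferences\n"
    "[1] A. Author. Example Paper on Citation Indexing. 1999.\n"
    "[2] B. Author. Another Paper. 1998.\n",
    "Paper B\n\nBibliography\n"
    "[1] Author, A. (1999) Example paper on citation indexing.\n",
]
known_titles = ["Example Paper on Citation Indexing", "Another Paper"]
counts = count_citations(papers, known_titles)
```

A real system would need fuzzier matching than a substring test, since
titles are often abbreviated or misspelled in bibliographies.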

The prototype form of ResearchIndex is being applied to the computer
sciences. Its archive of papers in this subject alone, at 270,000 articles,
is bigger than leading online scientific archives such as the Highwire
Press, which has almost 150,000 articles, and the Los Alamos archive of
physics preprints, which contains about 130,000 papers. The engine already
has an enthusiastic following among computer scientists. Stevan Harnad of
the University of Southampton, who has tested the system on his CogPrints
archive of preprints in the cognitive sciences, is another convert. "For the
literature it covers, it is a gold mine," he says.

NEC is giving the software free to non-commercial users, and Lawrence hopes
it will be applied across many disciplines: "Our goal is to not just create
another digital library of scientific literature, but to provide algorithms,
techniques and software that can be widely used to help improve
communication and progress in science."

----- Original Message -----
From: Simon Buckingham Shum
Sent: Thursday, May 18, 2000
Subject: Re: EPRINTS = PREPRINTS (unrefereed) + POSTPRINTS (refereed)

At 9:21 am +0100 18/5/00, Thomas Krichel wrote:
> The problem with self-archiving by authors is the growing tendency
> of authors to deposit their papers in homepages. It is debatable
> if this sort of activity is real "archiving". What we need is to
> have more agents, acting on behalf of authors that will hopefully
> make more long-term archiving possible. The archiving through
> an agent is what I call "formal" archiving, and I oppose it to
> the tendency of "informal" archiving in homepages. My impression
> is that formal archiving is relatively declining, whereas informal
> archiving is on the increase. I see the OAi as an attempt of formal
> archivers to regain initiative.

Good point. But the problem, as we know from other computer science
domains, is that people need a good reason to bother to formalize
information for systems - the cost-benefit tradeoff. One would hope
that authors see it in their interest to publish on an OAI server,
for instance, but structuring and submitting bibliographic data is
extra work. For instance, if an author has already submitted a
document to their own organization's report library, they don't want
to have to do it all over again for an eprint archive.

Various options for getting a new document onto a server whilst
minimising the burden on the author suggest themselves:

- author takes responsibility to manually submit document to eprint
server in addition to any other archives

- other archives automatically forward their submissions to eprint archive

- all archives become OAI compliant(!) so no forwarding of
submissions is required

- author's favourite bibliographic management tool (Bib; EndNote;
etc) uploads details to eprint server, which emails the author the URL
of a form for completing any missing details

- author publishes document citation details on homepage; an
intelligent agent spots and parses this, fills out the eprint server
form as far as possible and emails the author the URL of a form for
completing any missing details

- <dream on>
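
To make the "intelligent agent" option a little more concrete, here is a
minimal sketch of the parsing step: it takes one citation string as it
might appear on a homepage, fills in as many form fields as it can, and
reports which ones the author would still have to complete. The field
names, the "Authors (Year). Title. Venue." citation style and the sample
citations are assumptions made for illustration; no real eprint server
form is being described.

```python
import re

# Hypothetical form fields; a real eprint server would define its own.
REQUIRED_FIELDS = ("authors", "year", "title", "venue")

# Assumed citation style: "Authors (Year). Title. Venue."
CITATION_PATTERN = re.compile(
    r"^(?P<authors>.+?)\s*\((?P<year>\d{4})\)\.?\s*"
    r"(?P<title>[^.]+)\.\s*(?P<venue>[^.]*)"
)


def prefill_form(citation):
    """Return (form, missing): the fields that parsed, and those left blank."""
    match = CITATION_PATTERN.match(citation.strip())
    found = match.groupdict() if match else {}
    form = {field: (found.get(field) or "").strip() for field in REQUIRED_FIELDS}
    missing = [field for field in REQUIRED_FIELDS if not form[field]]
    return form, missing


# Hypothetical citations, invented for this example.
form, missing = prefill_form(
    "Smith, J. and Jones, A. (2000). An Example Eprint Title. "
    "Journal of Examples, 1(2)."
)
draft_form, draft_missing = prefill_form("Smith, J. (2000). An Untitled Draft.")
```

The list of missing fields is what the agent would put in its email to
the author, so that only the gaps need filling in by hand.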

