Re: A Search Engine for Searching Across Distributed Eprint Archives

From: Donat Agosti
Date: Thu, 21 Oct 2004 16:20:39 +0100

> It is merely distraction and dreaming to worry about search tools when
> the OA content is not yet there for them to search!

+++ For our brave little world of ant systematics (covering ca 11,700
species) we have made everything we can accessible online: more than
70,000 pages in PDF, both single and bound, produced with the
Smithsonian Institution and the American Museum of Natural History as
main supporters. The pages cover ca 3,500 publications dating from 1758
to today (see for example "Camponotus herculeanus" in the taxon search).
They are linked into our catalogue of ants (and another ca 130,000 names
of bees, wasps, etc.), so that from every citation the original
publication can be read.

Together with UMass Boston, the American Museum of Natural History, Ohio
State University and the University of Magdeburg, and supported by
NSF/DFG, we are developing a standard to convert those legacy
publications and mark up their content using XML Schema or similar tools.
The problem, besides technical issues, is, as you make clear, to
ensure that we have access to recent publications. For example, in
2003, 423 (!) new ant species were described, of which the original
descriptions of only 8 (!) species are open access. Interestingly,
this also includes a publication from Harvard University Press by
E.O. Wilson, the single largest source of new species descriptions in
2003: a university publisher that I would assume should be at the
forefront of open access to serve the scientific community.

This somewhat pioneering project seems to have inspired other research
groups to begin building up archives of publications covering their taxa
(groups of plants and ants). Independently of that, research
institutions such as the American Museum of Natural History are
beginning to open up their archives by transferring the legacy
publications into digital publications, and hopefully entirely open

There is still a long way to go to get all the publications of the ca
1,500,000 species, each with an (ant) average of 6 pages, thus 9,000,000

But I believe we will only get the money to convert this huge amount
of legacy data if we can provide the best possible access to its
content, so please excuse the thinking ahead.

Donat Agosti
Received on Thu Oct 21 2004 - 16:20:39 BST
