Re: access to self archive via google scholar

From: Walker,Thomas J <tjw_at_UFL.EDU>
Date: Sun, 22 Oct 2006 17:42:13 -0400

Those starting an institutional repository (IR) should not expect Google to quickly harvest or to
logically rank the journal articles posted on the new IR. This is based
on my recent experience with helping the Florida Center for Library
Automation (FCLA) in their efforts to test the IR waters with
ScholArchive (http://eprints.fcla.edu/), a pilot IR that is focused on
the scholarly output of the faculty and graduate students of the
University of Florida's Department of Entomology and Nematology.


ScholArchive (using EPrints software) went online on 28 July 2006 with 7
posted articles. Here are excerpts from five emails relevant to their
harvesting and ranking by Google and a report of today's results:

1 Aug 2006 (from ScholArchive Administrator)
"I have registered us with Google, Google Scholar, SCIRUS, ROAR, DOAR,
etc. so we should be indexed very soon by lots of search engines,
hopefully."


1 Sep 2006 (from ScholArchive Administrator)
"I have been monitoring Google Scholar, Google and other discovery sites
for the past 5+ weeks since your papers were loaded, with the same
disappointing results, even though I registered ScholArchive with these
sites."


1 Sep 2006 (from Tom Walker to ScholArchive staff)
"This is disappointing because faculty will be more likely to post their
journal articles in ScholArchive IF we can show that doing so will
significantly help Google users find openly accessible full text of the
articles.

To illustrate how this might prove to be the case, consider my 2001
Environmental Entomology article entitled "Butterfly migrations in
Florida: seasonal patterns and long-term changes." This morning I
entered "butterfly migrations in Florida" as a Google search phrase and
got 36 hits (under 11 main listings). Here are the first six main
listings:

1. My personal web site. [A click on Google's listing loaded the PDF
file of the article.]

2. BioOne. [A click on the listing loaded the abstract, but without a
BioOne license the full text would be inaccessible.]

3. Ingenta Connect [A click led to a page with the abstract and a chance
to pay $25 for access to the full text.]

4. TX-BUTTERFLY archives. [A click led to a bibliographic entry that had
a dead link to the PDF file of the article. (My web site's URL was
changed a few years ago)]

5. Journal of the Lepidopterists' Society [A couple of clicks led to a
1993 article on trapping migrating butterflies.]

6. The Entomological Society of America Journals Online [A click led to
the TOC of the issue, another click led to the abstract, and a third led
to the PDF file. But unless someone knew that I had paid ESA to provide
OA for my article, who would have thought that free access to the PDF
file would have been found here?]

BOTTOM LINE: Had I not posted the PDF file on my Web site, very few
would have found free access to the article's full text. Thus it is
important to know how Google will rank the ScholArchive posting.

Incidentally, I ran the same search in Google Scholar BETA and got only
one hit, the same as no. 2 above!"


20 Sep 2006 (from ScholArchive Administrator)
"As it turns out, Google is indeed indexing our site, but only the
top-level pages, not the papers inside the repository. I am working on
how this can be changed."


9 Oct 2006 (from Tom Walker to ScholArchive staff)
"Yesterday I checked Google to see if the ScholArchive version of my
butterfly migration paper had been harvested. It had not, and worse,
the order of the sites that offered it had been changed. My (free)
offering of the paper on my home page was now the fourth of the main
listings (instead of first). Two for-fee offerings were first and third
and BioOne was second."


22 Oct 2006
When I searched Google this afternoon for the butterfly migration paper,
the for-fee sites that had been No. 1 and No. 3 now occupied main
entries No. 1 and 2 in the search results. However, my homepage site
(free) was now No. 3 and the posting on ScholArchive (free) was now No.
4. BioOne had dropped to No. 5.

The current ranking is still a disappointment but better than on 9 Oct
(and worse than 1 Sep).

[On Google Scholar (beta), the BioOne posting was all that was offered.]

Tom

====================================
Thomas J. Walker
Department of Entomology & Nematology
PO Box 110620 (or Natural Area Drive)
University of Florida, Gainesville, FL 32611-0620
E-mail: tjw_at_ufl.edu
FAX: (352)392-0190
Web: http://tjwalker.ifas.ufl.edu
====================================


-----Original Message-----
From: American Scientist Open Access Forum
[mailto:AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM_at_LISTSERVER.SIGMAXI.ORG] On
Behalf Of Donat Agosti
Sent: Saturday, October 21, 2006 1:54 AM
To: AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM_at_LISTSERVER.SIGMAXI.ORG
Subject: access to self archive via google scholar

What would it take for self-archived articles to be indexed by Google
Scholar, so that they can be found?

Search, for example, for

"viaticus was tridecane"

Then you end up at this paper:
http://scholar.google.com/scholar?q=viaticus+was+tridecane&hl=en&lr=&btnG=Search

It is the original paper, which is copyrighted, and there is no hint
that the paper is also available open access on ZORA:

http://www.zora.unizh.ch/zora/handle/2379/4727?mode=full&submit_simple=Show+full+item+record

Ideally, the ZORA copy should show up as well, since it would then be used more often.



Donat

Dr. Donat Agosti
Science Consultant
Research Associate, American Museum of Natural History and Naturmuseum
der Burgergemeinde Bern
Email: agosti_at_amnh.org
Web: http://antbase.org
Blog: http://biodivcontext.blogspot.com/
Skype: agostileu
CV
Current Location
Dalmaziquai 45
3005 Bern
Switzerland
+41-31-351 7152
Received on Mon Oct 23 2006 - 11:06:18 BST
