OpenDOAR Search vs Google Search

From: Leslie Carr <lac_at_ecs.soton.ac.uk>
Date: Wed, 1 Nov 2006 15:13:46 +0000

In response to the challenge laid down by Andy Powell, I have had a
closer examination of the queries that people have historically used
on our repository (eprints.ecs.soton.ac.uk) and how they now perform.
In particular, I looked at whether vanilla Google or the Google
Custom Search Engine that implements the OpenDOAR search ranked our
repository highest and/or put more ECS results on the first page of
the query response (up to 100 results).

I checked 385 historical queries, extracted from our repository logs
from 18 months ago. I know that these queries successfully delivered
people to our site in the past, but I have no information about our
search rankings at that time. The purpose of the study was simply to
compare vanilla Google and OpenDOAR Search now, using a realistic set
of query terms.

I submitted each query to both search engines and saved the HTML
result page (using AppleScript to control my Safari Web browser). The
834 HTML pages (417 queries * 2 search engines) were later analysed
for the ordering of the query results (using the normal UNIX shell
tools in bespoke scripts). The results are given below in two groups:
the first group looking for appearances of our repository and then
the second group looking for appearances of any repository.
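
As an illustration only (the analysis itself used UNIX shell tools, and
those scripts are not reproduced here), the first-group ranking step
could be sketched in Python as follows. It assumes the result links have
already been extracted from each saved HTML page into an ordered list of
URLs; the function names are hypothetical.

```python
from urllib.parse import urlsplit

# Host name of the repository, as given in the post.
ECS_HOST = "eprints.ecs.soton.ac.uk"


def ecs_positions(result_urls, host=ECS_HOST):
    """Return the 1-based ranks of results whose host name equals `host`."""
    return [
        rank
        for rank, url in enumerate(result_urls, start=1)
        # .hostname is lower-cased by urlsplit; it is None for URLs
        # without a network location, which never match.
        if urlsplit(url).hostname == host
    ]


def first_ecs_position(result_urls, host=ECS_HOST):
    """Return the rank of the first matching result, or None if absent."""
    positions = ecs_positions(result_urls, host)
    return positions[0] if positions else None
```

The length of the list returned by ecs_positions gives the number of ECS
hits on a page; first_ecs_position gives the rank of the first one.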

MY CONCLUSIONS: as a repository manager, OpenDOAR Search provides
significantly more prominence for my holdings than Google does.
However, as a researcher, I do not know whether the things that are
in repositories are intrinsically more interesting than the things
that Google rates highly! If anyone is interested in following THAT
question up, I can provide the 385 queries and the associated Google
and OpenDOAR Search pages.
--
Les

FIRST GROUP: APPEARANCES OF EPRINTS.ECS
========================================

QUERIES WITH AT LEAST ONE HIT for ECS on the first page of the query
results (up to 100 items per page)
Google: 69%
OpenDOAR Search: 94%

THE AVERAGE POSITION of the first ECS hit across all the queries:
Google: 10.2
OpenDOAR: 3.2
(result pages with no ECS hits are ignored - see above result).

SEARCH ENGINE WITH HIGHEST POSITION of first ECS item in search list:
Google: 11 times
OpenDOAR: 293 times
Tied Equal First: 81 times
(where 'highest' = closest to rank #1).

AVERAGE NUMBER OF ECS hits on the result page:
Google: 1.2
OpenDOAR: 3.4
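
A minimal sketch of how figures of this kind could be computed, given
one first-hit rank per query for each engine (None where the page had no
ECS hit). This is an illustration rather than the scripts actually used.
The treatment of missing hits in the head-to-head tally (a missing hit
loses to any rank, and two missing hits count as a tie) is an
assumption, and the function names are hypothetical.

```python
import math


def summarise(first_positions):
    """Return (share of queries with a hit, mean rank of the first hit).

    Pages with no hit (None) count towards the share but are ignored
    when averaging the rank.
    """
    hits = [p for p in first_positions if p is not None]
    share = len(hits) / len(first_positions) if first_positions else 0.0
    mean_rank = sum(hits) / len(hits) if hits else None
    return share, mean_rank


def head_to_head(google, opendoar):
    """Tally, per query, which engine put its first hit closer to rank 1."""
    tally = {"google": 0, "opendoar": 0, "tie": 0}
    for g, o in zip(google, opendoar):
        # Assumption: a page with no hit ranks below every real position.
        g_rank = math.inf if g is None else g
        o_rank = math.inf if o is None else o
        if g_rank < o_rank:
            tally["google"] += 1
        elif o_rank < g_rank:
            tally["opendoar"] += 1
        else:
            tally["tie"] += 1
    return tally
```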

SECOND GROUP: APPEARANCES OF ANY REPOSITORY
========================================

QUERIES WITH AT LEAST ONE REPOSITORY featuring in the results page
Google: 87.3%
OpenDOAR: untested - assumed 100%

The ROAR repository list was used because it is data that I have easy
access to. Sorry Bill!
A search result qualifies as a repository item if its host name is
the same as the host name of a repository.

AVERAGE NUMBER OF REPOSITORY items per results page
Google: 4.3
OpenDOAR: untested - assumed 100% of items

AVERAGE POSITION OF FIRST REPOSITORY ITEM on the results page
Google: 12.6
OpenDOAR: untested - assumed 1
(Results pages with no repository items are ignored.)

PROPORTION OF RESULTS PAGES IN WHICH A REPOSITORY ITEM GAINS #1 POSITION
Google: 28%
OpenDOAR: untested - assumed 100%
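
The host-name rule used for the second group can be sketched as below.
This is an illustrative reading of the rule, not the original script: it
assumes the repository list is available as a list of base URLs, and the
function names and example hosts are hypothetical.

```python
from urllib.parse import urlsplit


def repository_hosts(repository_urls):
    """Build the set of host names from a list of repository base URLs."""
    hosts = set()
    for url in repository_urls:
        host = urlsplit(url).hostname  # lower-cased; None if not present
        if host:
            hosts.add(host)
    return hosts


def is_repository_item(result_url, hosts):
    """A result qualifies if its host name equals a repository host name."""
    return urlsplit(result_url).hostname in hosts


def repository_positions(result_urls, hosts):
    """Return the 1-based ranks of the repository items on a results page."""
    return [
        rank
        for rank, url in enumerate(result_urls, start=1)
        if is_repository_item(url, hosts)
    ]
```

Because the match is on the exact host name, a result hosted on a
different subdomain of the same institution does not count as a
repository item under this reading.
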
Received on Wed Nov 01 2006 - 16:00:47 GMT
