Re: Search Engine for Repositories Only?

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Wed, 2 Aug 2006 13:57:26 +0100

On Wed, 2 Aug 2006, Philip Hunter wrote:

> Stevan Harnad wrote:
> >
> > (2) Why build these national and local restricted range OAI search engines
> > instead of simply building restricted-range options into the
> > full-spectrum OAI search engines such as OAIster?
> > http://oaister.umdl.umich.edu/o/oaister/
> >
> There are a number of reasons why you might want national and restricted
> range OAI search engines as part of a global infrastructure for repository
> usage, including questions of quality assurance and the practical management
> of repository networks.

I don't understand. If you have T total local IRs harvested by a
full-spectrum OAI harvesting/searching service such as OAIster, and Q
of them meet your local or national quality standards, then why wouldn't
simply restricting to that Q subset of T be the local quality assurer?
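
To make the Q-of-T point concrete, here is a minimal Python sketch. It assumes the full-spectrum service keeps each harvested record tagged with the base URL of the IR it came from. The `Record` fields, the repository URLs, and the function name are all hypothetical; this is not OAIster's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    repository: str  # base URL of the source IR (hypothetical field)
    title: str


def restrict_to_approved(records, approved):
    """Return only the records whose source IR is in the approved (Q) subset."""
    approved_set = set(approved)
    return [r for r in records if r.repository in approved_set]


# T: everything harvested by the global service (made-up examples)
all_records = [
    Record("http://ir.example-a.ac.uk/oai", "Paper A1"),
    Record("http://ir.example-b.edu/oai", "Paper B1"),
    Record("http://ir.example-c.ac.uk/oai", "Paper C1"),
]

# Q: the IRs that meet a local or national quality standard (made-up list)
approved_irs = [
    "http://ir.example-a.ac.uk/oai",
    "http://ir.example-c.ac.uk/oai",
]

restricted = restrict_to_approved(all_records, approved_irs)
print([r.title for r in restricted])
```

On this picture the quality list is just a filter over the global index, maintained locally but applied centrally.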

And how do global or local OAI searcher/harvesters affect the practical
management of the local IRs themselves?

Or is it that there are specific search functionalities that the national or local
search engines would provide that global ones cannot or do not provide?

> This question came up at the WWW2006 JISC workshop
> on repositories, where I suggested that global services might be built most
> practically on the basis of locally developed and managed services.

The rationale for the OAI protocol has been to develop global OAI services on top of
distributed local OAI data-providers (IRs) -- not on top of distributed local OAI
services.
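
As a rough illustration of that layering: a service-provider harvests a data-provider by sending OAI-PMH requests to the IR's base URL. The Python sketch below only builds such a request URL. `ListRecords`, `metadataPrefix`, `oai_dc`, and `set` are part of OAI-PMH; the base URL, the set name, and the helper function are invented for the example.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for one data-provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional selective harvesting by set, where the IR defines sets
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical IR base URL and set name
url = list_records_url("http://ir.example.ac.uk/oai", set_spec="physics")
print(url)
```

A global service would issue the same kind of request to every IR it harvests. A tiered arrangement would instead point it at an intermediate aggregator's base URL.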

The latter is possible too, but I would be keen to know the concrete
functional objective of doing it that way.

> The institutional and geographic levels of repository services,
> while of little or no interest to the user, offer a number of features which
> can support the quality and sustainability of global OAI search services.

I am not saying that there is no possible functional advantage there: I am just
saying I have not yet heard what it is, concretely. What are the institutional and
national OAI search engines meant to do that the global ones do not or cannot do?

> This is one of those quasi-theological issues which are often quite
> divisive - do we work from the centre (full-spectrum OAI search engines,
> harvesting local services directly, with restricted range search options),
> or do we build global services using a tiered structure (restricted range
> OAI search engines whose aggregated records are globally harvested)?

It is only theological if we are not specific about exactly what concrete
functionality we have in mind. On the face of it, the OAI picture
is: distributed local OAI data-providers (IRs), plus global OAI
service-providers providing services on top of those local IRs. There
can of course be OAI services on top of OAI services too, but with
search-services in particular, it is not obvious what sorts of things
the subglobal ones would be doing that the global ones would/could not.

Please help me see!

> We might have to suck it and see. The choice also depends to some extent on
> what the world at large thinks repositories are for - a matter clearly still
> in flux.

That might be the gist of it: There are those who think IRs are for
digital content management and preservation, and those who think IRs
are for maximizing research access-provision. It might be helpful to
distinguish OAI DL IRs (OAI-compliant Digital-Library IRs, for digital
content management and preservation) from OAI OA IRs (OAI-compliant
Open-Access IRs, for providing research access). What the requisite
search services and functionalities might be, and be for, may then look
quite different for the two kinds of IRs.

(Very similar questions underlie the [what should likewise be functional
rather than theological] issues surrounding the question of central
versus local repositories: CRs vs. IRs. And again it depends on what
you want them for, and what you want them to do, how.)

(To go still further: In principle, the harvester of all harvesters is
Google, and it harvests and inverts much of web content already. What
sets the OAI harvesters apart is that (1) they focus on a specific kind
of content, not all of webspace, and (2) they use the OAI tags. But of
course Google could be restricted to that subset, and configured to allow
navigation based on the OAI tags too! Google Scholar is already going
in that direction. With a full-text harvester already trawling the net,
does OA/OAI really have to reduplicate the efforts? This is not a rhetorical
question, but a practical, functional one.)

Stevan Harnad
Received on Wed Aug 02 2006 - 14:16:32 BST
