Re: Interoperability - subject classification/terminology

From: Stevan Harnad <>
Date: Mon, 25 Nov 2002 02:49:10 +0000

On Sun, 24 Nov 2002, Thomas Krichel wrote:

>sh> (2) The University Eprint Archive as a means of providing open access
>sh> to all of the university's peer-reviewed research output (before and
>sh> after peer review). Almost without exception, this is the work that
>sh> also appears in the peer-reviewed journals sooner or later (indeed,
>sh> that is how it gets peer-reviewed).
>sh> It should be clear that (2) is a very special subset of (1). But
>sh> it should be equally clear that that special subset does not have any
>sh> particular or pressing classification problem!
> I beg to differ. Scholars are subject to herd behavior. You will not
> get scholars to deposit papers in the local archive if their colleagues
> in other universities don't do it.


> Thus you have to approach scholars by community.


> To do that, you need to classify the
> material that you have per discipline,

You just lost me! Isn't a university a scholarly community? Moreover,
the scholar's university is a scholarly community with which the scholar
shares some rather vital interests: The university employs the scholar,
the scholar's research funding pays some of its overhead, and hence the
two have a shared interest in maximizing each scholar's research impact.

There is no such shared interest with a "discipline," distributed
worldwide. (If anything, there is competition for impact within a

> in order to build
> discipline-specific aggregators, such as the (pioneering)
> RePEc project for economics.

I admire RePEc, and apologize for having failed to
mention it, along with ArXiv, ResearchIndex, and the Institutional
Eprint Archives. RePEc is a very valuable and important contributor to
open access and self-archiving (although it is not all full-text and not
all open-access). It is a collaborative effort among institutions.

But it is not at all clear that, as institutional self-archiving
(in all university departments and disciplines) gains momentum, there
will be any need for classification (though it would not hurt to
tag papers from economics departments "economics" too, especially
for preprints). The classification will be amply accomplished by the
journal-names along with boolean search through the inverted indices
of the articles' titles, keywords, and full-texts. When the time comes,
a master-classification of the planet's 20,000 peer-reviewed journals
can be added as a supplement, along with any further taxonomies that
analyses of the inverted full-text corpus itself generate, supplemented
by citation and co-citation analyses and other scientometric goodies.
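
To make the point concrete, here is a minimal sketch (Python) of boolean
search over an inverted index built from journal names, titles, keywords,
and full texts. The record layout, the field names, and the three sample
records are hypothetical, invented only for illustration; this is not how
any particular archive or search engine actually implements it.

```python
import re
from collections import defaultdict

# Hypothetical sample records; the field names are assumptions for this sketch.
RECORDS = {
    "a1": {"journal": "Journal of Economic Theory",
           "title": "Auction design under uncertainty",
           "keywords": "auctions, mechanism design",
           "fulltext": "We study auction formats with uncertain values."},
    "a2": {"journal": "Cognitive Science",
           "title": "Symbol grounding in neural networks",
           "keywords": "categorization",
           "fulltext": "Categories are learned from sensory input."},
    "a3": {"journal": "Economic Journal",
           "title": "Labour markets and networks",
           "keywords": "employment",
           "fulltext": "Social networks shape job search."},
}

FIELDS = ("journal", "title", "keywords", "fulltext")


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Map each term to the set of record ids in which it occurs."""
    index = defaultdict(set)
    for doc_id, record in records.items():
        for field in FIELDS:
            for term in tokenize(record.get(field, "")):
                index[term].add(doc_id)
    return index


def search_all(index, query):
    """Boolean AND: return ids of records containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


INDEX = build_index(RECORDS)
print(sorted(search_all(INDEX, "economic")))
print(sorted(search_all(INDEX, "economic networks")))
```

Note that no record carries a subject tag: the query "economic" finds
a1 and a3 purely through their journal names, which is the sense in which
the journal-names plus the inverted index do the classifying.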

>sh> can beat google-style boolean search on an inverted full-text index,
>sh> especially if aided by citation-frequency, hit-based, recency-based,
>sh> or relevance-based ranking of search output, as done, for example,
>sh> by ).
> Yes but all those services require discipline based,
> relational dataset to be precise.

Google? A discipline-based relational dataset?


