Re: Interoperability - subject classification/terminology

From: David Goodman <dgoodman_at_PHOENIX.PRINCETON.EDU>
Date: Fri, 7 Mar 2003 21:53:20 -0500

The matter is not quite so simple, Stevan. I agree that a
decentralized archive, as distinguished from arXiv, does not need
much in the way of classification--especially if its
classification will be different from that of every other such
archive. I suspect the practical access for the immediate future will be
by known author, supplemented by the citation network.

On the other hand, to rely on OAI harvesters and automated search tools
for accessing the union of all such collections is premature.
I am not certain whether it is within human capabilities to design
this--certainly none of the extensive efforts at automatic document
retrieval are really adequate--it's a problem of the same magnitude
as AI in general. I would love to see this solved, of course, because the
known manual methods, as they are applied in libraries and indexing services,
are almost equally unsatisfactory.
(All the above is a 3-sentence summary of decades of work of many good
researchers, and I am the first to admit that it is an inexpert summary at that.)

But it does seem that we will be adopting a policy of getting it all
accumulated, and hoping that the next (intellectual) generation will be
smart enough to get it organized.
It should be obvious that this is not an argument against making our
material available, which must be done while
the material can still be captured.

On Fri, 7 Mar 2003, Stevan Harnad wrote:

> I agree 100% with the point made by the commentator below:
> Institutional Eprint Archives for refereed research papers do *not*
> require an elaborate classification system (such as Library of
> Congress). These are not books. And the OAI harvesters and search
> engines will be the real, cross-archive search tools; elaborate
> pre-classification is not needed just for searching within one's own
> university's local research output (and creating such an elaborate
> classification system is, in my opinion, a waste of time). (And in any
> case, I would put my money on boolean inverted full-text search, with
> scientometric impact ranking, over any prefabricated human taxonomy in
> this online age.)
> Reply to comment below: Just pick one default subject and forget
> about the rest.
> Stevan Harnad
> On Fri, 7 Mar 2003, W F Clocksin wrote:
> > Hi. I am a beginning user of Eprints, and am entering metadata on the
> > default test archive interface. It is a real nuisance to have to
> > specify the Subject (which uses the Library of Congress system). For
> > books this makes sense because the catalog information is in the front
> > matter of the book, but it is unclear to me why I should have to do
> > this for journal articles. For multidisciplinary articles, it might
> > mean specifying a number of Subjects using the scrolling textbox, which
> > would take longer than copy/pasting the rest of the metadata. I would
> > rather just leave out the Subject. To what extent is a required Subject
> > built into ePrints, or is it simply a feature of the test interface that
> > I could omit from a custom interface?
> >
> > William Clocksin
> >

Dr. David Goodman

Princeton University Library
Palmer School of Library and Information Science, LIU
Received on Sat Mar 08 2003 - 02:53:20 GMT

