Re: Cliff Lynch on Institutional Archives from David Goodman on 2003-03-17 (American-Scientist-Open-Access-Forum)

From: David Goodman <dgoodman_at_PHOENIX.PRINCETON.EDU>
Date: Sun, 16 Mar 2003 21:14:01 -0500

OK, Stevan, you mentioned biology, so I imagine you expect a reply from
me.

I agree with you that ecology -- in combination with other terms -- is a
sufficient descriptor. The evidence for this is the not very helpful
separate attempts of both Biological Abstracts and Zoological Record to
divide the area into more specific terms. Free text and citations are
indeed the only way to deal with it, and that is what I teach my students.
This is, however, somewhat true of all areas where what one is trying to
index is concepts or approaches, rather than discrete things. I can
easily provide good subject terms not just for things with
standardized terminology like animals, but for machine parts, or clothing,
or even emotions. But I cannot do so for mechanical operations, or
programming principles, or ethical ideas: they do not have discrete
boundaries and are used in unpredictable ways.

The harder classification problem is when you deal with, say, the sort of
ecology that mathematical ecologists do. If they are primarily mathematicians,
they may not even be aware they are talking about biology, or to what
biological fields or problems their techniques are applicable.
In practice, this material is initially discovered only through
serendipity, and then the knowledge is dispersed through the citation
network.

And obviously the conventional journal system is not of value
here, so I do not think we disagree. I write to expand on what you said,
not contradict it.

On Sun, 16 Mar 2003, Stevan Harnad wrote:

> On Sun, 16 Mar 2003, Lee Miller wrote:
>
> > The simplest way to aggregate papers within disciplines would be to include a
> > discipline field in the metadata.
>
> I agree. And this confirms that "aggregation" is merely (1) a
> metadata-based form of re-packaging and (2) need not re-package the
> full-text but merely the pointers to it. Hence it is not the case that the
> (full-text) *data* from distributed Institutional OAI Archives need to be
> "fed" (harvested) into central Disciplinary OAI Archives. "Aggregation"
> is merely a special case (or rather a special name) for ordinary OAI
> Service-Provision -- which is precisely what the OAI Metadata Harvesting
> Protocol was designed for! The old paper-based idea of journal-content
> "aggregators" is simply misleading us here. Online "aggregators" are
> really just search engines, pulling out and ranging over
> discipline-specific subsets of OAI full-text content space.
>
> > This gets back to the problems of subject
> > classification, but at the discipline level a short list of defined
> > discipline descriptors should be sufficient.
>
> A *very* short list. Because once I have narrowed it to "Ecology," the
> rest is best done with boolean full-text search and algorithms rather than
> prefabricated human classification schemes.
>
> > For example, the discipline of ecology includes plants, animals,
> > microorganisms, terrestrial and aquatic ecosystems, physical environments,
> > physiology, applied mathematics, and many other sub-fields. Nevertheless,
> > ecologists of all stripes recognize and enjoy common bonds in the general
> > discipline. A small number of general journals that publish papers from
> > many of the sub-disciplines are followed by many researchers and academics,
> > regardless of their specialty fields. Thus inclusion of the discipline
> > descriptor "ecology" would allow aggregation of papers at a level that has
> > already proved useful to ecologists for over a century.
>
> No problem. But how many such high-level (useful) partitions do you think
> there really are, within, say, "Biology"? I suspect we are talking about
> a very small number; the rest is boolean content-based search. (Besides,
> it is not just *journals* we are classifying, as in the old aggregator
> days, but *papers*.)
>
> > A similar level of aggregation in other fields would surely be useful as a
> > tool for harvesting papers of particular interest from institutional archives.
>
> I suspect that these high-level, a-priori categories will be similarly
> sparse in all disciplines: There is no pre-classification needed much
> beyond the level of the discipline-name itself. We are not sorting
> journals any more; we are searching open-access contents. There will be
> powerful content-based algorithms for narrowing it to the kinds of
> material we want, but little of it will resemble how we used to
> aggregate journals.
>
> Stevan Harnad
>
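
To make the point above concrete, here is a rough sketch of "aggregation"
as nothing more than filtering harvested metadata on a discipline field
and then narrowing with a boolean text match. Everything in it is an
assumption for illustration: the Record fields, the function names, the
example identifiers and the sample records are hypothetical, and no real
OAI harvesting is performed. Only pointers to the full text are returned.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One harvested metadata record (hypothetical fields)."""
    identifier: str  # pointer to the full text in its home archive
    discipline: str  # assumed discipline field in the metadata
    text: str        # title/abstract text available for searching


def aggregate(records, discipline):
    """Keep records whose discipline field matches, ignoring case."""
    wanted = discipline.strip().lower()
    return [r for r in records if r.discipline.strip().lower() == wanted]


def boolean_search(records, all_of=(), any_of=(), none_of=()):
    """Narrow records with a simple AND / OR / NOT match on whole words."""
    hits = []
    for r in records:
        words = set(re.findall(r"[a-z0-9]+", r.text.lower()))
        if not all(t.lower() in words for t in all_of):
            continue
        if any_of and not any(t.lower() in words for t in any_of):
            continue
        if any(t.lower() in words for t in none_of):
            continue
        hits.append(r)
    return hits


# Made-up sample records standing in for harvested metadata.
RECORDS = [
    Record("oai:archive-a.example:1", "Ecology",
           "Predator-prey dynamics in aquatic ecosystems"),
    Record("oai:archive-b.example:7", "Ecology",
           "Soil microorganisms and plant physiology"),
    Record("oai:archive-a.example:9", "Mathematics",
           "Stability of predator-prey differential equations"),
]

# Discipline-level partition first, then content-based narrowing.
ecology = aggregate(RECORDS, "ecology")
narrowed = boolean_search(ecology, all_of=["predator"], none_of=["soil"])
ecology_pointers = [r.identifier for r in narrowed]

# Searching the text alone, without the discipline field.
text_only = boolean_search(RECORDS, all_of=["predator", "prey"])
text_only_pointers = [r.identifier for r in text_only]
```

In this toy data the text-only search also returns the record labelled
"Mathematics", which the discipline field alone would have left out.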

Dr. David Goodman

Princeton University Library
and
Palmer School of Library and Information Science, LIU

dgoodman_at_princeton.edu
Received on Mon Mar 17 2003 - 02:14:01 GMT
