Re: Nature launches web debate "Future e-access to the primary literature"

From: David Goodman <dgoodman_at_PRINCETON.EDU>
Date: Thu, 6 Sep 2001 13:07:35 -0400

I am writing as an outsider and would like to point out some apparent
confusion about minimal necessary identifiers, at least as proposed

What is called "minimal retrieval metadata" is ambiguous for any large
set of material. It will do for books in a bookshop, or a limited
collection of similar online items. All three of its elements fail as
soon as one realizes that neither author names nor titles are unique,
that some documents do not have an author, or a title, or a year, and
that more than one version of the same item may be published during the
same year. It won't even do to retrieve e-mail on my own computer. I
have hundreds of cases where the same person or group has sent me
several different items with identical titles on the same day (not to
mention year).
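
To make the ambiguity concrete, here is a minimal sketch in Python. It assumes a (title, author, date) triple is used as the only retrieval key. The sample records, the function names retrieval_key and find_collisions, and the normalisation rule are all hypothetical, invented for illustration; they are not taken from any proposal discussed in this thread.

```python
from collections import defaultdict

# Hypothetical sample records: (author, title, year, body).
# All values are invented for illustration only.
records = [
    ("J. Smith", "Weekly report", 2001, "draft sent in the morning"),
    ("J. Smith", "Weekly report", 2001, "corrected version sent that afternoon"),
    ("", "Editorial", 2001, "unsigned piece"),
    ("A. Jones", "", None, "untitled, undated preprint"),
]


def retrieval_key(author, title, year):
    """Build the assumed 'minimal retrieval' key from author, title and year."""
    return (author.strip().lower(), title.strip().lower(), year)


def find_collisions(items):
    """Group items by key and return only the keys shared by several items."""
    groups = defaultdict(list)
    for author, title, year, body in items:
        groups[retrieval_key(author, title, year)].append(body)
    return {key: bodies for key, bodies in groups.items() if len(bodies) > 1}


# Keys that point at more than one distinct item.
collisions = find_collisions(records)

# Records missing at least one of the three elements.
incomplete = [r for r in records if not (r[0] and r[1] and r[2])]

print(len(collisions), "ambiguous key(s);", len(incomplete), "incomplete record(s)")
```

In this toy set the two "Weekly report" items share one key, and two further records cannot supply a complete key at all; these are the two failure modes described above.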

What is called "minimal bibliographic metadata" applies only to articles
in conventional journals. There are many other types of bibliographic
items. To begin to see the possibilities involved, look at the
instructions for Endnote or a similar bibliographic utility intended for

Minimal 'ontological' metadata (effective, community-agreed vocabularies
of subject descriptors) is an unattainable goal. No such scheme can
be constructed that will not require continuous maintenance and
re-assignment of descriptors, and none ever has. The best known and one
of the very best in general is the Medline indexing vocabulary: they
re-index every year. Even if it were attainable, one would have to
describe the relevant "community". Just try that in an unambiguous way.

From the standpoint of a librarian, all this is naive in the extreme. We
have constructed cataloging rules that permit unambiguous description;
unfortunately, they can only be fully understood and applied by a
professional cataloger whose career is devoted to it. Ordinary
librarians such as myself understand only a few relevant portions. We
have never even succeeded in finding fully satisfactory search and
display subsets which the average patron can use over the full range of
material. Try any on-line catalog of your choice, and look at it
critically. If you find one which you think does everything, let me (or
any librarian) know, and you'll get plenty of examples of what it won't

To the limited extent that I understand the metadata proposals I have
seen, the same applies: understanding them fully is a full-time job for a
dedicated expert. I hope the group working on it manages to find a
suitable way of letting non-experts construct and use it. If I ever see
one that I can use and understand, I'll let people know.

is no longer a functional site. Yes, we do need identifiers.

Leslie Carr wrote:

> On Wed, 5 Sep 2001, Declan Butler wrote:
> > As metadata are expensive to create - it is estimated that tagging
> > papers with even minimal metadata can add as much as 40% to costs
> For what purpose is the metadata? Minimal retrieval metadata (title, author,
> date) is different from minimal bibliographic metadata (journal, volume,
> issue, page range) which is certainly different from minimal 'ontological'
> metadata (effective, community-agreed vocabularies of subject descriptors).


Received on Wed Jan 03 2001 - 19:17:43 GMT

