I've looked at the question of the potential scholarly uses of Napster-like
systems. Below, for your interest, is an article I published on the
subject in the 13 April issue of Nature.
13 April 2000 Nature 404, 694 (2000) Macmillan Publishers Ltd.

Music software to come to genome aid?



Student use of a new 'killer' Internet application that allows anyone
connected to the web to share music files stored on the hard disk of their
own computer has become so heavy that US campuses want to ban the software
for fear that it will saturate their academic Internet connections.

But some scientists are thinking of adopting the principles behind the
so-called 'Napster' technology themselves. They believe these could herald a
new era in distributed computing, and in particular solve the thorny problem
of how the vast community of biologists can collaborate on assigning
functions to the genes in the human genome.

Anyone connected to the Napster software, which can be downloaded from the
Internet, can do a single search for a song across
all the hard disks of other Napster users and download it directly from the
user's computer.

When Lincoln Stein, a bioinformaticist at the Cold Spring Harbor Laboratory
in New York, heard about Napster, he was struck by the parallels with his
own work on writing software for a distributed sequence annotation system
for the human genome. Napster, he realized,
can be used to find and distribute information located anywhere on the

Stein believes that annotation, which involves predicting which sequence
stretches are genes and what their function might be, calls for a radically
different global database structure from the centralized system used for
gene- and protein-sequence data. At present, users submit and retrieve
records to a few central databases, such as GenBank.

But many scientists believe that annotation is too large a task for a few
large genome centres. It is more of an art than a science, and up to half
the predictions are wrong. No single group is likely to be able to produce a
definitive version.

Annotation centres could in principle contribute data to GenBank-like
centres. But GenBank entries, which can be modified only by those who
submitted them, are sometimes erroneous. The problem is likely to be worse
with the more subjective annotation data; the creation of a single
authoritative annotated sequence seems unlikely.

A better solution, Stein argues, might be to allow biologists worldwide to
annotate the human genome sequence interactively using diverse computational
and experimental methods, much as developers worldwide debug open-source

But until now this sort of decentralized solution raised the spectre of
duplication of effort, and the risk that scientists, instead of being able
to consult a single central database, would have to search a series of
separate versions of human genome databases. Data integration would become a
real problem.

Stein believes Napster-like technology could be the answer. A centralized
reference server holding a detailed genome map would act as an anchor for
data produced locally by third-party annotation servers. Researchers could
publish their data electronically without having to maintain their own

Such a system could avoid the need to label entries with subjective
identifiers, such as gene names or accession numbers, as occurs now.
Instead, maps, gene predictions and functional activities could all be
superimposed on the reference map using a system of coordinates, much as
astronomers combine multiple-wavelength data with positional or coordinate
information. A user could zoom in on any part of the genome and view the
region in many ways by calling up related data from the hard disks of
participating laboratories.
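
The coordinate idea in the paragraph above can be illustrated with a small sketch. This is only a toy under stated assumptions, not Stein's software: the names Annotation, overlaps and view_region, the server names and the sample records are all hypothetical, and real annotation servers would be queried over the network rather than held in one dictionary.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One record published by a third-party annotation server (hypothetical layout)."""
    source: str  # which laboratory's server supplied the record
    chrom: str   # reference sequence the coordinates refer to
    start: int   # inclusive start position on the reference map
    end: int     # inclusive end position on the reference map
    kind: str    # e.g. "gene_prediction" or "function"
    note: str    # free-text description


def overlaps(a: Annotation, chrom: str, start: int, end: int) -> bool:
    """True if the annotation shares at least one position with the region."""
    return a.chrom == chrom and a.start <= end and a.end >= start


def view_region(servers: dict[str, list[Annotation]],
                chrom: str, start: int, end: int) -> list[Annotation]:
    """Collect records from every server that fall in the requested region.

    Records are matched by position only, never by gene name or accession
    number, and are returned sorted along the reference map.
    """
    hits = [a for records in servers.values() for a in records
            if overlaps(a, chrom, start, end)]
    return sorted(hits, key=lambda a: (a.start, a.end, a.source))


# Assumed sample data: two independent laboratories annotating the same map.
servers = {
    "lab_a.example": [
        Annotation("lab_a.example", "chr1", 1200, 1800, "gene_prediction", "putative kinase"),
        Annotation("lab_a.example", "chr2", 500, 900, "gene_prediction", "unrelated region"),
    ],
    "lab_b.example": [
        Annotation("lab_b.example", "chr1", 1500, 2500, "function", "expressed in liver"),
        Annotation("lab_b.example", "chr1", 3000, 3500, "gene_prediction", "outside the view"),
    ],
}

for record in view_region(servers, "chr1", 1000, 2000):
    print(record.source, record.start, record.end, record.kind, record.note)
```

Because every record carries positions on the shared reference map, data from independent laboratories can be overlaid without agreeing on names first; only the reference map itself has to be common.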

Ewan Birney, joint head of Ensembl, a joint venture between the European
Bioinformatics Institute and the Sanger Centre in Cambridge to develop
automatic annotation of eukaryotic genomes, is backing Stein's idea, and has
proposed that Ensembl be used as the reference map. The US National Center
for Biotechnology Information, however, apparently opposes the idea on the
grounds that it would lead to a proliferation of junk.

That risk is real. Although few genome scientists admit it publicly, the
quality control of many smaller laboratories does not match that of
specialized centres. "It's a Catch 22," says Birney. "We want to democratize
people's ability to present their work, but quality is a problem." He
believes one answer would be to offer users a full interactive choice
between an approved annotation, composed of data from recognized
gold-standard laboratories, and the full data.

Few researchers are willing to publicly endorse Napster-like technology. The
idea of leaving desktop hard disks open to the Internet is a network
manager's nightmare, and is not helped by Napster's ability to scan
firewalls and breach weaknesses.

But Birney insists that developing the system should be taken seriously. "If
we don't get it right it will be the difference between a biological web and
people just continuing to use the Internet to send e-mails and read web

jr> Though what has been said about Napster is certainly relevant, I don't
jr> think the import of it for self-archiving of one's professional work,
jr> published or pre-print, has quite come into focus for us here. Let us
jr> leave aside the use of it to pirate music, which is a red herring
jr> relative to the concerns of this forum.
It is not a red herring in one essential respect: many people currently
oppose open archiving of refereed research because they think it is a
form of theft. There are university administrators who think this
(feeling the pressure of the serials crisis, but understandably not
wishing to relieve it illegally); there are librarians who think this;
there are publishers who think this; and there are authors who think
this.
The primary motivation and use of Napster for consumer-end piracy simply
reinforces this false impression, which is still holding us all back
from the optimal and the inevitable for research. It is for this reason
that it is so important to make it clear that author self-archiving is
NOT a form of consumer-end piracy at all; it is a producer-end
give-away, and as such, it does not need Napster-like tricks for
distribution. It can and should be done perfectly up-front and legally
by authors on the Web itself. No need for "second economy" bootleg
links between users' PCs: Just proudly self-archive your own refereed
work on your own institutional Open Archive or a central one.
Interoperability will take care of the rest; and consumers will be able
to get your give-away product perfectly legally, and without the need
of any "second network."
Professor Ransdell goes on to make further suggestions for Napster-style
distribution, again failing to take the difference between consumer-end
rip-off and producer-end give-away into account. For when it is
producer-end give-away, there is no need for a "second network" or
directly connected computers (with all the attendant needless risks and
vulnerabilities). The good old WWW will do fine.
jr> What makes [Napster] relevant here is its potentialities as a
jr> communications technology that can be used to defeat reactionary
jr> intellectual property practices.
Via consumer rip-off or producer give-away? Is there any reason
whatsoever that the latter should make common cause with the former?
The Net was built in the spirit of shareware, but now that the entire
economy is moving onto the Net, it is just as absurd that the Net
should (quixotically, and chaotically) try to impose the give-away
model on all of Trade, as that the Trade model should now be imposed on
all of the Net. Let 1000 flowers bloom.
The teenage and post-teenage hackers who craft the likes of "Gnutella"
in the hopes of freeing the Golden Goose-Eggs (about whose exact
provenance they are blissfully murky) for one and all, Napster-style,
would simply kill the Golden Goose if they prevailed unchecked. This is
a classical "evolutionarily unstable strategy," in which cheaters
eventually deplete the resource they exploit. There is no reason
whatsoever to link the rational, right, and reachable goal of freeing
the refereed research literature to this sort of murky myopia in any
jr> one advantage it offers that is not accommodated by the public archives
jr> in process of construction at present is that one can make publicly
jr> available many different kinds of resource material in addition to
jr> scholarly or scientific research reports proper...
jr> it could make easily available scholarly and investigative tools of
jr> the sort which heretofore have always perished with those individuals
jr> who devised them.
Are we talking about consumer rip-off or producer give-away? If the
latter, what's wrong with doing it publicly on the Web? (And the Open
Archive interoperability can and will easily be extended to other forms
of give-away too, not just research reports.)
jr> Would people actually be willing to share their research instruments
jr> and materials in that way[?]
Reasonable question. Where the answer is "Yes," the course is clear
(and does not require a second, Napster-style Net: the first one will
do). Where the answer is "No," we are talking about theft rather than
give-away (and most of us will want to just walk away from that).
jr> this Napster-like technology could yield a distributed archival
jr> database which could easily grow... [but would] have to remain distinct
jr> from the database of e-prints currently envisaged because of its highly
jr> fluid character, owing to its dependence on the willingness of
jr> individuals not only to keep on making the materials available but also
jr> to follow routine practices in revision of their work and in the
jr> development of their personal instruments of research.
It is not at all clear why all of this (if it's legal give-aways)
cannot be done within the Open Archives framework.
jr> the value of it relative to the aims of the present forum could only
jr> lie in its side-effect of tending to encourage self-archiving of the
jr> stable sort wanted here.
On the contrary, any association with Napster-style consumer fraud can
only have the side-effect of retarding open archiving's entirely
ethical mandate.
jr> To use one of Stevan's favorite metaphors, if the horses, being shown
jr> the water, continue to be reluctant to drink, it could be because of
jr> inhibitions that can only be addressed in other ways than those that
jr> suggest themselves when one thinks of the problem of open publication
jr> only in the simplistic and highly abstract way it is usually described
jr> here.
On the contrary. Inhibitions about self-archiving are based on the
unfounded fear that it may be wrong or illegal; gratuitously linking
it to something that may indeed be wrong and illegal hardly helps.
The algorithm is indeed simple, but hardly abstract:
    Researchers' refereed research reports are give-aways; researchers
    should accordingly self-archive them online, free for all. No need
    for a Napster-style "second economy" to do this: Open archiving
    will do it for you (and for any other research-related things you
    may wish to give away too).
