Re: Meeting: National Policies on Open Access Provision for University Research Output

From: Stevan Harnad
Date: Thu, 4 Mar 2004 22:30:49 +0000

I am not sure there is really any substantive disagreement with Peter
Murray-Rust, though there might possibly be on just one point -- one
on which Michael Eisen has expressed the same view as Peter. I
disagree, so perhaps it would be useful if a legally informed (but
also *realistic*) expert opinion could be provided:

Here are the matters at issue. Peter and Michael are both concerned
that toll-free access to the full-text of journal articles is not
enough, because one may wish to re-use -- not re-publish, but re-use
-- the data published in those articles (not data published in separate
databases, primary or secondary, author-provided or third-party-provided:
data in those articles whose full-texts are accessible toll-free).

As an example, Peter gives the datum: "melting point = 123
deg." contained in the article's text or tables.

Now my question: In the Gutenberg age, when the article appeared on
paper, and the user had to read it, and then do his next experiment,
using that datum, and then publish results (say, computations) based
on it, was there ever any doubt that he could use that number, perform
whatever computations he liked on it, and then report both the original
author's finding (citing the reference) and his own result based on it?
Was it ever the case that *published* scientific results could be read,
but not used and built upon (without requiring any further permission)?

I have certainly never heard of such a thing. I and many others have
used and built upon the findings reported by others, whether they were
described verbally or numerically (and reported at a conference orally,
or published on paper, or online). I would not even know how to make a
distinction between a verbal and a numerical datum! If you report
that the first group performed better than the second group, is that
published ordinal datum something I may not refer to and build upon in
my further work? Are ordinal, or even nominal and qualitative data less
data than cardinal data? Is a mathematical proof, published in a journal,
something I may read and admire, but not use and build upon? Is a fact --
be it geographic, historical, or sociological, stated in words, readable
but not usable? How can we possibly even keep track of where and how we
have used all the facts we have read and used?

So my tentative conclusion is that the *content* of any published
research article can be used; indeed, that's why it's published: so it
can be used and built upon by other researchers. And I also tentatively
conclude that there is no principled distinction between a "datum" and
any other piece of content.

Verbatim text, on the other hand, is form, not content, and that may not
be re-used as one's own text (but that is really re-publication and not
just re-use, and Peter explicitly says it is not re-publication he is
concerned about but re-use). It may be quoted (within limits) -- and,
these days, if it is OA, it may be linked, without limit. For content,
as opposed to form, the only constraints are those of plagiarism and
priority: You may not claim the content was your own, if you saw it
somewhere else first.

None of this has anything to do with OA. Nor does the content of
databases provided by secondary-providers, if they own the data
somehow, and want to sell and control not only access but use. The
OA movement of course favors opening both access and use for such
secondary proprietary data too, but that is not within the power of
the OA movement, because the OA movement is based only on the primary
research literature, the one that its own authors provide for free,
and publish in refereed journals.

The OA movement can of course lend moral support to secondary data-base
re-use rights, but it seems to me it does far more good by actually
seeing to it that open access is provided for the full-texts containing
the author's own primary data. One is just moral support for use; the
other is practical provision for use.

What must not be allowed to happen, though, is for the clear-cut,
unencumbered path to primary OA and usage to be weighed down or held
back by the need to renegotiate rights or to rewrite laws, with either
primary publishers or secondary ones. The road to 100% OA for the full
content of the primary full-texts is completely free and clear, but it
is not yet fully understood, and hence it is highly underused. What is
needed urgently, today, is for *that* underuse to be remedied, urgently,
today -- *not* for that free and clear road to be encumbered with the
need to renegotiate rights or rewrite laws, in any way. That would simply
add, needlessly, to confusion and delay, when it is *action* that is
already feasible and indeed long overdue.

I cannot imagine that if researchers at last take the path of OA for
the full contents of all of their articles (by publishing them in an
OA journal whenever a suitable one is available and affordable, and
otherwise publishing them in a conventional TA journal but also providing
OA to them by self-archiving them) that this will fail to provide most
of the solution to the data-access problem. (The second step would be
for authors to also self-archive the data that they have *not* had the
space to publish in the journal article itself.)

Both those sets of data would be 100% available for all uses that other
researchers would care to make with them.

On Fri, 27 Feb 2004, Peter Murray-Rust wrote:

> ** In this mail I am NOT talking about data collections however published.
> I am restricting myself to data ("facts") which occur **in the body of
> the final published manuscript** Though I have a wider agenda, in this
> mail I am sticking precisely to the peer-reviewed primary literature.

Understood. Then what would be the problem if the full contents of all
those final published manuscripts could be accessed by all would-be
users on the web, instead of having to pay tolls to access them? What
uses would not be possible that way?

> In some disciplines data are published separately from the manuscript. In
> others (chemistry, biosciences, ...) the data are often only ever published
> in the primary publication (I call this micropublication of data). Typical
> phrases are:
> MeltingPoint 123 degC
> Boiling Point (1 atm) 234 degC
> Yield of reaction: 77%
> etc.
> These data are of great value to the community and are *re-used*. (Not
> synonymous with republication). They may be aggregated, compared, input
> into programs, used to create predictive models, etc. Facts have been
> abstracted from the literature for 150 years and are IMO covered by the
> Berne convention - they are copyright free. If I want to copy and publish
> all melting points in the literature I can. However in many disciplines
> there is a large and inefficient secondary publishing industry.

It sounds as if Peter agrees that 100% OA for the primary research
literature allows all the forms of re-use researchers need. But now we
seem to be heading into secondary publishing, which is *not* what the OA
movement is about. Moreover, it is hard to imagine what the secondary
publishers would be selling, if all of the primary literature were OA
(and OAI-compliant and interoperable). Surely not just *those* data all
over again? Well what data, then? The further data that did not fit
in the journal article? But then why would the self-archiving authors
who have taken the trouble to provide OA for their articles now choose
to give the data to a secondary publisher (to sell both access and usage
rights to it) instead of just self-archiving it too?

The OA movement is certainly encouraging authors to self-archive
their data too, and not just their articles. But it cannot and should
not make the self-archiving of the unpublished data a *condition* for
self-archiving the article! Beggars can hardly be choosers, and so
far still far too few authors are self-archiving. We need more
self-archiving. Demanding of currently *non-self-archiving* authors
(the vast majority) that they do *more* than just self-archiving is
hardly a way of inducing them to self-archive!

> It's important to stress that there is a critical need for
> machine-readability of articles. This gives vast improvements in indexing,
> recovery, aggregation, etc.

With OA we have that. So?

> However when the facts are in eForm including in the 95% green form
> it is highly questionable whether they can be re-used on a significant
> scale due to the European directive on copyright. Both data within
> an article (e.g. a table) and aggregated in journals can be called a
> database and hence copyright of publisher.

Could we take this one step at a time? If the articles are on paper,
and access to them is toll (subscription) based, can (paying) readers
use their contents or can they not? (You seemed to say earlier that
they could.)

Now if the articles are online, but access is still toll-based, can
(paying) readers use their contents or can they not? (If it was ok on
paper, it seems odd to imagine why/how the very same thing becomes not-ok
online.)

But so far, that use is still just for the happy few whose institutions
can afford to pay for their access.

Now if the articles are online and *open access* -- i.e., accessible
toll-free by all would-be users webwide -- can *all* readers use their
contents or can they not?

> There is a great need
> to ensure that the data in articles have a level of access compatible
> with the author's intentions.

What greater level of access and use than toll-free full-text access
for every would-be user (which includes the capability of reading,
downloading, storing, computationally analyzing, etc.)?

> Otherwise the OA movement might even make
> things worse (by implying that copyright was an unimportant issue).

But here is the hub of the problem: Copyright is an important issue for
those who need or wish to retain or renegotiate rights. But it is *not*
an important issue for authors who are self-archiving their full-texts;
indeed, if their publisher is "green" (as at least 55% of journals already
are) then their journal even officially endorses author self-archiving!

So, on the contrary, it would make things worse for OA (which is still
working hard to get the message about the benefits and feasibility of OA
across to the author community) to imply, wrongly, that OA can only be
provided for your articles if you can successfully renegotiate rights
with your publisher!

> > Both need to change. Articles need to be self-archived and data need to
> > be self-archived.

Agreed. But the world's main access problem today concerns articles. The
data would be a bonus (a welcome one, but a bonus). And first things
first. Authors already have the habit of making their articles public
(by publishing them). They don't yet have such a habit with their data.
Self-archiving is another new habit. Let us get the habit of
self-archiving the articles (which authors are already in the habit of
publishing) established first, then let's work on the natural next step,
of establishing the new habit of self-archiving the data too.

Or do both in parallel. But don't weigh down article self-archiving with
the further need to do data-archiving (unless the author is gung-ho

> I fully support self-archiving. I enjoyed the presentations and got a lot
> from them. The adoption of green or gold will remove one fundamental
> barrier to access to data, but not all

Agreed, because OA is not about access to data, but about access to
articles (including access to the data in those articles). But it will lead
naturally to data self-archiving too.

> > Now the solution for self-archiving of one's own articles is to
> > self-archive them, period. Absolutely no need to get or seek
> > re-publication rights or any other change in copyright. The same
> > is true of one's own data. Just self-archive it.
> This won't solve our problem. Indeed it is almost more frustrating. We can
> now see the data but we can't re-use it (safely).

I do not see this at all. If all my articles are self-archived toll-free on
the web, and so are all my data, what are the "unsafe" re-uses?

> > But for the data of *others* (e.g., ISI's citation data), one may *not*
> > access it without paying a toll, and one certainly may not re-publish
> > it.
> This isn't relevant. If ISI has created its own information it is allowed
> to copyright it. I wouldn't dream of republishing it. However I might wish
> to re-use parts of it. ISI might reasonably object. It would come down to
> fair use. BUT I expect that almost all scientists (most outside the OA
> movement) would not wish to legally forbid reuse of their data.

Well then either scientists should not give their data to proprietary
secondary publishers, or they should *also* self-archive them, thereby
catering for all would-be users who cannot afford the secondaries (just
as they did with the primaries). What's the problem?

> > As to my data: If I self-archive it, anyone can read, download,
> > process, analyse it, and report the results.
> Not if the copyright does not grant the right.

I said that I self-archived my own data, toll-free, for all. What does
that have to do with copyright? Besides, who transfers copyright for
their own data? Many do so for their articles, but who does it for their
data? And if they want users to be able to use it, why give it to a
secondary at all, let alone transfer copyright to the secondary?

But data-publishing is all new terrain. Article-publishing is not.

> > To republish my data in a compilation of theirs, they need my
> > permission (which I give them, of course).
> Why?

Because most researchers don't gather data in order to sell usage rights
to them (any more than they write articles to sell access to them).
But it is beside the point. That's up to the researcher.

> If I publish data no-one should need my permission to reuse it.

Agreed. And since most researchers likewise agree, there is no problem.

> online - but not fully accessible

How are full-text articles (including the data in them) that are
accessible toll-free by any user webwide not "fully" accessible?

> My proposal would be:
> (a) that the OA movement could take a constructive look at our concerns
> and not dismiss it summarily

The concerns of those who seek access and full usage of all research
data are not dismissed. But OA does provide access and full usage for all
research data in articles. The rest just pertains to the unpublished
data, which authors should certainly be encouraged to self-archive too,
rather than giving only to secondaries, let alone transferring all
rights to the secondaries!

> (b) that they/we explore enhancements to the OA procedures that help
> the accessibility of data.

OA provision for all articles and hence all data therein would certainly
help with the accessibility of data! Encouraging OA self-archiving of
unpublished data in addition is also being done by the OA movement.

> In the first instance I would be happy with
> something like a statement: "the data in this manuscript are provided
> copyright free and available for re-use without the author's or publisher's
> explicit permission, as long as provenance and moral rights are honoured".
> Links to ICSU statements could be made. IANAL but this seems a feasible
> start and would make the position clear.

No need whatsoever for this in the case of OA articles. And certainly no
need to renegotiate any of this with the TA publisher of the article --
and a-fortiori not if the publisher is already officially green!

Stevan Harnad

NOTE: A complete archive of the ongoing discussion of providing open
access to the peer-reviewed research literature online (1998-2004)
is available at the American Scientist Open Access Forum.

Unified Dual Open-Access-Provision Policy:
    BOAI-2 ("gold"): Publish your article in a suitable open-access
            journal whenever one exists.
    BOAI-1 ("green"): Otherwise, publish your article in a suitable
            toll-access journal and also self-archive it.