Missing Today: OA Content (85% of it), Not Functionality

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Sun, 23 Oct 2005 13:28:33 +0100

On Sat, 22 Oct 2005, Imre Simon wrote:

> I believe that self-archiving in institutional repositories is a very
> important part of the Open Access movement, but I am afraid that just
> the availability of the papers in these institutional repositories is
> not a solid enough solution.

Not a solid enough solution to what? The problem that OA is intended
to solve is research access-denial and the resulting research
impact-loss. Self-archiving immediately solves both problems, but
85% of articles are still not being self-archived, so the access/impact
problem becomes the problem of inducing the remaining 85% of articles to
be self-archived.

Is Imre Simon saying that his "not solid enough" problem trumps the
access/impact problem? That optimisation schemes should precede 100%
content-provision? And, most important, is he saying providing a "solid
enough" solution should take precedence even if it is in fact at odds
with inducing authors to provide the missing 85% of OA content?

For this is the right context for weighing these questions and assigning
them their proportions and priority -- not the standpoint of abstract
optimisation schemes with no regard for the practical problem of getting
the content in the first place.

> Why? Because what the researcher needs are
> focused disciplinary or thematic digital libraries where a researcher
> can find a lot of papers in the covered theme or discipline.

What the researchers need is the currently missing 85% of content, freely
accessible for all, online. *Then* we can worry about whether they might
be missing anything else over and above that. To talk now about missing
functionality, when 85% of the content on which all functionality depends
is missing, is rather like talking about the need for better table manners
when 85% of the population is still starving.

> The more papers in the covered area he finds there the better it is.

Before a paper can be found free online, it must be made free online.
The reason 85% of papers are not findable is that they have not been
made OA, not that the finding tools are suboptimal.

> In a one
> stop search he can find the paper he is looking for, instead of having
> to go to dozens of institutional repositories, each one with his own
> user interface.

Imre, please look at the OAI interoperability protocol, and the OAI IR
harvesters, such as OAIster, or even Scirus or Google Scholar. You are
breaking down open doors.

> Even more important and useful would be if the full text of the papers
> could be digested (indexed) by computer programs and one could navigate
> in the disciplinary library through search engines using the full text
> of the papers making use of the text of all other papers as well to
> determine the ranking of a given paper (number of citations, for
> instance).

That problem will find its place in the priority queue once we have 100%
OA. At the present 15% level of OA that problem is a joke, compared
to the access-denial to 85% of the content for those who cannot afford
access to the journal version.

> Other navigations could be made available: through forward or
> backward references, through hubs and authorities, through text
> similarity or through cited bibliography similarity. A living example,
> with over 700.000 papers with full text in Computer Science is CiteSeer,
> <http://citeseer.ist.psu.edu/> a very useful digital library, a true
> research outlet in Computer Science.

See also Citebase.

> The one condition that makes
> CiteSeer less powerful is the fact that it is still far from complete.

Correct. And providing that missing 85% content is right now the 1st, 2nd
and Nth priority. All else depends on it.

> Theoretically, at least, these disciplinary digital libraries could be
> realized through the OAI protocols, each of them would be a "service
> provider" in the OAI jargon. That is to say, the service provider would
> harvest the papers in the institutional repositories, copy the full text
> of the papers, index them conveniently and make its services available
> to its users.

Not just theoretically, but in actual practice. What is missing is not
functionality, but content.
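
To make the harvesting step concrete, here is a minimal sketch of what an
OAI service provider does with a repository's ListRecords response. It is
an illustration only, not the code of OAIster, Citebase or any other
harvester. The repository address (repo.example.org), the sample record
and the function names are all hypothetical; only the ListRecords verb,
the metadataPrefix and resumptionToken arguments, and the two XML
namespaces come from the OAI protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           resumption_token=None):
    """Build a ListRecords request URL (function name is hypothetical).

    When continuing a partial list, the resumptionToken is sent as the
    only argument besides the verb.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter("{%s}record" % OAI_NS):
        identifier = rec.findtext(
            "{%s}header/{%s}identifier" % (OAI_NS, OAI_NS))
        title = rec.findtext(".//{%s}title" % DC_NS)
        records.append((identifier, title))
    token = root.findtext(".//{%s}resumptionToken" % OAI_NS)
    return records, (token or None)


# Hypothetical response from a hypothetical repository, for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A self-archived paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(build_list_records_url("http://repo.example.org/oai"))
    print(parse_list_records(SAMPLE_RESPONSE))
```

A real harvester would fetch each URL, index the returned records, and
repeat with the resumption token until none is returned. None of this
helps with the papers that were never deposited in the first place.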

> Given this scenario, I would like to pose two questions to specialists
> in copyright law, which I am most certainly not.
> Considering the existing permissions to self-archive, given by green
> publishers, do they allow the electronic copy (by a robot) of the full
> text of the self-archived papers, so that they can be indexed by an
> interested service provider and allow him to deliver the services of the
> type described? I think that they probably do not allow for this, but
> would like to hear a more informed opinion.

Why is this question even being asked now, when the tiny 15% of already
self-archived content *is* being harvested by all these indexers, whereas
the remaining 85% is not even being provided? Why talk about copyright
when the problem is missing content?

> The second question is this: assuming that the author would have
> retained the right to distribute his paper under a Creative Commons
> Attribution-NonCommercial license (or even freer than that), would that
> license allow the copy and the operations described in the paragraph
> above? I think that with a CC license this operation would be perfectly
> legal, even by a robot, but again, I would like to hear a more informed
> opinion.

As I have suggested repeatedly, self-archiving just requires
self-archiving, not the CC license, which could be in conflict even
with a green publisher's copyright agreement, and hence in conflict with
the author's inclination to self-archive (at a time when 85% of authors
don't yet self-archive).

If I sound a little shrill, it is because we have already needlessly
lost at least 10 years of access and impact because of fretting about or
getting distracted by irrelevancies. It would be good if we could keep
our eyes on the ball just long enough to reach 100% OA. After that,
it can be a free-for-all for the meliorists. Till then, please let's
focus on solving the real, immediate problem, at long last.

> If my reasoning is correct, this would be another definitive and very
> important difference between having or not having a CC license available
> to the author to distribute his paper.

And if my reasoning is correct, this is a completely irrelevant distraction
at this time.

Stevan Harnad
Received on Sun Oct 23 2005 - 18:23:08 BST

