Re: Central versus institutional self-archiving

From: Stevan Harnad <>
Date: Tue, 29 Mar 2005 08:15:26 +0100 (BST)

On Tue, 29 Mar 2005, Subbiah Arunachalam wrote:

> Friends, especially friends in India:
> Here is a very useful exchange. Can we in India think of a centralised
> archive similar to the one run by CCSD in France for all research councils
> and departments of the Central Government (CSIR, ICAR, DAE, Dept of Space,
> ICMR, etc.)? Will it be better than each individual laboratory having its
> own archive in the long run? I welcome your views.

Arun, I'm afraid you may have misunderstood my message. The point
was that unless a country already has a national, centralised research
mega-institution distributed all over the country, as France does (CNRS),
a national central archive is not a very practical proposition.

In most countries, research institutions are independent local entities
(mainly Universities or Labs). National research councils may fund their
research, to be sure, but the entity that *provides* the research is the
local institution, and it is that local institution that has the direct
stake in maximising the usage and impact of its own research output by
maximising its visibility and accessibility.

The best way research councils and government departments can help is by
mandating that all funded research must be self-archived (and providing the
funds to do so, if/when they are needed); further providing a central
OAI-compliant archive (for those researchers whose local institutions
cannot for some reason provide a local institutional one of their own)
would be useful too. But the lion's share of the initiative for providing,
monitoring and maintaining open access to their own local research output
must come from the local research-provider institutions. OA is rather like
the Internet itself in that respect.

Please see the study of Swan et al., which analyses this matter in
some depth:

          Swan, Alma and Needham, Paul and Probets, Steve and Muir,
          Adrienne and O'Brien, Ann and Oppenheim, Charles and Hardy,
          Rachel and Rowland, Fytton (2005) Delivery, Management and
          Access Model for E-prints and Open Access Journals within
          Further and Higher Education. JISC Report.

          Swan, Alma and Needham, Paul and Probets, Steve and Muir,
          Adrienne and Oppenheim, Charles and O'Brien, Ann and Hardy,
          Rachel and Rowland, Fytton and Brown, Sheridan (2005) Developing
          a model for e-prints and open access journal content in UK
          further and higher education. Learned Publishing.

Stevan Harnad

> Arun
> Subject: Re: [SI] Ann Okerson on institutional archives
> From: "Stevan Harnad" <>
> Date: Mon, March 28, 2005 8:14 pm
> To:
> Cc: "Leslie Carr" <>
> -------------------------------------------------------------------
> I have to point out that the information from Franck Laloe about CNRS's
> HAL is correct and very helpful but risks being extremely misleading about
> the cost of distributed institutional archiving. Here are the pertinent
> points:
> (1) France is unique in having a national research "mega-institution", the
> CNRS. This consists of CNRS researchers in just about all scholarly and
> scientific disciplines (not just those we call "science") distributed all
> over the country, either in independent CNRS units or in CNRS units that
> are administratively associated with local universities.
> (2) I am not sure what percentage of the researchers and research output
> of France the CNRS comprises, but it is considerable, and if we add in the
> three other CNRS-like national research institutes (INSERM in medicine,
> INRA in biology and INRIA in information/computer science, which are all
> collaborating with CNRS in self-archiving their research output in HAL),
> that covers the great majority of French research output.
> (3) Because of this unified national mega-institution and mega-archive,
> France is in a position to take a huge step forward toward making 100% of
> French research output OA, thereby setting an example for the rest of the
> world. The total cost of this is very low, because of the economies of
> scale that come with having all national research output centralized in
> this way.
> (4) Most important of all, because all four of these institutions are
> indeed institutions, with the status of employer (and, I am not sure about
> this, but I believe also the status of research funder in some cases),
> CNRS, INSERM, INRIA and INRA are in a position to adopt a unified
> self-archiving policy at a national level, and to ensure that the policy
> is implemented in the whole country, by just about all of its researchers,
> for just about all of French research output, all at once.
> (5) Now the misinterpretation of all this:
> (5a) Few if any other countries are in a position to adopt and
> implement a national self-archiving policy like this, distributed
> across all disciplines. Their research output is local to their
> distributed universities and research institutions, and hence
> self-archiving policy must be distributed and local to those
> institutions.
> (5b) The cost of self-archiving *per local institution* (which is what
> I and Les, and others who have actually implemented such local
> archives said it was: about a $2000 server plus a few days one-time
> sysad time for start-up and a few days a year sysad time for
> maintenance) is far, far lower than the cost of national, central
> archiving (which is itself quite low). It may be that the national sum
> of the local costs of the institutional self-archiving across all
> local universities in a country comparable to the size of France will
> be somewhat higher than the price of France's single national archive,
> HAL, but this national sum is *meaningless* in countries that have no
> such national structure! It is like summing the library book
> acquisition costs for each of the universities in a country and
> comparing them to central costs: There is no national "pocket" out of
> which all those local library acquisition budgets come, just as there
> is no national pocket for the sum of institutions' computer, network,
> telephone or research travel costs. Such a comparison only makes
> sense for a country with centralized research, like France.
> (5c) HAL, though an excellent and no doubt robust and highly
> functional national research self-archiving system, happens to be
> modelled on the properties of the Physics Arxiv. This is all fine, but
> rather arbitrary, in making comparisons with distributed local
> institutional self-archiving: There is no reason whatsoever why local
> institutions need to adopt either the particular properties of Arxiv
> or the strategies of a centralized national archive. The only thing
> that these local university archives need to ensure is that they are
> OAI-interoperable. The rest of the properties of HAL are merely
> specific further choices that have been made (many no doubt based on a
> priori guesses, not concrete experience or empirical study of
> optimality) in the special case of HAL and CCSD.
> (5d) Franck Laloe's guess that OAI-interoperability is not enough (to
> forestall a 'Tower of Babel') is precisely that -- an a-priori guess.
> It has not been tested; all the a-posteriori evidence to date, from
> actual distributed university archives, is that the guess is simply
> incorrect: that what archives need is not more functionality (whether
> arxiv-like functionality, HAL-like functionality, or otherwise) but
> *more contents*. Archive content is the only thing standing between the
> research world and 100% OA.
> (5e) The only systematic analysis that has been done, comparing the
> merits of central, national self-archiving and distributed
> institutional self-archiving has come out very strongly in favour of
> distributed institutional self-archiving -- followed by central
> *harvesting* and (if desired) metadata enhancement. A primary reason
> given was the existing research culture of independent research
> universities and institutions, which is local, not centralized or
> national: CNRS and France are a prominent exception in this regard
> (and hence not considered in this study). One of the secondary reasons
> was cost.
> So, in summary, the special case of CNRS+, HAL and France is a great asset
> to world OA, accelerating French OA provision substantially, in a way not
> possible in any other country, at a national and central level, and
> setting a splendid example (of systematic self-archiving policy) that will
> encourage the rest of the world's research institutions to self-archive
> too.
> But please, having already lost so much time in reaching 100% OA because
> of so many other misunderstandings, let us not now lose still more time in
> over-focusing on the local particulars of France's centralized research
> institutions, as these cannot be generalized literally to other countries
> lacking such centralized institutions. Even less should we focus on the
> special Arxiv- and other features HAL has elected to incorporate, or,
> indeed, the cost of HAL: The Arxiv features and their extensions are not
> essential (nor even necessarily optimal!) ones, OAI-interoperability is
> enough, and the costs of a national centralized archive have no basis for
> comparison with countries that distribute their research across
> independent universities and research institutions. What is essential is
> more content, *not* more functionality!
> The take-home message from France is that 100% self-archiving is desirable
> and feasible -- but the details (central-institutional vs.
> distributed-institutional, HAL's specific special features, and their
> cost) are, as they say in hexagonese: << des précisions inutiles >>
> (useless details). The principle of adopting and implementing
> institutional self-archiving policies for 100% of research output is what
> the rest of the world should be taking to heart from France's splendid
> example and initiative.
> Best wishes,
> Stevan
> Stevan Harnad
> On Mon, 28 Mar 2005, Franck Laloe wrote:
> > At 18:15 26/03/2005, Leslie Carr wrote:
> >
> > >On 26 Mar 2005, at 15:14, Franck Laloe wrote:
> > >
> > >>We now have good experience of this question at CCSD, since we have
> run an archive for the CNRS (a French research institution) for a few
> years. Actually, the cost of running an archive is not much; one
> salary is needed to pay someone to check that the documents which are
> uploaded are OK for the archive; the price of buying and
> maintaining the hardware is comparable or less.
> > >>
> > >>What costs more money, on the other hand, is to write new software. We
> constantly improve ours (it is now significantly different from
> ArXiv, although it remains compatible with it), and we pay three
> engineers for this. I would say that for a whole (medium size)
> country like France, a centralized system for all disciplines would
> cost about 10 salaries; this is of course an extremely small fraction
> of the research budget of the country.
> > >
> > >This is very interesting and important information. Would you be able
> to give an indication of the kinds of changes that you have had to
> build on the base software (I assume from your message that you began
> with arxiv)? With all of these systems, the devil (and the expense) is
> in the details, but the precise details differ from one situation to
> another. It would be a terrific insight to have an Institutional
> Repository costing data-point at the National end of the spectrum!
> > >---
> > >Les Carr
> >
> >
> >
> > Well, maybe I should first say that I was reasoning more in terms of the
> contribution of one country (France for us) to international archives
> (or repositories, I do not know which word is best). Of course, if each
> institution in the country wants to have totally independent archives
> (even if compatible through OAI-PMH for instance), the overall cost
> would be much higher. In my country there are many institutions (we
> have universities, research institutions, what we call "grandes
> écoles", etc.), and the danger of building an expensive Tower of Babel is
> real. The whole idea of CCSD is to offer a kind of national (or
> international) service to all institutions that want to set up "direct
> scientific communication" through open archives; CCSD develops the
> software and maintains it, adapts it when special requirements are
> necessary, and will ensure the long term preservation (technical
> migrations, soft and hard). This is the general idea, with no special
> limit put at the borders of the country: if any scientific institution
> in the world wants to join, they are welcome, assuming a sufficient
> scientific quality of course.
> >
> > The data base where the articles are stored is a single base, with
> homogeneous metadata. But our technique allows institutions to create
> personalized environments, with their own texts, logos, screen layout,
> and even with additional metadata if useful. Everyone can have access to
> the general system (submission and consultation) either through a
> generic interface, or through a personalized interface that is
> institution dependent and selects only the articles belonging to the
> institution. Institutions which want to have a mirror or backup of all
> their data on a computer they own may do so, if for some reason they do
> not trust CCSD for keeping their material.
> >
> > I should add that it was agreed with our American friends who run ArXiv
> that every document that is collected by CCSD and belongs to one of the
> scientific categories of ArXiv will automatically be transferred to
> ArXiv. This works pretty well, and ensures more visibility to the
> articles we collect. But we also collect articles in history,
> education, linguistics, etc., which do not go to ArXiv for obvious
> reasons.
> >
> > In practice, of course, there is still a long way to go before we
> collect all scientific production of the country. CNRS is the largest
> research institution in France, and roughly speaking half of the
> scientific departments strongly support CCSD by asking their people;
> there is good hope to include more departments soon. We now have an
> agreement with another research institution in France, INRIA, so this
> will expand the impact of the system. Negotiations with other
> scientific institutions are underway. Just a figure to give an idea: in
> 2004 we collected 1,500 thesis files, i.e. about 10% of the
> national production. My hope is to be at about 50% in two or three
> years, but this is only an extrapolation for the moment. And our main
> goal is not limited to theses; it includes all kinds of scientific
> documents (articles, conference proceedings, etc.).
> >
> >
> > Now, at last, the answer to your questions! No, we did not start from the
> ArXiv software, and actually were not advised by Paul Ginsparg and
> colleagues to do so when we started in 2000, for good reasons. ArXiv is
> almost 15 years old, techniques have changed since. Our software, which
> we call Hal (as the crazy computer in the movie!) does many things that
> ArXiv does not do: as I said above, it allows a personalization of
> environments, contains the notion of "stamps", of "collections", can
> extract lists of publications, etc. It constantly evolves under the
> pressure of various demands, and this is why we need three engineers at
> >
> > This has been a long message, I will stop here! But please do not
> hesitate to ask if you wish to know more. Concerning the cost of CCSD,
> it is easy to calculate: salaries for three engineers (count 4 if you
> count Marco and me, two part time physicists), offices, usual expenses,
> computers and servers (but this is not much, except if you count backup
> procedures which can be expensive if they are at a high level of
> security).
> >
> > best wishes
> >
> >
> > Franck Laloë
> >
> >
> > Franck Laloë, LKB, Dept de physique de l'ENS, 24 rue Lhomond, F 75005
> > Paris (France)
> > tel. and fax 33 (1) 47 07 54 13
