Re: New Ranking of Central and Institutional Repositories

From: Isidro F. Aguillo <isidro_at_cindoc.csic.es>
Date: Tue, 12 Feb 2008 15:09:59 +0100

Dear all:

Thank you for the useful input. I agree with Steve about the importance of
backlinks, especially if we are able to convince people to do "deep
linking". We are going to apply some of the advice in a new edition (a
second beta will be ready before the scheduled date of July), but we need
advice on methodological issues. For example, it is not possible to
extract information filtered by domain from OAIster, and the same applies to
other services. For visits or downloads there is no single source (Alexa is,
of course, discarded).

Best regards,

Steve Hitchcock wrote:
> I agree with most of Arthur's points, especially with regard to activity
> and download measures, but I'm puzzled by his comments about link-based
> visibility. He may be criticising the method of calculation or its use in
> the overall factoring, but in principle links seem a relevant measure for
> repositories and one that should be factored in.
>
> At 23:13 11/02/2008, Arthur Sale wrote:
> > Isidro
> >
> > As one of those that contributed to that discussion, may I be more
> > specific?
> >
> > The impact of a repository should be measured by things other than some
> > of the measures that you use. PageRank and Size are both very weak
> > indicators. I give examples below.
> >
> > VISIBILITY
> > Visibility in the way you measure is nothing to do with the purpose of
> > repositories, and only a minor factor in their impact. Let me give
> > examples:
>
> The emphasis here is curious. Repositories don't seek links? If links-in
> are a minor factor now, perhaps not in the future. Links will grow with
> more and better content, but this effect won't be uniform of course. It
> will tell us something about repositories.
>
> > * Inward links to the repository itself are relatively rare, and
> > probably negligible in the total. Almost no-one really goes to a
> > repository to search its content except locally - its value is in
> > federation. The exceptions are (1) central repositories such as CERN,
> > RepEc, ArXiv, etc, and (2) exemplar repositories such as Southampton and
> > QUT. The component is hugely biased towards these repositories.
> If the measure highlights exemplary repositories, isn't that what it's
> meant to do, so long as the measure is not predicated on these
> repositories? It reveals repositories demonstrating the effect being
> sought.
>
> > * The majority of links to institutional repositories on the Web are
> > probably from depositors' home pages, linking to their research records.
> > In UTas we will gain 600-1000 such links once it is in the standard
> > staff member template. Is this visibility? Or does it measure university
> > size?
> This effect could be eliminated.
>
> > * In a few cases, viewers may link to a paper. However to do this
> > they have to value the paper significantly, then copy the URL, and then
> > post it to a public website or blog. I expect this is a minority in the
> > total of links. Any data otherwise? In any case it is dependent on an
> > author's importance in the field, not the repository value.
> I guess some papers on Webmetrics could tell us something about this
> distinction between what have been called formal and informal links, e.g.
> I came across this recently:
>
> Kousha, K. and Thelwall, M. (2007) The Web impact of open access social
> science research
> http://dx.doi.org/10.1016/j.lisr.2007.05.003
> preprint
> http://www.scit.wlv.ac.uk/~cm1993/papers/OpenAccessSocialSciencePreprint.doc
>
> Blogs are a growing element of scholarly discourse and are a valid effect.
> If these links are not pointing towards repositories then it's the content
> problem again, and the content isn't always finding its way into IRs even
> when it is OA, e.g. above.
>
> Link-based visibility should be a factor in evaluating repositories.
>
> Steve Hitchcock
> IAM Group, School of Electronics and Computer Science
> University of Southampton, SO17 1BJ, UK
> Email: sh94r_at_ecs.soton.ac.uk
>
> > REAL VISIBILITY
> > Real visibility in the case of a repository consists in (a) whether it
> > provides a compliant OAI-PMH interface, and (b) whether that interface
> > is harvested by federated services, such as ROAR, OAIster, etc. One
> > might also add whether the repository is actively harvested as a flat
> > file or via OAI by Google and Google Scholar, Scopus, or Thomson.
> > Nothing else really matters in respect of visibility. All these are
> > measurable. PageRank is irrelevant, sorry.
> >
> > SIZE
> > Size is a terrible measure. Australia is full of examples where the
> > repository has been populated by uploading zillions of old stub records
> > going back to the 1930s or before. The full text is mostly missing,
> > though sometimes a grant has funded image scanning of the document. This
> > is fullness for the sake of fullness. To give one example in your list,
> > the Australasian Digital Thesis Program has 110,000 records of this type
> > of old PhD theses. The full-text simply says: contact the university for
> > a photocopy. That's OK, but the weighting of size ought to be low - less
> > than 20%.
> >
> > If it is necessary to measure size, and it probably is, then I suggest a
> > measure that counts the number of records with a publication date within
> > the last five years. Choose 10 years if you want, but ancient
> > record-keeping does not translate into impact.
> >
> > ACTIVITY
> > It is quite clear from ROAR that deposit activity is a major measure of
> > impact. There are three easy measures to derive.
> > * The number of acquisitions in the last 12 months. Easily discovered
> > from the OAI interface.
> > * The number of acquisitions with a publication date in the last 12
> > months. Easily discovered from the OAI interface. This measures currency
> > as well as activity.
> > * Some repositories are sporadic, some are continuous, the latter
> > reflecting a deep-seated integration within the university's activity. A
> > simple measure would be to derive a statistic from the traffic (see
> > ROAR), such as
> > * number of days in last 12 months with a deposit event
> > * the Fourier spectrum of the last 12 months deposit events
> > having no component with a period longer than 7 days above 10% (I guess
> > at what is significant and perhaps this can be turned into a score).
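The two continuity statistics in the last bullet can be sketched roughly as below. This is one possible reading, not a definitive implementation: it assumes a list of deposit dates is already available (for instance from harvested datestamps), it uses a 364-day window (52 whole weeks) so that a weekly rhythm falls exactly on a Fourier bin, and it interprets "above 10%" as the amplitude of a component relative to the zero-frequency term, that is, the total number of deposits. All function names are hypothetical.

```python
import cmath
from datetime import date, timedelta


def daily_counts(deposit_dates, end, days=364):
    """Number of deposits on each day of the window that ends on `end`."""
    start = end - timedelta(days=days - 1)
    counts = [0] * days
    for d in deposit_dates:
        if start <= d <= end:
            counts[(d - start).days] += 1
    return counts


def days_with_deposits(counts):
    """First statistic: number of days with at least one deposit event."""
    return sum(1 for c in counts if c > 0)


def max_slow_component(counts, min_period=7.0):
    """Second statistic: largest Fourier amplitude among components whose
    period is longer than `min_period` days, as a fraction of the
    zero-frequency component (assumed meaning of the 10% threshold)."""
    n = len(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    worst = 0.0
    for k in range(1, n // 2 + 1):
        if n / k <= min_period:  # remaining components are weekly or faster
            break
        coeff = sum(c * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, c in enumerate(counts))
        worst = max(worst, abs(coeff) / total)
    return worst


def is_continuous(counts, threshold=0.10):
    """True if no slow component exceeds the (guessed) 10% threshold."""
    return max_slow_component(counts) <= threshold
```

Under this reading, a repository that receives deposits every working day passes the test, while one filled in a single one-week batch does not.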
> >
> > RICH TEXT
> > This is a reasonable measure, though subject to error. For example we
> > sometimes put a full-text that gives instructions on how to ask for
> > access to the item concerned, or a bio of the creator of an artwork.
> >
> >
> > DOWNLOADS
> > I'd love to promote downloads as a measure of impact, but there is as
> > yet no federated way to access this data.
> >
> > I'm happy to continue this dialogue.
> >
> > Arthur Sale
> > Professor of Computer Science
> > University of Tasmania
> >
> > > -----Original Message-----
> > > From: American Scientist Open Access Forum
> > > [mailto:AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM_at_LISTSERVER.SIGMAXI.ORG]
> > > On Behalf Of Isidro F. Aguillo
> > > Sent: Monday, 11 February 2008 6:53 PM
> > > To: AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM_at_LISTSERVER.SIGMAXI.ORG
> > > Subject: Re: [AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM] New
> > > Ranking of Central and Institutional Repositories
> > >
> > > Dear all:
> > >
> > > Thanks for your interest in the Ranking of repositories, part
> > > of our larger effort to rank the web presence of universities
> > > and research centers. A few comments on your messages:
> > >
> > > - Currently the Ranking of repositories is a beta version. We
> > > welcome comments, suggestions and criticisms. Information
> > > about missing repositories is warmly welcomed. After the feedback
> > > received during the last few days we are considering a new
> > > edition before the scheduled one in July.
> > > - Our rank formula mimics PageRank in part, but our
> > > "inspiration" was in fact the impact factor. We maintain a 1:1
> > > ratio between visibility (impact) and size (activity), which
> > > is the basis of the IF. In order to take into account the
> > > diversity of web information we decided to split the size
> > > contribution according to additional criteria.
> > > - Freshness is a topic we are concerned about, not only for
> > > repositories but for the rest of the rankings too. We are
> > > considering taking it into account in the Scholar
> > > contribution by giving more weight to recent publications.
> > > - There are methodological problems in producing relative
> > > indicators: percentage of global output, or normalization by
> > > institution size. But, as you know, rankings are usually built
> > > on GDP (US, Japan, Germany, ...) and not GDP per capita
> > > (Luxembourg, United Arab Emirates, ...).
> > > - Our position as a research group has been stated previously,
> > > but I am going to summarise it again: the rankings are made with
> > > the aim of increasing the volume of academic information
> > > available on the Web, promoting the electronic publication of
> > > all the activities of universities, not only the
> > > research-related ones, and especially those of institutions in
> > > developing countries.
> > >
> > > Best regards,
> > >
> > > Leslie Carr wrote:
> > > >
> > > > On 9 Feb 2008, at 21:36, Arthur Sale wrote:
> > > >
> > > >> It looks as though the algorithm is the same as for university
> > > >> websites.
> > > >>
> > > >> Rank each repository for inward bound hyperlinks (VISIBILITY)
> > > >> Rank every repository for number of pages (SIZE)
> > > >> Rank every repository for number of 'interesting' documents,
> > > >> eg .doc, .pdf (RICH FILES)
> > > >> Rank every repository for number of records returned by a Google
> > > >> Scholar search (GOOGLE SCHOLAR)
> > > >> Compute (VISIBILITY x 50%) + (SIZE x 20%) + (RICH FILES x 15%)
> > > >> + (GOOGLE SCHOLAR x 15%)
> > > >> And then rank the repositories on this score.
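The combination described above can be sketched like this. It is a reconstruction from the description in this thread only, not the Webometrics code: the weights are the ones quoted, but the data layout, the function names, and the handling of ties (broken by input order) are assumptions. Each indicator is first turned into a rank position (1 = largest raw value), the positions are combined with the weights, and the repository with the lowest combined score comes first.

```python
# Weights as quoted in the thread; everything else here is hypothetical.
WEIGHTS = {
    "visibility": 0.50,  # inward bound hyperlinks
    "size": 0.20,        # number of pages
    "rich_files": 0.15,  # 'interesting' documents such as .doc, .pdf
    "scholar": 0.15,     # records returned by a Google Scholar search
}


def rank_positions(values):
    """Map each repository to its rank position for one indicator
    (1 = largest raw value). Ties are broken by input order."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def composite_ranking(indicators):
    """indicators: {indicator name: {repository: raw count}}.

    Returns (repositories ordered best first, combined score per repository).
    A lower combined score is better because it is built from rank positions.
    """
    positions = {name: rank_positions(indicators[name]) for name in WEIGHTS}
    repositories = list(indicators["visibility"])
    score = {
        repo: sum(WEIGHTS[name] * positions[name][repo] for name in WEIGHTS)
        for repo in repositories
    }
    return sorted(score, key=score.get), score
```

Because visibility carries half the weight, a repository that is first on links but last on size and rich files can still come out on top, which is the effect being debated in this thread.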
> > > >>
> > > >> This is a poor measure in general. VISIBILITY (accounts for 50% of
> > > >> score!) is not necessarily useful for repositories, when
> > > >> harvesting is more important than hyperlinks. It will be strongly
> > > >> influenced by staff members linking their publications off a
> > > >> repository search. Both SIZE and RICH FILES measure absolute size
> > > >> and say nothing about currency or activity. Some of the higher
> > > >> placed Australian universities have simply had old stuff dumped in
> > > >> them, and are relatively inactive in acquiring current material.
> > > >> Activity should be a major factor in metrics for repositories, and
> > > >> this could easily be measured by a search limited to a year (eg
> > > >> 2007), or by the way ROAR does it through OAI-PMH harvesting.
> > > >>
> > > > > I believe that the Webometrics (ghastly name!) ranking of
> > > > > repositories uses the same criteria as its ranking of
> > > > > universities ie it is attempting to quantify the impact that the
> > > > > repository has had. This is very different to the size, deposit
> > > > > activity, or even used-ness of the repository and explains why
> > > > > the major contributing factor is VISIBILITY. The main issue for
> > > > > this league table is "how much evidence is there in the public
> > > > > web that your active research and scholarly outputs are valued
> > > > > enough by your community of peers that they are linking to them".
> > > > >
> > > > > This will probably seem entirely arbitrary to some people, and
> > > > > entirely obvious to others, depending on how much they see "the
> > > > > web" as a para-literature. It mimics Google's PageRank valuation
> > > > > of web pages according to how many 'votes'
> > > > > (links/quasi-citations) they get from other pages from
> > > > > independent sources.
> > > > >
> > > > > It is not possible to tell with any accuracy whether a University
> > > > > Website is "a good website" simply by looking at the University's
> > > > > place in the Webometrics Ranking of Universities. The website is
> > > > > simply a channel which delivers visibility-impact for the
> > > > > University (or not). Similarly for the repository.
> > > > --
> > > > Les Carr
> > > >
> > >
> > > --
> > > ****************************
> > > Isidro F. Aguillo
> > > Laboratorio de Cibermetría
> > > Cybermetrics Lab
> > > CCHS - CSIC
> > > Joaquin Costa, 22
> > > 28002 Madrid. Spain
> > >
> > > isidro _at_ cindoc.csic.es
> > > +34-91-5635482 ext 313
> > > ****************************
> > >
>

-- 
****************************
Isidro F. Aguillo
Laboratorio de Cibermetría
Cybermetrics Lab
CCHS - CSIC
Joaquin Costa, 22
28002 Madrid. Spain
isidro _at_ cindoc.csic.es
+34-91-5635482 ext 313
****************************
Received on Tue Feb 12 2008 - 14:23:19 GMT
