IRS Evaluation for University of Southampton

Context and Purpose of this Document

This document is intended to guide you through the analyses that the IRS download statistics software provides. It is not a user guide; rather, it is a walkthrough of the analyses and interpretations for which you may use the software.

Please note: only the use of full texts is recorded by the statistics package, and the evaluation set shown here includes data up to the end of September 2007. The examples given here may differ from the live service because these analyses were prepared up to a week ago and have been saved as static PDF or image files for illustration.

Introduction to IRStats

IRStats software is intended to provide in-depth and interoperable statistics on the usage and downloads of items in repositories. One of its main concerns is to filter out all apparent downloads that are actually caused by web crawlers, leaving only those downloads that can reasonably be attributed to individuals reading the repository resources. The software consists of several components: a web log database, an OAI interface to share the log data with services that analyse download impacts, and a package that creates reports of the download activities of the repository.

It is the last of these components that this document highlights. It walks through a number of the usage reports and shows how the information they contain can be interpreted to give a high-level understanding of the functioning of the repository and the usage of its contents.
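
Although this document concentrates on the reports, the crawler filtering mentioned in the introduction can be illustrated with a minimal Python sketch. It is not how IRStats itself filters traffic, which this document does not describe. The `LogEntry` fields, the `CRAWLER_MARKERS` list and the function names are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical substrings that mark a user agent as a crawler; a real
# deployment would rely on a maintained list rather than this short sample.
CRAWLER_MARKERS = ("bot", "crawler", "spider", "slurp")


@dataclass(frozen=True)
class LogEntry:
    """One full-text request taken from a web server log (illustrative fields)."""
    eprint_id: int
    user_agent: str


def is_crawler(user_agent: str) -> bool:
    """Return True if the user agent contains any known crawler marker."""
    agent = user_agent.lower()
    return any(marker in agent for marker in CRAWLER_MARKERS)


def human_downloads(entries: list[LogEntry]) -> list[LogEntry]:
    """Keep only the entries that do not look like crawler traffic."""
    return [entry for entry in entries if not is_crawler(entry.user_agent)]
```

Matching on the user agent alone is the simplest possible approach. Crawlers that do not identify themselves would need other signals, such as request rate.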

Most Downloaded Items

The first report, included here as a linked table, shows the top ten downloaded items in the last quarter (July-September 2007). The title of each item is linked to the eprint record in the repository. The icon following the bibliographic reference is linked to the stats dashboard for that record, a summary of different aspects of the item's use (covered in a later section).

Comments on this table

The following insights can be gained from the above table.

  1. Firstly, the most downloaded item in this period was an artefact from the School of Art. This sends an important advocacy message to a School that has not been seen as the centre of repository activity. It is also a confirmation of the importance of the JISC KULTUR project.
  2. The most active item has been downloaded on average 15 times per working day over the last quarter.
  3. The spread of the "Top 10" items reveals genuine diversity in the repository, with each of the University's faculties represented and two of its three Virtual Centres. The single unrepresented grouping (the Statistical Sciences Research Institute) does appear in the Top 10 for other periods (e.g. last quarter 2006).
  4. The table also exhibits a variety of types of research output. An obvious omission from the above list is journal articles.
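
The figures behind a table like this, including the per-working-day average in point 2, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `(eprint_id, download_date)` record shape and both function names are hypothetical, public holidays are ignored, and the average is assumed to be taken over Monday-to-Friday days only.

```python
from collections import Counter
from datetime import date, timedelta


def working_days(start: date, end: date) -> int:
    """Count Monday-to-Friday days from start to end inclusive (holidays ignored)."""
    total_days = (end - start).days + 1
    return sum(
        1
        for offset in range(total_days)
        if (start + timedelta(days=offset)).weekday() < 5
    )


def top_downloads(records, start: date, end: date, n: int = 10):
    """Return (eprint_id, downloads, downloads_per_working_day) for the top n items.

    `records` is an iterable of (eprint_id, download_date) pairs.
    """
    days = working_days(start, end)
    if days == 0:
        raise ValueError("the period contains no working days")
    counts = Counter(
        eprint_id for eprint_id, day in records if start <= day <= end
    )
    return [
        (eprint_id, total, total / days)
        for eprint_id, total in counts.most_common(n)
    ]
```

Under these assumptions, the quarter from 1 July to 30 September 2007 contains 65 weekdays, so an average of 15 downloads per working day would correspond to roughly 975 downloads over the quarter.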

Stats Dashboards

Clicking on the dashboard icon in the above table will take you to the stats dashboard for any specific eprint. The dashboard shown here is taken from the first eprint in the "Top 10" table. The obvious features of the download graphs are that there were no downloads prior to January 2007 (when it was deposited), and that apart from a few isolated spikes there were very few downloads until the middle of July, when a regular pattern of about 10-15 downloads per day was established.

The Referrer Graph shows that a high percentage of the visits come from external links, and the Top Ten Non Search Referrers list shows that a web page of activities for new students at Staffordshire University links directly into the eprint's contents. That page was created (or at least last modified) on July 17th, which coincides with the growth in use of the eprint. It seems likely that the extra link to the eprint has raised its PageRank in Google and triggered more search engine traffic to it. Unfortunately it is not possible to test this because the Google query terms (see Top Ten Search Terms Table) are being truncated by the software in this version. However, it is interesting to note that this item is being used in a teaching context.

The dashboard for the third item (CLIVAR Exchanges) shows that almost all the traffic is being generated by prominent links at the CLIVAR programme home page. That page shows that the eprint, which is listed as "Monograph (Project report)", is in fact the official publication of the CLIVAR organisation newsletter, which is produced by NOC. This shows the use of the repository as a quasi-publication mechanism.

Similarly, most traffic for the fifth item (Recent investigation of the megalithic landscapes of Seville province, Andalusia) comes from a Portuguese archaeological academic blog. This highlights the existing use of Web 2.0 technologies.

Most of the last item's traffic seems to come from its inclusion in a discipline-based bibliography at the Technical University of Denmark.

Generating Bespoke Statistical Reports

The statistics control panel (illustrated on the left) allows you to choose a wide range of reports on download activity, based on all, or various subsets of, the contents of the repository. It is this control panel that creates each of the reports seen above, and the dashboard itself.

The first section controls the set of eprints for which download information is to be analysed. By default it is all the eprints in the repository. Alternatively, the downloads for a particular faculty, school or research group can be examined. (This means all the eprints that have been declared to belong to that particular part of the institution in the metadata.) It is also possible to select those eprints that are tagged as being about a particular topic (from the Library of Congress Classification) or written by a particular author. Finally, an individual eprint can be examined by specifying its id.

The next section controls the time period over which data is examined: either between two handpicked dates, or for a named period - last year, last quarter or last month.

Finally, the particular view (or analysis) is chosen. The most popular are monthly and daily download histograms and a top 10 table of most downloaded items. All the views are actually HTML fragments that can be blended into portal pages, a repository front page or an eprint abstract page by setting the appropriate CSS styles.
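
The three choices made on the control panel (set, time period and view) can be pictured as a small data structure. The Python sketch below is illustrative only and is not the IRStats interface: `ReportRequest`, its field names, the view identifiers and `named_period` are all hypothetical. It also assumes that the named periods mean the previous full calendar month, quarter or year, which this document does not confirm.

```python
from dataclasses import dataclass
from datetime import date, timedelta


def named_period(name: str, today: date) -> tuple[date, date]:
    """Resolve a named period to an inclusive (start, end) date pair.

    Assumes calendar periods: the previous full month, quarter or year.
    """
    if name == "last_month":
        end = today.replace(day=1) - timedelta(days=1)
        return end.replace(day=1), end
    if name == "last_quarter":
        # First month of the current quarter: 1, 4, 7 or 10.
        first_month = 3 * ((today.month - 1) // 3) + 1
        end = date(today.year, first_month, 1) - timedelta(days=1)
        return date(end.year, end.month - 2, 1), end
    if name == "last_year":
        return date(today.year - 1, 1, 1), date(today.year - 1, 12, 31)
    raise ValueError(f"unknown period: {name}")


@dataclass(frozen=True)
class ReportRequest:
    """One report selection; every field name here is hypothetical."""
    set_type: str    # "all", "faculty", "subject", "author" or "eprint"
    set_value: str   # e.g. a school name or an eprint id; empty for "all"
    start: date
    end: date
    view: str        # e.g. "monthly_downloads_graph"
```

For example, resolving "last_quarter" for any date in October 2007 gives 1 July to 30 September 2007, the period used for the top ten table earlier in this document.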

From this page we can see the following details:

View name: All eprints, last year, monthly downloads graph
Description: Long-term patterns of access
Comments: About 20,000 downloads per month.

View name: All eprints, last quarter, daily downloads graph
Description: Medium-term patterns of access
Comments: A constant daily loading of about 600-700 items per weekday, with half that amount at the weekend.

View name: School of Oceanographic and Earth Sciences, last year, monthly downloads graph
Description: Long-term patterns of access for a specific school
Comments: About 1200 downloads per month for this single school, but not necessarily representative. To examine the difference between separate schools, see a report generated with each school's monthly deposit graph. Some schools attract 1000-3000 downloads per month (Engineering Sciences, Ocean and Earth Sciences, Optoelectronics Research Centre, Southampton Statistical Sciences Research Institute and the National Oceanography Centre) while others attract only some dozens of downloads per month (Social Sciences, Psychology, Civil Engineering, Law and the Institute of Sound and Vibration Research).

View (table columns): Domain | Number of Accesses
View name: All eprints, last year, top ten academies
Description: Summary of all accesses from various institutions
Comments: Demonstrates the institutions where most Southampton readers come from: Leeds, Nottingham, Carnegie-Mellon, Aberdeen, Newcastle, Cambridge, Oxford, Durham, Robert Gordon and Strathclyde.

View name: All eprints, last year, top countries
Description: Summary of access from various countries
Comments: After Britain and the US, it is China and India who are Southampton's most prolific readers.

View (table):
  Search Terms | Count
  coping after subarachnoid haemorrage | 3
  haemacel and gelofusin | 3
  risk factors for late-onset health care asociated bloodstream infections in patients in neonatal intensive care units | 3
  role of histamine in viceral smooth muscle | 3
  traqueostomy Cannula | 3
  oculocefalic reflex | 3
View name: EPrint id 9007, last year, top ten search terms table
Description: Most common search engine query terms
Comments: Each term can show how important this resource is for a particular query. How far down the Google page does eprints.soton appear? In fact, on the date of writing this report, this eprint came 73rd in Google's list of results for traqueostomy. It is possible to see the influence that external links have on the PageRank of the eprint by tracking the position of the eprint in the Google query results over time (this currently has to be done by hand).

View name: Eprint 43023, last year, referrer graph
Description: Proportion of referrals from external links vs search engines vs internal repository links

When someone clicks on a link to get to this eprint's data, that link is known as the referrer. By classifying these referrers, we can see whether web visitors are being sent to this item mainly by Google (almost always true), by navigating the repository's subject tree or collections lists (hardly ever true), or by following a link from another web site, such as another university's page, a newspaper site, a technology magazine, a reviewer or a blog (sometimes true). It is this last case (the existence of an external link) that Google picks up on and that increases the ranking of the eprint, so increasing the number of search visitors and then potentially prompting more external links in a 'virtuous circle'.

This eprint (Magic Forest) has 5% of its recorded traffic driven by external links - a relatively high proportion.
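
The referrer classification described above can be sketched in Python as follows. This is an illustration only, not the logic IRStats uses: the repository hostname, the search engine markers and both function names are assumptions, and a real classifier would need a much fuller list of search engines.

```python
from collections import Counter
from urllib.parse import urlparse

# Both values below are assumptions made for illustration only.
REPOSITORY_HOST = "eprints.soton.ac.uk"
SEARCH_ENGINE_MARKERS = ("google.", "yahoo.")


def classify_referrer(referrer: str) -> str:
    """Label a referrer URL as 'none', 'internal', 'search' or 'external'."""
    host = (urlparse(referrer).hostname or "").lower()
    if not host:
        return "none"
    if host == REPOSITORY_HOST:
        return "internal"
    if any(marker in host for marker in SEARCH_ENGINE_MARKERS):
        return "search"
    return "external"


def referrer_shares(referrers: list[str]) -> dict[str, float]:
    """Return each category's share of all referrers, as a fraction of 1."""
    counts = Counter(classify_referrer(r) for r in referrers)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}
```

A share of 0.05 in the "external" category would correspond to the 5% figure quoted for this eprint.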

Emerging Issues For Southampton's Repository

The following patterns emerge from looking at the various statistical reports listed above. Further work is required to generalise the results seen here, which are based on the small number of individual cases examined so far, and to establish genuine causal connections.

It is useful to know that a 'vanilla' research repository is actually used in the context of other agendas: teaching, scholarly publishing and blogging.