This document is intended to guide you through the analyses that the IRS download statistics software provides. It is not a user guide; rather, it walks through the analyses and shows how their results can be interpreted.
Please note: only the use of full texts is recorded by the statistics package, and the evaluation set shown here includes data up to the end of September 2007. The examples given here may differ from the live service, as these analyses were prepared up to a week before writing and have been saved as static PDF or image files for illustration.
The live service is currently available at http://irstats1.citebase.org/perl/stats.cgi, and the top ten table that opens this document can be found there.
IRStats software is intended to provide in-depth and interoperable statistics on the usage and downloads of items in repositories. One of its main concerns is to filter out apparent downloads that are actually made by web crawlers, leaving only those downloads that can reasonably be attributed to individuals reading the repository's resources. The software consists of three components: a web log database, an OAI interface that shares the log data with services that analyse download impact, and a package that creates reports of the repository's download activity.
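IRStats' actual crawler-filtering rules are not described in this document. As a minimal sketch of the idea, assuming each log entry records a client IP, a user-agent string and a requested path, two common heuristics can be combined: discard every request from a client that fetched `/robots.txt` (well-behaved crawlers do, browsers normally do not), and discard requests whose user agent announces a crawler. The `LogEntry` type, the marker list and `filter_human_downloads` are hypothetical names for illustration only.

```python
from dataclasses import dataclass

# Illustrative crawler markers only; this is NOT the list IRStats actually uses.
CRAWLER_UA_MARKERS = ("bot", "crawler", "spider", "slurp")


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical, minimal web log record."""
    ip: str
    user_agent: str
    path: str


def filter_human_downloads(entries):
    """Keep only the entries that plausibly come from human readers."""
    entries = list(entries)  # allow any iterable; we scan it twice
    # Heuristic 1: any client that asked for /robots.txt is treated as a crawler.
    robot_ips = {e.ip for e in entries if e.path == "/robots.txt"}
    kept = []
    for e in entries:
        if e.ip in robot_ips:
            continue
        # Heuristic 2: the user agent identifies itself as a crawler.
        ua = e.user_agent.lower()
        if any(marker in ua for marker in CRAWLER_UA_MARKERS):
            continue
        kept.append(e)
    return kept
```

A production filter would probably need more signals than these two checks, for example a maintained list of robot user agents or limits on request rates per client.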
It is the last of these components that this document highlights. It walks through a number of the usage reports, and shows how the information contained in them can be interpreted to give a high level understanding of the functioning of the repository and the usage of its contents.
The first report, included here as a linked table, shows the top ten downloaded items in the last quarter (July-September 2007). The title of each item is linked to the eprint record in the repository. The icon following the bibliographic reference is linked to the stats dashboard for that record, a summary of different aspects of the item's use (covered in a later section).
|Item||Downloads (July-September 2007)|
|Carnie, Andrew Magic Forest: a time based slide dissolve work.||869|
|Warburton, W.I. (2005) What are grounded theories made of? In, 2005 University of Southampton LASS Faculty Post-graduate Research Conference, Southampton, UK, 6-7 Jun 2005. Southampton, UK, Faculty of Law, Arts and Social Sciences (LASS), 10pp, 1-10.||490|
|(2007) CLIVAR Exchanges - Ocean model development and assessment. Southampton, UK, International CLIVAR Project Office, 28pp. (CLIVAR Exchanges, No. 42 (Vol 12(3)))||382|
|Gupta, S., Chimbira, W., Watkins, S., Crawford, H., Marden, B., Legg, J. and Marsh, M.J. (2001) Acute lung injury in paediatric intensive care: Course and outcome. Critical Care, 5, (1) (doi:10.1186/cc1301)||303|
|Garcia Sanjuan, Leonardo and Wheatley, David (2006) Recent investigations of the megalithic landscapes of Seville province, Andalusia: Dolmen de Palacio III. In, Joussaume, R, Laporte, L and Scarre, C (eds.) Origine et Développement du Mégalithisme de L’ouest de L’Europe. Actes du Colloque International, 26-30 Octobre 2002, Bougon, France. Niort, France, Conseil Général des Deux-Sèvres, 473-484.||266|
|Woollard, W.J. (2004) The rôle of metaphor in the teaching of computing; towards a taxonomy of pedagogic content knowledge. University of Southampton, School of Education, PhD Thesis, 167pp.||254|
|Mokhtar, Mohd Ridzuan (2005) Bragg grating filters for optical networks. University of Southampton, Faculty of Engineering and Applied Science, Optoelectronics Research Centre, PhD Thesis, 162pp.||235|
|Jones, Keith and Mooney, Claire (2003) Making space for geometry in primary mathematics. In, Thompson, I. (ed.) Enhancing primary mathematics teaching. Maidenhead, UK, Open University Press, 3-15.||230|
|Coles, S. J., Frey, J., DeRoure, D., Hursthouse, M. et al. (2004) The CrystalGrid Collaboratory Foundation Workshop, Southampton, 13-17 September, 2004: a selection of presentations. Southampton, University of Southampton, School of Chemistry||227|
|Yusoff, Zulfadzli (2004) Applications of highly nonlinear holey fibres in optical communications. University of Southampton, Faculty of Engineering and Applied Science, Optoelectronics Research Centre, PhD Thesis, 162pp.||222|
The table, together with the dashboards it links to, yields several insights.
Clicking on the dashboard icon in the above table will take you to the stats dashboard for any specific eprint. The dashboard shown here is taken from the first eprint in the "Top 10" table. The obvious features of its download graphs are that there were no downloads before January 2007 (when the item was deposited) and that, apart from a few isolated spikes, there were very few downloads until mid-July, when a regular pattern of about 10-15 downloads per day was established.
The Referrer Graph shows that a high percentage of the visits come from external links, and the Top Ten Non Search Referrers list shows that a web page of activities for new students at Staffordshire University links directly to the eprint's contents. This page was created (or at least last modified) on July 17th, which coincides with the rise in downloads of this item. It seems likely that the extra link to the eprint raised its PageRank in Google and triggered more search engine traffic. Unfortunately it is not possible to test this, because this version of the software truncates the Google query terms (see the Top Ten Search Terms table). However, it is interesting to note that this item is being used in a teaching context.
The dashboard for the third item (CLIVAR Exchanges) shows that almost all its traffic is generated by prominent links on the CLIVAR programme home page. That page shows that the eprint, which is listed in the repository as a "Monograph (Project report)", is in fact an issue of the CLIVAR organisation's official newsletter, which is produced by NOC. This shows the repository being used as a quasi-publication mechanism.
Similarly, most traffic for the fifth item (Recent investigations of the megalithic landscapes of Seville province, Andalusia) comes from a Portuguese academic archaeology blog. This highlights that Web 2.0 technologies are already being used to direct readers to repository content.
Most of the last item's traffic seems to come from its inclusion in a discipline-based bibliography at the Technical University of Denmark.
The statistics control panel (illustrated on the left) allows you to choose a wide range of reports on download activity, based on all, or various subsets of, the contents of the repository. It is this control panel that creates each of the reports seen above, and the dashboard itself.
The first section controls the set of eprints whose download information is to be analysed. By default this is every eprint in the repository. Alternatively, the downloads for a particular faculty, school or research group can be examined (that is, all the eprints whose metadata declares that they belong to that part of the institution). The set can also be restricted to eprints tagged as being about a particular topic (from the Library of Congress Classification) or written by a particular author. Finally, an individual eprint can be examined by specifying its id.
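To make the set-selection options concrete, here is a minimal sketch assuming each eprint record carries its id plus the divisions, subject codes and authors taken from its metadata. The `Eprint` record, the `select_eprints` function and the kind names are hypothetical; they mirror the choices on the control panel rather than IRStats' internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Eprint:
    """Hypothetical eprint record holding the metadata used for set selection."""
    eprint_id: int
    divisions: frozenset = frozenset()  # faculty, school or research group codes
    subjects: frozenset = frozenset()   # Library of Congress Classification codes
    authors: frozenset = frozenset()


def select_eprints(eprints, kind="all", value=None):
    """Return the eprints in the chosen set.

    kind is one of 'all', 'division', 'subject', 'author' or 'id';
    value is the division, subject, author or id to match.
    """
    if kind == "all":
        return list(eprints)
    if kind == "id":
        return [e for e in eprints if e.eprint_id == value]
    # Map the set kind onto the metadata field that must contain the value.
    attribute = {"division": "divisions", "subject": "subjects", "author": "authors"}.get(kind)
    if attribute is None:
        raise ValueError(f"unknown set kind: {kind}")
    return [e for e in eprints if value in getattr(e, attribute)]
```

Treating the choice as a single (kind, value) pair reflects the control panel offering one selection at a time; combining criteria would need a different interface.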
The next section controls the time period over which data is examined: either between two hand-picked dates, or over a named period (last year, last quarter or last month).
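The named periods have to be turned into concrete date ranges before the logs are queried. The sketch below assumes one possible reading: "last month" and "last quarter" are the previous complete calendar month and quarter (which matches the July-September 2007 quarter used for the top ten table), and "last year" is the twelve complete months before the current month. The function name `named_period` and these semantics are assumptions, not documented IRStats behaviour.

```python
from datetime import date, timedelta


def _shift_months(first_of_month, months):
    """Return the first day of the month `months` away from the given month."""
    index = first_of_month.year * 12 + (first_of_month.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def named_period(name, today):
    """Return inclusive (start, end) dates for a named reporting period.

    Assumed semantics (not confirmed for IRStats):
      last_month   - the previous calendar month
      last_quarter - the previous complete calendar quarter
      last_year    - the 12 complete months before the current month
    """
    this_month = today.replace(day=1)
    if name == "last_month":
        start = _shift_months(this_month, -1)
        end = this_month - timedelta(days=1)
    elif name == "last_quarter":
        # First day of the quarter containing `today` (Jan, Apr, Jul or Oct).
        quarter_start = date(today.year, 3 * ((today.month - 1) // 3) + 1, 1)
        start = _shift_months(quarter_start, -3)
        end = quarter_start - timedelta(days=1)
    elif name == "last_year":
        start = _shift_months(this_month, -12)
        end = this_month - timedelta(days=1)
    else:
        raise ValueError(f"unknown period: {name}")
    return start, end
```

For example, `named_period("last_quarter", date(2007, 10, 15))` returns 1 July 2007 to 30 September 2007.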
Finally, the particular view (or analysis) is chosen. The most popular are monthly and daily download histograms and a top 10 table of most downloaded items. All the views are actually HTML fragments that can be blended into portal pages, a repository front page or an eprint abstract page by setting the appropriate CSS styles.
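The two most popular views are simple aggregations over the filtered download records. A minimal sketch, assuming each download is an `(eprint_id, date)` pair: the monthly histogram counts downloads per calendar month, the top-N table counts downloads per eprint, and the table is emitted as an HTML fragment carrying a CSS class so that a host page can style it. The function names and the `irstats-top10` class name are hypothetical, and the sample data is made up.

```python
from collections import Counter
from datetime import date
from html import escape


def monthly_histogram(downloads):
    """Count downloads per (year, month), in chronological order."""
    counts = Counter((day.year, day.month) for _, day in downloads)
    return dict(sorted(counts.items()))


def top_n(downloads, n=10):
    """Return the n most downloaded eprint ids as (eprint_id, count) pairs."""
    return Counter(eprint_id for eprint_id, _ in downloads).most_common(n)


def top_n_html(rows, css_class="irstats-top10"):
    """Render (title, count) rows as an HTML table fragment for embedding."""
    lines = [f'<table class="{escape(css_class)}">']
    for title, count in rows:
        lines.append(f"  <tr><td>{escape(title)}</td><td>{count}</td></tr>")
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample downloads, for illustration only.
    sample = [(101, date(2007, 8, 1)), (102, date(2007, 8, 2)), (102, date(2007, 9, 3))]
    print(monthly_histogram(sample))  # {(2007, 8): 2, (2007, 9): 1}
    print(top_n(sample, 2))           # [(102, 2), (101, 1)]
```

Keeping the markup to a bare table with a class attribute leaves all presentation to the embedding page's stylesheet.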
Reports generated from this control panel reveal the following details:
|Report||Purpose||Findings|
|All eprints, last year, monthly downloads graph||Long-term patterns of access||About 20,000 downloads per month|
|All eprints, last quarter, daily downloads graph||Medium-term patterns of access||A constant daily load of about 600-700 items per weekday, with half that amount at weekends.|
|School of Ocean and Earth Sciences, last year, monthly downloads graph||Long-term patterns of access for a specific school||About 1200 downloads per month for this single school, which is not necessarily representative. To examine the differences between schools, compare the monthly downloads graph generated for each school. Some schools attract 1000-3000 downloads per month (Engineering Sciences, Ocean and Earth Sciences, Optoelectronics Research Centre, Southampton Statistical Sciences Research Institute and the National Oceanography Centre), while others attract only a few dozen downloads per month (Social Sciences, Psychology, Civil Engineering, Law and the Institute of Sound and Vibration Research).|
|All eprints, last year, top ten academies||Summary of all accesses from various institutions||Shows the institutions from which most of Southampton's readers come: Leeds, Nottingham, Carnegie-Mellon, Aberdeen, Newcastle, Cambridge, Oxford, Durham, Robert Gordon and Strathclyde.|
|All eprints, last year, top countries||Summary of access from various countries||After Britain and the US, China and India are Southampton's most prolific readers.|
|EPrint id 9007, last year, top ten search terms table||Most common search engine query terms||Each term can show how important this resource is for a particular query: how far down the Google results page does eprints.soton appear? In fact, on the date of writing this report, this eprint came 73rd in Google's list of results for "traqueostomy". It is possible to see the influence that external links have on the eprint's PageRank by tracking its position in Google's results over time (this currently has to be done by hand).|
|Eprint 43023, last year, referrer graph||Proportion of referrals from external links vs search engines vs internal repository links|
When someone clicks on a link to reach this eprint, that link is known as the referrer. By classifying the referrers, we can see whether visitors are being sent to this item mainly by Google (almost always true), by navigating the repository's subject tree or collection lists (hardly ever true), or by following a link from another web site such as another university's page, a newspaper site, a technology magazine, a reviewer or a blog (sometimes true). It is this last case (the existence of an external link) that Google picks up on and that increases the eprint's ranking, so increasing the number of search visitors and potentially prompting more external links, in a 'virtuous circle'.
This eprint (Magic Forest) has 5% of its recorded traffic driven by external links - a relatively high proportion.
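The referrer classification described above can be sketched as follows, assuming the raw `Referer` value from each logged request is available. The repository host name and the list of search-engine host fragments are illustrative assumptions, and IRStats' own classification rules may differ. `referrer_shares` then produces the kind of percentage breakdown that the referrer graph displays.

```python
from collections import Counter
from urllib.parse import urlsplit

# Assumptions for this sketch: the repository's own host name and a short,
# illustrative list of search-engine host fragments.
REPOSITORY_HOST = "eprints.soton.ac.uk"
SEARCH_ENGINE_HOSTS = ("google.", "search.yahoo.", "search.live.", "ask.com")


def classify_referrer(referrer):
    """Return 'none', 'internal', 'search' or 'external' for a Referer value."""
    if not referrer:
        return "none"  # direct visit, bookmark, or referrer withheld
    host = (urlsplit(referrer).hostname or "").lower()
    if host == REPOSITORY_HOST:
        return "internal"  # browsing the repository's own pages
    if any(fragment in host for fragment in SEARCH_ENGINE_HOSTS):
        return "search"
    return "external"  # a link on some other web site


def referrer_shares(referrers):
    """Percentage of visits falling into each referrer class."""
    counts = Counter(classify_referrer(r) for r in referrers)
    total = sum(counts.values())
    return {kind: 100.0 * n / total for kind, n in counts.items()} if total else {}
```

Matching on host fragments such as `google.` keeps the sketch short but could misclassify some hosts; an explicit list of search-engine domains would be more robust.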
The following patterns emerge from the statistical reports listed above. Further work is required to generalise some of these results and to establish whether the connections suggested by the small number of individual cases examined so far are genuinely causal.
The repository looks very healthy, serving around 700 full texts per day. Although there are few comparisons to be made with other repositories (yet), this figure is better interpreted as something like inter-library loans than as web page views.
It looks as if the longer documents (PhD theses, newsletters, exhibitions) are achieving top billing. This may be because longer documents with more content are more useful, and useful in more circumstances, or it may simply be because their broader range of keywords improves their search ranking and satisfies a greater number of queries.
External linking also seems to play a key role in attracting readership. This may be due to publicity and PageRank, or the link may be a sign that the item has been adopted for a particular audience (e.g. a student cohort or the members of an organisation).
It is useful to know that a 'vanilla' research repository is actually used in the context of other agendas: teaching, scholarly publishing and blogging.