Usage Factor Stage 2 – Data Modelling and Analysis

Progress Report                                    


Building on the encouraging response revealed by the market research, Stage 2 of our project is developing a programme of data modelling and analysis. It will use real usage data from a number of content providers, with the aim of identifying candidate metrics for longer-term, scaled-up testing.


Our original plan was to ask the participating content providers to submit raw usage logs to a third party, who would assess and interpret the logs in detail with the goal of identifying one or more viable metrics.


It soon became clear that raw usage logs are so large and complex that processing and analysing statistically significant samples of data would have been beyond our budget. Moreover, adding raw usage log processing to the core modelling and analysis work would, we decided, unnecessarily restrict the pool of talent available to us to conduct the necessary work.


So it was agreed that we would instead draw up a specification for two reports, which we would ask publishers to produce, summarising for each journal:

  • Number of Successful Full-Text Item Requests by year of original publication
  • Number of separately downloadable Full-Text items by Journal and year of original publication




We did this but, after further discussion, agreed that the reports needed to be modified to compensate for the fact that articles are published at different times in the calendar year. We needed reports that would provide data about the first 12, 24 and 36 months of usage of articles published in given calendar years, rather than calendar-year usage, where the first “year” could cover as little as one month or as much as 12.
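The difference between the two kinds of report can be illustrated with a minimal Python sketch. It is not part of the report specification: the function names, the sample dates and the convention of counting whole months elapsed since publication are all assumptions made for illustration.

```python
from datetime import date


def months_since(published: date, used: date) -> int:
    """Whole months elapsed between publication and a usage event (assumed convention)."""
    months = (used.year - published.year) * 12 + (used.month - published.month)
    if used.day < published.day:
        months -= 1  # the current month is not yet complete
    return months


def usage_by_window(published: date, requests: list[date],
                    windows: tuple[int, ...] = (12, 24, 36)) -> dict[int, int]:
    """Count requests made within the first N months after publication."""
    counts = {w: 0 for w in windows}
    for used in requests:
        elapsed = months_since(published, used)
        if elapsed < 0:
            continue  # ignore requests dated before publication
        for w in windows:
            if elapsed < w:
                counts[w] += 1
    return counts


def usage_in_publication_year(published: date, requests: list[date]) -> int:
    """Count requests made in the calendar year of publication."""
    return sum(1 for used in requests
               if used.year == published.year and used >= published)


# Hypothetical article published in mid-November, with invented request dates.
published = date(2008, 11, 15)
requests = [date(2008, 11, 20), date(2008, 12, 1), date(2009, 3, 10),
            date(2009, 11, 14), date(2009, 11, 15), date(2010, 6, 1),
            date(2011, 12, 1)]

window_counts = usage_by_window(published, requests)
calendar_count = usage_in_publication_year(published, requests)
```

For this invented article the calendar-year count covers only about six weeks of availability, while the 12-month window covers a full year of use regardless of when in the year the article appeared.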


While working on and agreeing the specification for these revised reports, we realised that the gap between the publication dates of early and final versions of the same article was much more significant than we had originally thought. We discovered that some early or even “final” versions of articles are published online many months (sometimes years) before the official publication date of the journal issue of which they are nominally a part. At this point we concluded that we could not ask our data modelling and analysis contractor to do their analyses at the journal level. We would need to supply usage data at the article level, showing the usage patterns of different versions (e.g. early and final) of articles separately.
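As a rough illustration of what reporting versions separately means, the sketch below counts requests per article and version. The record fields (`article_id`, `version`, `request_date`) and the sample values are hypothetical and are not taken from the article-level specification.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class UsageRecord:
    article_id: str      # hypothetical article identifier
    version: str         # assumed labels, e.g. "early" or "final"
    request_date: date   # date of the successful full-text request


def requests_by_version(records: list[UsageRecord]) -> dict[tuple[str, str], int]:
    """Count requests separately for each (article, version) pair."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    for record in records:
        totals[(record.article_id, record.version)] += 1
    return dict(totals)


# Invented records: one article used in both its early and final versions.
records = [
    UsageRecord("article-1", "early", date(2008, 3, 2)),
    UsageRecord("article-1", "early", date(2008, 5, 9)),
    UsageRecord("article-1", "final", date(2009, 1, 20)),
    UsageRecord("article-2", "final", date(2009, 2, 1)),
]

version_counts = requests_by_version(records)
```

Keeping the version in the grouping key means usage of an early version is not silently merged into the final version's figures, so each can be measured from its own publication date.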


So we developed a specification for an article-level report (“Specification for Usage Factor data supply by article Version 1.2”, available as an Excel spreadsheet). This will form the basis of the data that we will provide to whichever third party we select to perform the analyses needed to assess which metrics, if any, could provide a credible measure of relative usage: our Usage Factor.


If you would like to be alerted when the data modelling and analysis RFP is issued, please e-mail:

Alison Whitehorn

Business Manager, UKSG
alison@uksg.org