INF11 – Activity Data incubation workshop 1

This blog post is written on behalf of JISC.

There are a number of afternoon sessions, and I’m attending perhaps the most structured one, which is around the Activity Data strand. It kicks off with a number of short talks.

Mendeley

This afternoon session is kicking off with Ian Mulvany from Mendeley talking about the work they have done. Mendeley comes out of the tradition of ‘reference management’ in some ways, but adds ‘activity data’ to it. Users of Mendeley can download a desktop client and also host an online library – either of citations, or citations with papers attached.

Mendeley uses the ‘Last.fm’ model of activity data, ‘scrobbling’ information about usage from the desktop client – so as users read, use and annotate papers, this information is captured.

This ‘activity data’ can then be used to build recommendations – so based on papers you have used, or have in your library, Mendeley can start to make recommendations on other papers that may be of interest.
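As a rough illustration of how scrobbled events might feed recommendations, here is a minimal Python sketch. It assumes a simple event stream of (user, paper, action) tuples and uses plain item co-occurrence ("people who have this paper also have..."). The event format, function names and scoring are all hypothetical and are not Mendeley’s actual algorithm or API.

```python
from collections import Counter, defaultdict

# Hypothetical scrobbled events: (user_id, paper_id, action).
# The action names are illustrative only; this sketch ignores them.
events = [
    ("alice", "p1", "read"), ("alice", "p2", "annotate"),
    ("bob", "p1", "read"), ("bob", "p2", "read"), ("bob", "p3", "read"),
    ("carol", "p2", "read"), ("carol", "p3", "annotate"), ("carol", "p4", "read"),
]


def build_libraries(events):
    """Collapse the event stream into the set of papers each user has touched."""
    libraries = defaultdict(set)
    for user, paper, _action in events:
        libraries[user].add(paper)
    return libraries


def co_occurrence(libraries):
    """Count how often each pair of papers turns up in the same user's library."""
    counts = defaultdict(Counter)
    for papers in libraries.values():
        for a in papers:
            for b in papers:
                if a != b:
                    counts[a][b] += 1
    return counts


def recommend(user, libraries, counts, k=3):
    """Rank papers the user has not used by summed co-occurrence with ones they have."""
    mine = libraries[user]
    scores = Counter()
    for paper in mine:
        for other, n in counts[paper].items():
            if other not in mine:
                scores[other] += n
    return [paper for paper, _score in scores.most_common(k)]


libraries = build_libraries(events)
counts = co_occurrence(libraries)
print(recommend("alice", libraries, counts))  # → ['p3', 'p4']
```

Summing over everything in a user’s library is what makes the list personal; a production recommender would also normalise for very popular papers.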

Mendeley has an API which developers are encouraged to use to build applications on.

Now Mark Stubbs from MMU talking about lessons from Activity Data Analysis:

Top tips

  • Common ids essential
  • Snapshotting helpful – things change over time (e.g. data may be removed)
  • Visualisations can be very helpful to understand the data
  • Look at both Quantitative and Qualitative aspects of the data – discuss processes underlying patterns
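The first two tips go together: a snapshot is only useful if you can line it up against a later one, and that needs a stable common id. Here is a minimal Python sketch that reports which records were removed, added or changed between two snapshots. It assumes each snapshot is a dict keyed by a record id, and the data is hypothetical.

```python
def diff_snapshots(old, new):
    """Compare two dated snapshots keyed by the same record id.

    Returns (removed, added, changed) as sorted lists of ids.
    """
    removed = sorted(old.keys() - new.keys())
    added = sorted(new.keys() - old.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return removed, added, changed


# Hypothetical monthly snapshots of one field per student record.
snapshot_sep = {"s1": "enrolled", "s2": "enrolled", "s3": "enrolled"}
snapshot_oct = {"s1": "enrolled", "s3": "withdrawn", "s4": "enrolled"}
print(diff_snapshots(snapshot_sep, snapshot_oct))
# → (['s2'], ['s4'], ['s3'])
```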

Experience from two projects:

PhD on evaluating MLEs

Combining information from the Student Records system (Agresso/Unit4) and Blackboard WebCT Vista – but had to work around the fact that they didn’t use a common identifier
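The talk didn’t say how the two systems were linked. One common workaround is a crosswalk table that maps one system’s identifier to the other’s, with a list kept of any records that couldn’t be matched so the gap stays visible. The Python sketch below illustrates that approach. It is an assumption, not the method used in the PhD, and the identifiers and data are hypothetical.

```python
# Hypothetical extracts: the student record system is keyed on student number,
# the VLE on login name, so the two have no identifier in common.
records = {
    "0901234": {"course": "Course A", "progressed": True},
    "0905678": {"course": "Course B", "progressed": False},
}
vle_hits = {"jsmith": 120, "apatel": 15, "guest01": 3}

# Hypothetical crosswalk (login -> student number), e.g. from an identity system export.
crosswalk = {"jsmith": "0901234", "apatel": "0905678"}


def join_via_crosswalk(records, vle_hits, crosswalk):
    """Attach VLE activity to student records; return (joined, unmatched_logins)."""
    joined, unmatched = {}, []
    for login, hits in vle_hits.items():
        student_no = crosswalk.get(login)
        if student_no is None or student_no not in records:
            unmatched.append(login)  # report rather than silently drop
            continue
        joined[student_no] = {**records[student_no], "vle_hits": hits}
    return joined, unmatched


joined, unmatched = join_via_crosswalk(records, vle_hits, crosswalk)
print(unmatched)  # → ['guest01']
```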

Random Forest algorithm used to do the analysis.

Found interesting results – e.g.

  • Correlation between late night usage of VLE and failure to progress
  • Stopping VLE use early might be a sign of dropping out
  • Diminishing returns for staff input to VLE – stop fiddling
  • Found that ‘document download’ was more important to progression than participation in chat/discussion
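Two of these signals, late-night use and stopping use early, could be turned into per-student features before being fed to a model such as a Random Forest. The Python sketch below shows one way to do this. The definition of "late night" (midnight to 6am), the term end date and the function name are assumptions for illustration, not details from the talk.

```python
from datetime import datetime

# Assumed definition of "late night": logins from midnight up to 6am.
NIGHT_START_HOUR, NIGHT_END_HOUR = 0, 6


def vle_features(logins, term_end):
    """Summarise one student's VLE login timestamps into two features."""
    if not logins:
        return {"late_night_share": 0.0, "days_inactive": None}
    late = sum(1 for t in logins if NIGHT_START_HOUR <= t.hour < NIGHT_END_HOUR)
    return {
        # Fraction of logins that fall in the late-night window.
        "late_night_share": late / len(logins),
        # Whole days between the last login and the end of term.
        "days_inactive": (term_end - max(logins)).days,
    }


logins = [
    datetime(2010, 10, 4, 14, 0), datetime(2010, 10, 5, 1, 30),
    datetime(2010, 10, 12, 2, 15), datetime(2010, 11, 1, 11, 0),
]
print(vle_features(logins, term_end=datetime(2010, 12, 17)))
# → {'late_night_share': 0.5, 'days_inactive': 45}
```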

MMU undergoing an Institutional Transformation – re-writing undergraduate curriculum – and using information from activity data to help.

David Kay notes that if you are looking at the Activity Data strand of the call, you aren’t limited to library data – VLE/MLE and other activity data is in scope.

MIMAS and Activity Data

Joy Palmer from MIMAS talking about how MIMAS would like to see activity data used in relation to bibliographic data and services. Joy asking why we are still talking about the potential of activity data in libraries – especially after the TILE and MOSAIC projects and further work at the University of Huddersfield demonstrated the value activity data could add. Identifying some barriers:

  • Technical barriers
  • Getting ‘buy in’ across the library & institution

Some questions:

  • Where are the ‘quick wins’?
  • What don’t we know about exploiting Activity Data?

Need to articulate:

  • user demand
  • benefits
  • value
  • sustainability (Joy notes how often sustainability was mentioned this morning during the briefings)

Making a Business Case (Joy says) is key.

In the arts and humanities, book usage is still very high – and will continue to be. Even Google accepts there will be books that will not be digitised in the foreseeable future.

MIMAS conducted some market research and found:

  • Centrifugal searchers [think this meant working out from a central place]
  • Berry-picking from various trails
  • possible Information Literacy issues resulting in dead-ends

Researchers are suspicious about User Generated Content – especially ratings and reviews – but could immediately see the benefits of ‘tacit’ recommendation systems, and are very used to this type of recommendation from Amazon. Joy uses the example of how BookGalaxy – the winning entry for the MOSAIC competition – can surface relationships that wouldn’t come from simple searching.

Joy asks ‘What if’?

  • These patterns represented a national aggregation of activity data
  • Users could search the long tail of data

“In humanities research it’s the long tail all the way” – Joy attributes this quote to Paul Walk.

What can this mean?

  • Surfacing and increasing usage of hidden collections (and demonstrating value)
  • Providing new routes to discovery based on user and disciplinary context (not traditional classification)
  • Powering ‘centrifugal searching’ and discovery through serendipity
  • Enabling new, original research – academic excellence

We can make data work harder to solve other problems – e.g. what you can let go from your collections (collections management)

Could this be a ‘virtuous circle’? Can ‘activity data’ be ‘open’ in the same way we might aspire to for bibliographic data?
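Joy’s ‘what if’ of a national aggregation raises the question of how per-institution data might be combined and opened up. The Python sketch below assumes each institution contributes loan counts keyed by a shared item identifier and that low counts are suppressed before release. The threshold of 5 and all names are hypothetical.

```python
from collections import Counter

# Hypothetical per-institution loan counts, keyed by a shared item identifier.
institution_counts = {
    "inst_a": {"item-1": 4, "item-2": 2},
    "inst_b": {"item-1": 3, "item-3": 1},
}


def national_totals(institution_counts, min_count=5):
    """Sum counts across institutions, then drop items below min_count.

    min_count is an assumed disclosure-control threshold, not a JISC rule.
    """
    total = Counter()
    for counts in institution_counts.values():
        total.update(counts)
    return {item: n for item, n in total.items() if n >= min_count}


print(national_totals(institution_counts))  # → {'item-1': 7}
```

This exposes a tension. A suppression threshold protects individuals, but it also removes exactly the long-tail items that a national aggregation is meant to surface, so setting it is a real policy decision.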

Publisher and Institutional Repository Usage stats

Paul Needham from Cranfield University presenting on this – PIRUS2, which was a continuation of the original PIRUS project (originally led by COUNTER).

Paul notes the rise in interest in article-level usage: more journal articles are hosted by institutional and other repositories, online usage is becoming an accepted measure of article and journal value, and technical and standards developments (e.g. COUNTER) make it possible to track usage at article level.

However, COUNTER had focussed on usage stats at the journal title level, so PIRUS2 aims to develop COUNTER-compliant usage data and stats at the individual article level. It also aims to create guidelines to enable the sharing or production of standardised usage data and reports.

PIRUS2 is developing a model for a real-world article-level publisher/repository usage statistics service, and a suite of free, open access programmes to support the generation and sharing of COUNTER-compliant usage data and stats.
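At its simplest, article-level reporting means turning a log of full-text downloads into monthly counts per article, with some filtering so that repeated clicks aren’t counted twice. The Python sketch below assumes a log of (timestamp, session, article) tuples and an illustrative 30-second double-click window. It is not the PIRUS2 code and doesn’t implement the full COUNTER rules; robot filtering, for example, is left out.

```python
from collections import Counter
from datetime import datetime

# Assumed double-click window in seconds; check the COUNTER Code of Practice
# for the actual rules.
DOUBLE_CLICK_WINDOW = 30


def monthly_article_counts(events, window=DOUBLE_CLICK_WINDOW):
    """Count full-text downloads per (article, month), ignoring rapid repeats.

    events: iterable of (timestamp, session_id, article_id) tuples.
    """
    last_seen = {}
    counts = Counter()
    for ts, session, article in sorted(events):
        previous = last_seen.get((session, article))
        last_seen[(session, article)] = ts
        if previous is not None and (ts - previous).total_seconds() <= window:
            continue  # repeat request within the window: count it only once
        counts[(article, ts.strftime("%Y-%m"))] += 1
    return counts


events = [
    (datetime(2010, 10, 1, 9, 0, 0), "s1", "art1"),
    (datetime(2010, 10, 1, 9, 0, 10), "s1", "art1"),  # double-click, not counted
    (datetime(2010, 10, 1, 9, 5, 0), "s1", "art1"),
    (datetime(2010, 11, 2, 10, 0, 0), "s2", "art1"),
    (datetime(2010, 11, 2, 10, 0, 5), "s2", "art2"),
]
print(monthly_article_counts(events))
```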

PIRUS2 has three scenarios for gathering data:

  • ‘tracker’ code – a server-side ‘Google Analytics’ for full-text article downloads
  • OAI-PMH harvesting – as this is supported as standard by major repository software
  • SUSHI (the Standardized Usage Statistics Harvesting Initiative protocol), which publishers already use – however, it doesn’t currently support article-level stats, although it will do in the future
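The OAI-PMH route relies on the protocol’s standard ListRecords request and its resumptionToken paging. The Python sketch below shows how to build those requests and follow the token. The endpoint URL is hypothetical. "oai_dc" is only a placeholder, because the talk didn’t say which metadata format PIRUS2 uses for usage data.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, from_date=None, token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if token:
        # resumptionToken is an exclusive argument: only verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def next_token(xml_text):
    """Return the resumptionToken from a ListRecords response, or None when done."""
    root = ET.fromstring(xml_text)
    element = root.find(".//oai:resumptionToken", OAI_NS)
    if element is None or not (element.text or "").strip():
        return None  # absent or empty token: the list is complete
    return element.text.strip()


base = "https://repository.example.ac.uk/oai"  # hypothetical endpoint
print(list_records_url(base, "oai_dc", from_date="2010-10-01"))
# → https://repository.example.ac.uk/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2010-10-01
```

A harvester would fetch the first URL, then keep requesting with the returned token until `next_token` gives None.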

PIRUS2 has already developed plugins/extensions for DSpace, EPrints and Fedora. Currently gathering data via tracker from 6 repositories.

Paul notes that the technical side is relatively easy, but the ‘political’ side is more challenging. This involves getting involvement and agreement from publishers, institutions and other stakeholders. Clearly there are sensitivities around, for example, the promotion of institutional repositories when working with some publishers.

More information on PIRUS2 at http://www.cranfieldlibrary.cranfield.ac.uk/pirus2/tiki-index.php

Activity data at University of Huddersfield Library

David Pattern and Graham Stone talking about the work carried out at University of Huddersfield using library system activity data to drive a number of services:

  • Recommendations (e.g. people who borrowed this also borrowed)
  • Personalised Recommendations (e.g. what to borrow next based on your loan history)
  • Keyword search cloud – based on what people were searching for – originally found that approximately a quarter of searches returned zero results – so implemented a spellchecker!
  • Guided keyword searches – if someone searches for ‘law’ they get thousands of results – so highlight the words often combined with ‘law’ in searches
  • Click stream data – currently collecting this, although not yet sure of the best way to make use of it
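The keyword cloud, spellchecker and guided search ideas all start from the same search log. The Python sketch below assumes a log of (query, result count) pairs. The data is invented, and difflib’s string similarity stands in for whatever spellchecker Huddersfield actually used.

```python
from collections import Counter
from difflib import get_close_matches

# Hypothetical search log: (query, number_of_results).
search_log = [
    ("law", 5120), ("contract law", 310), ("law tort", 95), ("criminal law", 420),
    ("pyschology", 0), ("psychology", 2400), ("nursng", 0), ("nursing", 1800),
]


def zero_result_rate(log):
    """Fraction of searches that returned nothing."""
    return sum(1 for _query, hits in log if hits == 0) / len(log)


def suggest(query, log, cutoff=0.8):
    """Suggest the closest past query that did return results (a crude spellchecker)."""
    vocabulary = [q for q, hits in log if hits > 0]
    matches = get_close_matches(query, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def companion_terms(term, log, k=3):
    """Words most often searched together with `term`, for guided keyword search."""
    counts = Counter()
    for query, _hits in log:
        words = query.lower().split()
        if term in words:
            counts.update(w for w in words if w != term)
    return [w for w, _n in counts.most_common(k)]


print(zero_result_rate(search_log))       # → 0.25
print(suggest("pyschology", search_log))  # → psychology
print(companion_terms("law", search_log))
```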

University of Huddersfield released the circulation and recommendation data under an open license – Dave says this makes him feel good and he recommends it!

Looking at the impact, found that once borrowing suggestions were added to the catalogue (in 2005), there was a change in the borrowing habits of students – increase in range of stock circulating – quite a marked correlation.

Another correlation: the average number of books borrowed per year increased after the implementation of borrowing suggestions.
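Both measures can be computed directly from loan transactions. The Python sketch below assumes loan records of (year, borrower, item). Here, "range of stock circulating" is taken to mean distinct items loaned in a year. The average is per active borrower, though dividing by all enrolled students would be an equally reasonable choice. The data is invented.

```python
from collections import defaultdict

# Invented loan records: (year, borrower_id, item_id).
loans = [
    (2004, "u1", "b1"), (2004, "u2", "b1"), (2004, "u2", "b2"),
    (2006, "u1", "b1"), (2006, "u1", "b3"), (2006, "u2", "b4"), (2006, "u3", "b2"),
]


def yearly_summary(loans):
    """Per year: distinct items loaned and average loans per active borrower."""
    items, borrowers, total = defaultdict(set), defaultdict(set), defaultdict(int)
    for year, who, item in loans:
        items[year].add(item)
        borrowers[year].add(who)
        total[year] += 1
    return {
        year: {
            "distinct_items": len(items[year]),  # range of stock circulating
            "loans_per_borrower": total[year] / len(borrowers[year]),
        }
        for year in sorted(total)
    }


for year, stats in yearly_summary(loans).items():
    print(year, stats)
```

Comparing years before and after 2005 in this way shows that something changed, but it doesn’t show on its own that the borrowing suggestions caused the change.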

Graham Stone now talking about activity data for the School of Human and Health Sciences – found that a reasonably large proportion of students were not using the library, either borrowing books or logging in to online resources. So started to look at final outcomes for students in terms of attainment, and found that the more books a student borrowed, the higher the degree classification they tended to get – a strong correlation.

Graham noting that they haven’t looked at whether certain effects are statistically significant, but are using the data to help inform thinking. Finding some differences in terms of book vs electronic resource usage across different courses of study.
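Statistical significance is on the to-do list, and one simple way to test it is a permutation test on a correlation coefficient. The Python sketch below uses invented numbers and assumes degree classification is coded 1 (third) to 4 (first). Because that scale is ordinal, a rank correlation such as Spearman’s may be a better fit. Even a significant correlation says nothing about causation.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def permutation_p_value(xs, ys, n_perm=10_000, seed=0):
    """Two-sided p-value: share of shuffled pairings with |r| at least as large."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # +1 avoids reporting p = 0


# Invented data: books borrowed, and degree class coded 1 (third) to 4 (first).
books = [0, 2, 5, 8, 12, 15, 20, 25]
degree = [1, 1, 2, 2, 3, 3, 4, 4]
print(round(pearson(books, degree), 2))  # → 0.97
print(permutation_p_value(books, degree))
```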

Now starting to look at the reasons for unexpected non/low use. Looking at:

  • Course profiling
  • Targeted promotion
  • Raise tutor awareness
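Course profiling could start from a simple non-use rate for each course. The Python sketch below assumes that per-student totals of loans and e-resource logins are already available. As an assumption, "non-use" is defined here as zero of both. The course names and figures are invented.

```python
from collections import defaultdict

# Invented per-student usage: (course, loans, e_resource_logins).
students = [
    ("Course A", 0, 0), ("Course A", 3, 10), ("Course A", 0, 0), ("Course A", 1, 0),
    ("Course B", 5, 22), ("Course B", 0, 4), ("Course B", 2, 9),
]


def non_use_by_course(students):
    """Share of students per course with no loans and no e-resource logins."""
    totals, non_users = defaultdict(int), defaultdict(int)
    for course, loans, logins in students:
        totals[course] += 1
        if loans == 0 and logins == 0:
            non_users[course] += 1
    return {course: non_users[course] / totals[course] for course in totals}


# Courses with the highest non-use first: candidates for targeted promotion.
profile = non_use_by_course(students)
for course, rate in sorted(profile.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{course}: {rate:.0%}")
```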

Need to benchmark findings with potential partners, as well as test for statistical significance, and would like to develop a toolkit.

INF11 – Activity Data


This strand is looking for projects that explore user activity data to improve services to institutional staff and students; there will also be a single ‘synthesis’ project in this strand. The detailed description of this strand is at http://infrastructurecalloct2010.jiscpress.org/appendix-f-activity-data/ and a briefing paper is available at http://inf11briefingoct2010.jiscpress.org/infrastructure-for-resource-discovery/

It’s all about identifying tools and techniques that can work for the sector. Looking for very practical projects – looking at how services will be improved, who will be affected, and how they will be affected. Each project should start with a hypothesis (see http://infrastructurecalloct2010.jiscpress.org/appendix-f-activity-data/?paragraph=27#27 and http://infrastructurecalloct2010.jiscpress.org/appendix-f-activity-data/?paragraph=33#33), and projects are expected to look at proving/exploring the hypothesis.

Projects are expected to release datasets under an open licence wherever possible – but to be clear within the bid about any legal or moral problems with this.
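One of the legal and moral issues is that raw activity data contains user identifiers. A common mitigation, assumed here rather than prescribed by the call, is to replace each identifier with a keyed HMAC-SHA256 hash before release. That way the same user still links across rows without revealing who they are. The Python sketch below shows this; the key and data are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; it must stay inside the institution and never be released.
SECRET_KEY = b"replace-with-a-long-random-key"


def pseudonymise(user_id, key=SECRET_KEY):
    """Replace a user id with a keyed hash that is stable but hard to reverse."""
    digest = hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


# Invented loan rows (borrower, item) prepared for release.
loans = [("u1234", "item-1"), ("u1234", "item-2"), ("u5678", "item-1")]
released = [(pseudonymise(user), item) for user, item in loans]
print(released)
```

Pseudonymisation is not anonymisation, because a distinctive loan history can still point to an individual. It usually needs to be combined with aggregation or suppression of small counts.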

Activity data related to all institutional systems is in scope.

There is a related call out at the moment – 12/10, the JISC Business Intelligence Programme – and Andy highly recommends that anyone thinking of bidding under the #inf11 Activity data strand should also read this call. Note that JISC does not want duplicate bids to both of these calls.