Would you recommend your recommender?

We are starting to see software and projects emerging that utilise library usage data to make recommendations to library users about things they might find useful. Perhaps the most famous example of this type of service is the Amazon ‘people who bought this also bought’ recommendations.

In libraries we have just had the results of the JISC MOSAIC project announced, which challenged developers to show what they could do with library usage data. This used usage data from Huddersfield, where Dave Pattern has led the way both in exploiting the usage data within the Huddersfield OPAC, and also in making the data available to the wider community.

On the commercial side we now have the bX software from Ex Libris, which takes usage data from SFX installations across the world (SFX is an OpenURL resolver which essentially helps make links between descriptions of bibliographic items and the full text of those items online). By tracking which full-text resources a user accesses in a session, and looking at this behaviour over millions of transactions, bX can start to make associations between different full-text resources (usually journal articles).
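
To make the general idea concrete, here is a minimal sketch (in Python) of the kind of session-based co-occurrence counting a service like this might use. The session data, identifiers and threshold are invented for illustration – this is the general technique, not a description of bX's actual algorithm.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical session logs: each session is the set of article identifiers
# (invented DOIs here) that one user reached via the link resolver in a single visit.
sessions = [
    {"doi:10.1000/a", "doi:10.1000/b", "doi:10.1000/c"},
    {"doi:10.1000/a", "doi:10.1000/b"},
    {"doi:10.1000/b", "doi:10.1000/d"},
]

# Count how often each pair of articles is used together within a session.
co_counts = defaultdict(int)
for session in sessions:
    for a, b in combinations(sorted(session), 2):
        co_counts[(a, b)] += 1

def recommend(article, top_n=5, min_count=2):
    """Return the articles most often used in the same session as 'article'."""
    scores = {}
    for (a, b), count in co_counts.items():
        if count < min_count:
            continue
        if a == article:
            scores[b] = count
        elif b == article:
            scores[a] = count
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("doi:10.1000/a", min_count=1))  # ['doi:10.1000/b', 'doi:10.1000/c']
```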

I was involved in trialling bX, and when I talked to some of the subject librarians about the service, the first question they wanted answered was “how does it come up with the recommendations?”. There is a paper on some of the work that led to the bX product, although a cursory reading doesn’t tell me exactly how the recommendations are made. Honestly, I hope there is some reasonably clever mathematical/statistical analysis going on behind the recommendation service that I’m not going to understand. For me the question shouldn’t be “how does it work?” but “does it work?” – that is, are the recommendations any good?

So we have a new problem – how do we measure the quality of the recommendations we get from these services?

Perhaps the most obvious approach is to get users to assess the quality of the recommendations. This is the approach most libraries would probably take when assessing a new resource, and it’s also an approach that Google take. However, when looking at a recommender service that spans all subject areas, getting a representative sample of people from across an institution to test the service thoroughly might be difficult.

Another approach is to introduce a recommendation service and then do a longitudinal study of user behaviour, drawing conclusions about the success of the service. This is how I’d see Dave Pattern’s work at Huddersfield, which he recently presented on at ILI09. Dave’s analysis is extremely interesting and shows some correlations between the introduction of the recommender service and changes in user behaviour. However, it may not be economical to do this where there is a cost to the recommender service.

The final approach, and one that appeals to me, is that taken by the Netflix Prize competition. The Netflix Prize was an attempt by the DVD rental company Netflix to improve their recommendation algorithm. They offered a prize of $1 million to anyone who could improve on their existing algorithm by 10% or more. The Netflix Prize looked at how people rated (1-5) movies they had watched; based on previous ratings, the goal was to predict how individuals would rate other movies. The competition was structured so that contestants were given a data set of ratings, along with a second set from which the actual rating values had been removed. The challenge was to find an algorithm that would fill in these missing ratings accurately (or more accurately than the existing algorithm). This is a typical approach to machine-based prediction: you have a ‘training set’ of data, which you feed into the algorithm, and a ‘testing set’ of real-life data against which you compare the machine’s ‘predictions’.
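
As a rough illustration of that training/testing idea – using invented ratings and a deliberately naive mean-rating predictor, not anything resembling a serious Netflix entry – an evaluation loop might look something like this (RMSE being the metric the Netflix Prize used to compare entries):

```python
import random
from statistics import mean

# Invented (user, movie, rating) triples standing in for the real Netflix data.
random.seed(0)
ratings = [(u, m, random.randint(1, 5)) for u in range(50) for m in range(20)]

# Split into a training set (used to fit the predictor) and a testing set
# (whose ratings are treated as hidden and must be predicted).
random.shuffle(ratings)
split = int(0.8 * len(ratings))
train, test = ratings[:split], ratings[split:]

# A deliberately naive predictor: each movie's mean rating in the training set.
per_movie = {}
for _, movie, rating in train:
    per_movie.setdefault(movie, []).append(rating)
movie_means = {movie: mean(rs) for movie, rs in per_movie.items()}
global_mean = mean(rating for _, _, rating in train)

def predict(user, movie):
    return movie_means.get(movie, global_mean)

# Root-mean-square error: the metric the Netflix Prize used to compare entries.
rmse = mean((predict(u, m) - r) ** 2 for u, m, r in test) ** 0.5
print(f"RMSE of the mean-rating baseline: {rmse:.3f}")
```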

The datasets are available at the UCI Machine Learning Repository. The Netflix Prize was finally won in September 2009, after almost three years.

What I find interesting about this approach is that it tests the recommendation algorithm against real data. Perhaps this is an approach we could look at with recommendation services for libraries – to feed in a partial set of data from our own systems and see whether the recommendations we get back match the rest of our data. As we start to see competition in this marketplace, we are going to want to know which services best suit our institutions.
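
For what it’s worth, a hold-out test along those lines might look something like the sketch below. The recommend callable, the hit-rate metric and the toy data are all assumptions of mine; the point is simply to show part of our own data being hidden and then compared against the recommendations that come back.

```python
import random

def evaluate_holdout(histories, recommend, top_n=10):
    """Hide one item from each user's history, ask the service for
    recommendations based on the rest, and count how often the hidden
    item comes back - a simple 'hit rate' for the recommender.

    histories: lists of item identifiers taken from our own usage data.
    recommend: whatever client function wraps the service under test.
    """
    hits = tested = 0
    for history in histories:
        if len(history) < 2:
            continue
        tested += 1
        held_out = random.choice(history)
        remainder = [item for item in history if item != held_out]
        if held_out in recommend(remainder, top_n=top_n):
            hits += 1
    return hits / tested if tested else 0.0

# Toy example: a 'recommender' that just returns globally popular items.
histories = [["a", "b", "c"], ["b", "c", "d"], ["a", "d"]]
popular = ["b", "c", "a", "d"]
print(evaluate_holdout(histories, lambda items, top_n: popular[:top_n]))
```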

One thought on “Would you recommend your recommender?”

  1. Yes – evaluating the quality of recommendations in a DL recommender is a tricky proposition. One question is – how “good” (or should it be “useful”) is your usage data (i.e. who are your users?). The quality of your users’ knowledge ultimately gets reflected in the recommendations that the algorithm generates.

    Another question is “how good is the recommendation algorithm?” That’s a much easier question to answer, and there are well-known methods for comparing algorithms against a test set (see the criteria for the Netflix competition).

    A “middle way” is to use citation data as a substitute for usage data. At CISTI we developed an experimental recommender service that uses (primarily) citation data (but also usage data, except that there isn’t much of it) for a small (1.5M article) Biomed collection. We ask the users to evaluate the quality of the recommendations they get as a result. You can try it out here:

    http://lab.cisti-icist.nrc-cnrc.gc.ca/synthese/welcome.jsp

    There’s more about this project here:

    http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/Synthese_Recommender
