Compose yourself

‘Composed’ is my entry for the UK Discovery #discodev developer competition. Composed helps you link between information about composers of classical music by exploiting the MusicNet Codex.

Specifically, Composed currently lets you link from COPAC catalogue records that mention a composer to other information and records about that composer.

What is it and how do I install it?
Composed is a Bookmarklet which you can install by dragging this link to your browser bookmark bar: Composed.

How do I use it?
If you haven’t already, drag this Composed bookmarklet to your browser bookmark bar (or otherwise add it to your bookmarks). Next, find a record on COPAC which mentions a classical composer, such as this one, and once you are viewing it in your browser, click the bookmarklet.

Assuming it is all working, you should find that the display is enhanced by the addition of some links on the right hand side of the screen. If you’ve used the example given above you should see:

  • An image of the composer (Handel) [UPDATE 5th August 2011: See my comment below about problems with displaying images]
  • Links to:
    • Records in the British Library catalogue
    • More records in COPAC
    • The entry for Handel in the Grove Dictionary of Music (subscribers only)
    • Record in RISM (a music and music literature database)
    • A page about Handel on the BBC
    • A page about Handel in the IMDB
    • The Wikipedia page for Handel

These are based on the information available for Handel from the MusicNet Codex and the BBC.

Example COPAC page enhanced by 'Composed'

If the record you are looking at contains multiple composers, you should see multiple entries in the display. If there are no links available for the record you are viewing, you should see a message to this effect.

How does it work?
The mechanics are pretty simple. The bookmarklet itself is a small piece of JavaScript, which calls another script. This second script finds the COPAC record ID in the COPAC record currently displayed in the browser and passes it to some PHP. The PHP uses a database I built specifically for this purpose to match the COPAC record ID to one or more MusicNet Codex URIs. For each MusicNet Codex URI retrieved, it requests information from the URI and gets back some RDF, from which it extracts the links. If the MusicNet RDF contains a link to the BBC, further RDF is grabbed from the BBC and the relevant information is added to the data. The results are then passed back to the JavaScript, which inserts them into the COPAC display.
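
To make that flow a little more concrete, here is a minimal sketch of the lookup step in Python. It is an illustration only, not the PHP behind Composed: the record IDs, the example.org URIs, the in-memory dictionary standing in for the database, and the function names are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the database: COPAC record ID -> MusicNet Codex URIs.
RECORD_TO_COMPOSERS: Dict[str, List[str]] = {
    "copac-0001": ["http://example.org/musicnet/person/handel"],
}


def links_for_record(
    copac_id: str,
    mapping: Dict[str, List[str]],
    fetch_links: Callable[[str], List[str]],
) -> List[dict]:
    """Return one entry per composer URI mapped to the given record ID."""
    results = []
    for uri in mapping.get(copac_id, []):
        # In the real tool this step would request RDF from the URI;
        # here the caller supplies a function that does the fetching.
        results.append({"uri": uri, "links": fetch_links(uri)})
    return results


def fake_fetch_links(uri: str) -> List[str]:
    # Stand-in for retrieving RDF and extracting links (no network access).
    if uri.endswith("handel"):
        return ["http://example.org/wikipedia/Handel"]
    return []


enhanced = links_for_record("copac-0001", RECORD_TO_COMPOSERS, fake_fetch_links)
missing = links_for_record("copac-9999", RECORD_TO_COMPOSERS, fake_fetch_links)
```

An unknown record ID simply produces an empty list, which corresponds to the "no links available" message in the display.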

So what were the challenges?

Challenge 1

Manipulating RDF. Although I’ve done quite a bit of work with RDF in one form or another, I’ve never actually written scripts that consume it – so this was new to me. I ended up using PHP because the Graphite RDF parser, written by Chris Gutteridge at the University of Southampton, made it so easy to retrieve the RDF and grab the information I needed – although it took me a little while to get my head around navigating a graph rather than a tree (being pretty used to parsing XML).

So I guess I owe Chris a pint for that one 🙂
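
For anyone else making the jump from trees to graphs, the difference can be sketched in a few lines of Python. This is not Graphite and not the code used in Composed. The triples below are made up, and the use of owl:sameAs and foaf:depiction is an assumption for illustration rather than a statement about what the MusicNet or BBC RDF actually contains.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Real vocabulary terms, but their use here is an assumption for illustration.
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DEPICTION = "http://xmlns.com/foaf/0.1/depiction"

# Hypothetical graph: a MusicNet-style person linked to two other resources.
TRIPLES: List[Triple] = [
    ("http://example.org/musicnet/handel", SAME_AS, "http://example.org/bbc/handel"),
    ("http://example.org/musicnet/handel", SAME_AS, "http://example.org/wikipedia/Handel"),
    ("http://example.org/bbc/handel", DEPICTION, "http://example.org/bbc/handel.jpg"),
]


def objects(triples: List[Triple], subject: str, predicate: str) -> List[str]:
    """All objects reachable from a subject via one predicate (may be many)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def images_via_links(triples: List[Triple], person: str) -> List[str]:
    """Follow each sameAs link from the person, then collect any depictions."""
    images: List[str] = []
    for linked in objects(triples, person, SAME_AS):
        images.extend(objects(triples, linked, DEPICTION))
    return images


handel_links = objects(TRIPLES, "http://example.org/musicnet/handel", SAME_AS)
handel_images = images_via_links(TRIPLES, "http://example.org/musicnet/handel")
```

Unlike an XML tree, there is no single parent to walk down from: any node can be a starting point, and a predicate can lead to zero, one, or several nodes.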

Challenge 2

The major challenge was getting a list of COPAC record IDs which mapped to MusicNet Codex URIs. Actually – I wasn’t able to do this and what I have is an approximation – almost certainly you can find examples where the bookmarklet populates the screen with a composer when there is no composer mentioned in the record.

Unfortunately MusicNet is unable to point at a COPAC identifier or URI for a person – like many library catalogues, COPAC identifies items in libraries (or perhaps more accurately, the records that describe these items), but not any of the entities (people, places, etc.) within the record. This means that while MusicNet can point at a specific URI for the BBC that represents (for example) ‘Handel’, with COPAC all it does is give a URL which triggers a search which should bring back a set of records all of which mention Handel.

There is a whole load of background as to how MusicNet got to this point, and how they build the searches to COPAC – but essentially it is based on the text strings in COPAC that were found by the MusicNet project to refer to the same composer. These text strings are what are used to build the search against COPAC. This is also why you sometimes see multiple links to COPAC/the British Library catalogue in the Composed bookmarklet display – because there are multiple strings that MusicNet found to represent the same composer.

What I’ve done to create a rough mapping between MusicNet and COPAC records is to run each search that MusicNet defines for COPAC and grab all the record IDs in the resultant record set. This gives a rough and ready mapping, but there are bound to be plenty of errors in there. For example, one of the searches MusicNet holds for the composer Franz Schubert on the British Library catalogue is http://catalogue.bl.uk/F/?func=find-b&request=Schubert&find_code=WNA – which will actually find everything by anyone called ‘Schubert’. If there are any similar searches in the COPAC data, I’ll be grabbing a lot of irrelevant records in my searching. Since the number of searches, and resultant records, is relatively high (e.g. over 30k records mention Mozart), at the time of writing I’m still in the process of populating my mapping – it is currently listing around 50k [Update: 31/7/2011 at 15:33 – final total is 601,286] COPAC IDs, but I’ll add more as my searches run and produce results in the background.
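
The shape of that mapping step can be sketched roughly as follows, in Python rather than the code I actually ran. The composer URIs, record IDs and result sets here are invented for the example, and no searches are executed. The point is only to show how per-composer search results get inverted into a lookup keyed by record ID.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical input: for each composer URI, the record IDs returned by each
# of the searches MusicNet defines for that composer.
SEARCH_RESULTS: Dict[str, List[List[str]]] = {
    "http://example.org/musicnet/schubert": [["r1", "r2"], ["r2", "r3"]],
    "http://example.org/musicnet/mozart": [["r3", "r4"]],
}


def build_record_index(
    search_results: Dict[str, List[List[str]]]
) -> Dict[str, List[str]]:
    """Invert composer -> record IDs into record ID -> composer URIs."""
    index = defaultdict(set)
    for composer_uri, result_sets in search_results.items():
        for record_ids in result_sets:
            for record_id in record_ids:
                # A set absorbs records that appear in several searches
                # for the same composer.
                index[record_id].add(composer_uri)
    return {record_id: sorted(uris) for record_id, uris in index.items()}


record_index = build_record_index(SEARCH_RESULTS)
```

A record picked up by an over-broad search (like the ‘Schubert’ example) ends up in the index just like a correct match, which is exactly where the false positives come from.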

I’m talking to the MusicNet team to see if they are able at this stage to track back to the original COPAC records they used to derive their data, and so we could get an exact mapping of their URIs to lists of record IDs on COPAC – this would be incredibly useful and allow functions such as mine to work much more reliably.

None of this should be seen as a criticism of either the MusicNet or COPAC teams – without these sources I wouldn’t have even been able to get started on this!

Final Thoughts

I hope this shows how data linked across multiple sources can bring together information that would otherwise be separated. There is no reason in theory why the bookmarklet shouldn’t be expanded to take in the other data sources MusicNet knows about – and possibly beyond (as long as there is access to an ID that can finally be brought back to MusicNet).

Libraries desperately need to move beyond ‘the record’ as the way they think about their data – and start adding the identifiers they already have available to their data – this would make this type of task much easier.

If you want to build other functionality on my rough and ready MusicNet to COPAC record mapping, you can pass COPAC IDs to the script:

http://www.meanboyfriend.com/composed/composed.php?copacid=<copac_record_id>

You’ll get back some JSON containing information about one or more composers with a Name, Links, and an Image if the BBC have a record of an image in their data.
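
As a starting point, a call to the script might be put together along these lines in Python. The base URL and the copacid parameter are as given above. The record ID "12345" and the exact JSON layout (a list of objects with "Name", "Links" and an optional "Image" key) are assumptions for the example, so check a real response before relying on them.

```python
import json
from typing import List
from urllib.parse import urlencode

BASE_URL = "http://www.meanboyfriend.com/composed/composed.php"


def build_request_url(copac_id: str) -> str:
    """Build the request URL for one COPAC record ID."""
    return BASE_URL + "?" + urlencode({"copacid": copac_id})


# Hypothetical response body; the real key names and nesting may differ.
SAMPLE_RESPONSE = (
    '[{"Name": "Handel", "Links": ["http://example.org/handel"], "Image": null}]'
)


def composer_names(body: str) -> List[str]:
    """Pull the composer names out of a response shaped like SAMPLE_RESPONSE."""
    return [entry.get("Name", "") for entry in json.loads(body)]


url = build_request_url("12345")  # "12345" is a made-up record ID
names = composer_names(SAMPLE_RESPONSE)
```

The sketch stops short of making the HTTP request. Fetching the URL with the HTTP client of your choice and passing the body to a function like composer_names would be the obvious next step.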

11 thoughts on “Compose yourself”

  1. Yesterday on some Google Plus discussion around this bookmarklet Lihua Zhu (http://liwazi.blogspot.com/ or @liwazi on twitter) notified me that she wasn’t seeing images when she used the bookmarklet.

    This morning I tested and found that I had the same problem. Since I knew this definitely worked when I first blogged here, and I hadn’t changed anything I guessed something else must have changed.

    It turns out that the BBC, the source I was using for the images, had relaunched their Music site, and in the process removed (at least for the moment) their RDF provision. They are offering XML/JSON instead, though apparently without the image data.

    This is, of course, a continual problem with using distributed data from 3rd parties and lightweight mashups of this type are particularly susceptible. I have to admit I’d thought I was on a pretty safe bet with the BBC, and no doubt the RDF will be restored eventually. I’m still thinking about the lessons to learn here.

    However, as well as the problems, there are some opportunities here. Although I was grabbing the image URL from the BBC RDF, I think MusicNet actually provides the image data directly and it looks like they’ve cached it, so I could start using it from their feed instead. Also the new BBC data (XML and JSON) offers (I think) richer data than previously, so I could grab links to reviews of the latest releases and include these in the bookmarklet display if I wanted.

    So, frustrating, although not all bad. If anyone from the BBC is reading this I’m a bit annoyed – even minimal RDF would have been better than nothing at all – but apart from that, please get RDF back asap pretty please!

  2. This is pretty cool – great to see a real-life use of the MusicNet Codex! (pics are working again now – did you change the source as you pondered above?)

    Just to clarify your original post, RISM is a database of music sources, not literature. It’s a horribly complicated series of reference works, but the version used in your hack is specifically “Music Manuscripts after 1600 in British and Irish Libraries”. http://opac.rism.info/index.php?id=2&L=1 is the international version of the same series (also free). RILM is the main music literature database but isn’t freely available (Ebsco and Proquest provide it). Linking in to that would be very nice…

    I got some good results from your tool – occasional quirks, e.g. a weirdly specific Beethoven Missa Solemnis Copac search URL with no results, which are probably down to the source data! Oddly nothing came up for Erik Satie or Thomas Ades, both quite well known. Also nothing comes up for non-composers (e.g. Simon Rattle), but as the tool is called Composed I knew I was chancing my luck there 😉

    What would be wonderful would be if LoC Name Authorities were tied in to MusicNet Codex, as maybe then any libraries using standardised names could put together similar hacks? (or maybe that’s not needed – I’m very shaky on linked data despite going to a music linked data day a few months ago…)

    Well done 🙂

  3. Hi Edith – apologies for long delay in replying. Thanks for the points about RISM (and RILM).

    The ‘weirdly specific’ searches are a result of the source data – MusicNet came out of a previous project (MusicSpace) and was an attempt to reconcile composer identities from a range of sources – including COPAC. I’m not sure how the original data was given to MusicSpace, but it looks like sometimes various data from the COPAC record was concatenated (if I had to guess, I’d say that in this case this comes from a 100 $$t but I’m just guessing) – this one isn’t the only example.

    The lack of results for Erik Satie or Thomas Ades is also a result of the source data – Ades doesn’t seem to be in the Codex, and Satie is in there with no links to COPAC http://musicnet.mspace.fm/person/fe4a21e6181d3de18cc949f44bd0025c.html

    Links to the LoC Name Authorities (now available on http://id.loc.gov) or VIAF would be a real plus, but what would be really great is if libraries started using these identifiers in their own data as well, so that we weren’t relying on matching text strings, which immediately introduces a lot of vagaries into the system.

    However, it might be a fair bet that at least one of the ‘referred to by’ links to either COPAC or the BL from MusicNet will actually be the LoC Name Authority (or at least, that’s what you’d expect) – so you could just grab the search strings and use them to build a similar tool for any catalogue with some success – although non-latin characters might cause problems as the way character encoding is handled by different library systems tends to vary.
