Compose yourself

‘Composed’ is my entry for the UK Discovery #discodev developer competition. Composed helps you link between information about composers of classical music by exploiting the MusicNet Codex.

Specifically, Composed currently enables linking from COPAC catalogue records that mention a composer to other information and records about that composer.

What is it and how do I install it?
Composed is a Bookmarklet which you can install by dragging this link to your browser bookmark bar: Composed.

How do I use it?
If you haven’t already, drag this Composed bookmarklet to your browser bookmark bar (or otherwise add it to your bookmarks). Next, find a record on COPAC which mentions a classical composer – such as this one – and once you are viewing it in your browser, click the bookmarklet.

Assuming it is all working, you should find that the display is enhanced by the addition of some links on the right hand side of the screen. If you’ve used the example given above you should see:

  • An image of the composer (Handel) [UPDATE 5th August 2011: See my comment below about problems with displaying images]
  • Links to:
    • Records in the British Library catalogue
    • More records in COPAC
    • The entry for Handel in the Grove Dictionary of Music (subscribers only)
    • Record in RISM (a music and music literature database)
    • A page about Handel on the BBC
    • A page about Handel in the IMDB
    • The Wikipedia page for Handel

These are based on the information available for Handel from the MusicNet Codex and the BBC.

Example COPAC page enhanced by 'Composed'

If the record you are looking at contains multiple composers, you should see multiple entries in the display. If there are no links available for the record you are viewing, you should see a message to this effect.

How does it work?
The mechanics are pretty simple. The bookmarklet itself is a simple piece of JavaScript, which calls another script. This script finds the COPAC record ID from the COPAC record currently in the browser. This ID is passed to a PHP script, which uses a database I built specifically for this purpose to match a COPAC record ID to one or more MusicNet Codex URIs. For each MusicNet Codex URI retrieved, the PHP script requests information from the URI and gets back some RDF, from which it extracts the links. These are passed back to the JavaScript, which inserts them into the COPAC display. If the MusicNet RDF contains a link to the BBC, further RDF is grabbed from the BBC and the relevant information is added to the data passed back to the display.
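To make the server-side step a bit more concrete, here is a minimal sketch in Python (not the PHP that Composed actually uses). The record IDs, the example.org URIs, the in-memory dictionary standing in for the database, and the function names are all made up for illustration; the real lookup and RDF fetching will differ.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the database that maps COPAC record IDs to
# MusicNet Codex URIs. These IDs and URIs are invented for illustration.
RECORD_TO_CODEX: Dict[str, List[str]] = {
    "example-record-1": ["http://example.org/codex/composer/1"],
    "example-record-2": [
        "http://example.org/codex/composer/1",
        "http://example.org/codex/composer/2",
    ],
}


def composers_for_record(
    record_id: str,
    mapping: Dict[str, List[str]],
    fetch_composer: Callable[[str], dict],
) -> List[dict]:
    """Return one entry per composer linked to the record (empty if none)."""
    return [fetch_composer(uri) for uri in mapping.get(record_id, [])]


def fake_fetch(uri: str) -> dict:
    # Stand-in for "request the URI, parse the RDF, pull out the links".
    return {"uri": uri, "links": [uri + "/about"]}


if __name__ == "__main__":
    for entry in composers_for_record("example-record-2", RECORD_TO_CODEX, fake_fetch):
        print(entry["uri"], entry["links"])
```

Passing the fetch function in as an argument keeps the lookup separate from the network and RDF handling, so a record with several composers simply produces several entries and an unknown record produces an empty list.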

So what were the challenges?

Challenge 1

Manipulating RDF. Although I’ve done quite a bit of work with RDF in one form or another, I’ve never actually written scripts that consume it – so this was new to me. I ended up using PHP because the Graphite RDF parser, written by Chris Gutteridge at the University of Southampton, made it so easy to retrieve the RDF and grab the information I needed – although it took me a little while to get my head around navigating a graph rather than a tree (being pretty used to parsing XML).

So I guess I owe Chris a pint for that one 🙂
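For anyone else coming from XML, the graph-versus-tree difference can be sketched with plain (subject, predicate, object) triples. This is Python rather than PHP and does not show how Graphite works; the example.org URIs are placeholders, and the choice of owl:sameAs and foaf:depiction as the linking predicates is my assumption for illustration, not something taken from the MusicNet or BBC data.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Real vocabulary URIs, but their use here is an assumption for illustration.
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DEPICTION = "http://xmlns.com/foaf/0.1/depiction"

# A tiny made-up graph: one composer node linked to two other sources,
# one of which has an image attached.
GRAPH: List[Triple] = [
    ("http://example.org/codex/handel", SAME_AS, "http://example.org/bbc/handel"),
    ("http://example.org/codex/handel", SAME_AS, "http://example.org/wikipedia/handel"),
    ("http://example.org/bbc/handel", DEPICTION, "http://example.org/bbc/handel.jpg"),
]


def objects(graph: List[Triple], subject: str, predicate: str) -> List[str]:
    """All objects reachable from `subject` along `predicate`."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def find_images(graph: List[Triple], start: str) -> List[str]:
    """Hop from the start node to each linked node, then collect its images."""
    images: List[str] = []
    for linked in objects(graph, start, SAME_AS):
        images.extend(objects(graph, linked, DEPICTION))
    return images


if __name__ == "__main__":
    print(objects(GRAPH, "http://example.org/codex/handel", SAME_AS))
    print(find_images(GRAPH, "http://example.org/codex/handel"))
```

There is no root element or parent/child nesting here: you start from any node you like and follow named relationships outwards, which is the mental shift from walking an XML tree.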

Challenge 2

The major challenge was getting a list of COPAC record IDs which mapped to MusicNet Codex URIs. Actually – I wasn’t able to do this and what I have is an approximation – almost certainly you can find examples where the bookmarklet populates the screen with a composer when there is no composer mentioned in the record.

Unfortunately MusicNet is unable to point at a COPAC identifier or URI for a person – like many library catalogues, COPAC identifies items in libraries (or perhaps more accurately, the records that describe these items), but not any of the entities (people, places, etc.) within the record. This means that while MusicNet can point at a specific URI for the BBC that represents (for example) ‘Handel’, with COPAC all it does is give a URL which triggers a search which should bring back a set of records all of which mention Handel.

There is a whole load of background as to how MusicNet got to this point, and how they build the searches to COPAC – but essentially it is based on the text strings in COPAC that were found by the MusicNet project to refer to the same composer. These text strings are what are used to build the search against COPAC. This is also the explanation as to why you sometimes see multiple links to COPAC/the British Library catalogue in the Composed bookmarklet display – because there are multiple strings that MusicNet found to represent the same composer.

What I’ve done to create a rough mapping between MusicNet and COPAC records is to run each search that MusicNet defines for COPAC and grab all the record IDs in the resulting record set. This gives a rough and ready mapping, but there are bound to be plenty of errors in there. For example, one of the searches MusicNet holds for the composer Franz Schubert on the British Library catalogue is http://catalogue.bl.uk/F/?func=find-b&request=Schubert&find_code=WNA – which will actually find everything by anyone called ‘Schubert’. If there are any similar searches in the COPAC data, I’ll be grabbing a lot of irrelevant records in my searching. Since the number of searches, and resultant records, is relatively high (e.g. over 30k records mention Mozart), at the time of writing I’m still in the process of populating my mapping. It is currently listing around 50k [Update: 31/7/2011 at 15:33 – final total is 601,286] COPAC IDs, but I’ll add more as my searches run and produce results in the background.
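The mapping step is essentially an inversion: from "composer has these searches" to "record ID belongs to these composers". Here is a rough Python sketch of that idea, not the code I actually ran. The composer identifiers, search strings, record IDs and the fake search function are all invented; in practice the search function would have to query COPAC and page through the results.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Made-up inputs: each composer identifier has one or more search strings.
SEARCHES: Dict[str, List[str]] = {
    "codex:schubert-franz": ["search?name=Schubert"],
    "codex:handel": ["search?name=Handel"],
}

# Made-up results. "rec-2" stands for a record by some other Schubert,
# which the broad surname search picks up anyway.
FAKE_RESULTS: Dict[str, List[str]] = {
    "search?name=Schubert": ["rec-1", "rec-2"],
    "search?name=Handel": ["rec-3", "rec-1"],
}


def fake_run_search(url: str) -> List[str]:
    # Stand-in for running the search and collecting every record ID.
    return FAKE_RESULTS.get(url, [])


def build_record_mapping(
    searches: Dict[str, List[str]],
    run_search: Callable[[str], Iterable[str]],
) -> Dict[str, List[str]]:
    """Invert composer -> searches into record ID -> composers."""
    mapping = defaultdict(set)
    for codex_uri, urls in searches.items():
        for url in urls:
            for record_id in run_search(url):
                mapping[record_id].add(codex_uri)
    return {rid: sorted(uris) for rid, uris in mapping.items()}


if __name__ == "__main__":
    for rid, uris in sorted(build_record_mapping(SEARCHES, fake_run_search).items()):
        print(rid, uris)
```

Nothing in this process can tell that "rec-2" is a false match: every record a search returns is attributed to the composer the search belongs to, which is exactly why an over-broad search pollutes the mapping.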

I’m talking to the MusicNet team to see if they are able at this stage to track back to the original COPAC records they used to derive their data, and so we could get an exact mapping of their URIs to lists of record IDs on COPAC – this would be incredibly useful and allow functions such as mine to work much more reliably.

None of this should be seen as a criticism of either the MusicNet or COPAC teams – without these sources I wouldn’t have even been able to get started on this!

Final Thoughts

I hope this shows how data linked across multiple sources can bring together information that would otherwise be separated. There is no reason in theory why the bookmarklet shouldn’t be expanded to take in the other data sources MusicNet knows about – and possibly beyond (as long as there is access to an ID that can finally be brought back to MusicNet).

Libraries desperately need to move beyond ‘the record’ as the way they think about their data – and start adding the identifiers they already have available to their data – this would make this type of task much easier.

If you want to build other functionality on my rough and ready MusicNet to COPAC record mapping, you can pass COPAC IDs to the script:

http://www.meanboyfriend.com/composed/composed.php?copacid=<copac_record_id>

You’ll get back some JSON containing information about one or more composers with a Name, Links, and an Image if the BBC have a record of an image in their data.
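If you want to try this from a script, a minimal Python sketch might look like the following. The endpoint and the copacid parameter are as given above, but the exact shape of the JSON is an assumption: I have guessed at a list of objects with "Name", "Links" and "Image" keys, and the sample response is invented, so check a real response before relying on it.

```python
import json
from typing import List, Optional, Tuple
from urllib.parse import urlencode

ENDPOINT = "http://www.meanboyfriend.com/composed/composed.php"


def build_request_url(copac_id: str) -> str:
    """Build the request URL for a given COPAC record ID."""
    return ENDPOINT + "?" + urlencode({"copacid": copac_id})


def summarise(response_text: str) -> List[Tuple[Optional[str], int, Optional[str]]]:
    """Return (name, number of links, image URL or None) per composer.

    Assumes the response is a JSON array of objects with "Name", "Links"
    and (optionally) "Image" keys; these key names are a guess.
    """
    composers = json.loads(response_text)
    return [
        (c.get("Name"), len(c.get("Links", [])), c.get("Image"))
        for c in composers
    ]


# Invented sample response, used instead of a live request.
SAMPLE = '[{"Name": "Handel", "Links": ["http://example.org/a", "http://example.org/b"]}]'

if __name__ == "__main__":
    print(build_request_url("12345"))
    print(summarise(SAMPLE))
```

To fetch a live response you could pass the built URL to urllib.request.urlopen and hand the decoded body to summarise; I have left that out so the sketch runs without a network connection.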

Discovering Discovery

As I mentioned in a recent post I’ve been involved in UK Discovery (http://discovery.ac.uk) – an initiative to enable resource discovery through the publication and aggregation of metadata according to simple, open, principles.

Discovery is currently running a Developer competition. Others have already blogged the competition, but what I wanted to do here was note the reasons for running the competition, capture some ideas that I’ve had, and hopefully inspire others to enter the competition (as I hope to myself).

Firstly – why the developer competition? For me I hope we can achieve three things through the competition:

  1. Engage developers in/get them excited about Discovery
  2. Get feedback from developers on what works for them in terms of building on Discovery
  3. Start building a set of examples of what can be achieved in the Discovery ecosystem

If we achieve any of these I’ll be pretty happy. We are still in the early days of building an environment of open (meta)data for libraries, archives and museums, but the 10 data sets we are featuring in the competition provide good examples of the type of data we hope will be published with the encouragement and advice of the Discovery initiative.

On to ideas. The list below is basically just me brainstorming – my hope is that others might be inspired by one of the ideas, or others might contribute more ideas via the comments. (I’ve already picked one of the ideas below that I’m going to try and turn into an entry of my own – but for the purposes of dramatic tension, I won’t reveal this until the end of the post!)

  • Linked Library Catalogue. Rather than having a catalogue made up of MARC (or other format of choice) records, simply have a list of URIs which point to the bibliographic entities on the web. Build an OPAC on top of this list by crawling the URIs for metadata and indexing locally (e.g. with Solr). Could use Cambridge University Library, Jerome and BNB featured datasets as well as other bibliographic information on the web.
  • What’s hot in research? Use the Mosaic Activity Data, the OpenURL Router data and other relevant data (e.g. from research publication repositories) to look at trends in research areas. Possibly mash up with Museum/Archive data to highlight relevant collections to the research community based on the current ‘hot topics’?
  • Composer Bookmarklet. Use the MusicNet Codex to power a bookmarklet that, when installed and used, would link from relevant pages/records in COPAC/BL/RISM/Grove/BBC/DBpedia/MusicBrainz to other sources. Focus on providing links from library catalogue records to other relevant sources (like recordings/BBC programmes).
  • Heritage Britain. Map various cultural heritage items/collections onto a map of Britain. Out of the featured datasets English Heritage data is the obvious starting point, but could include data from Archives Hub, National Archives Flickr collection, and the Tyne and Wear Museums data.

Remember that although entries have to use data from one of the featured data sets (I’ve mentioned them all here), you can use whatever other data you like…

If you’ve got ideas (perhaps especially if you aren’t in a position to develop them yourself) that you think would be great demonstrations or just really useful, feel free to blog yourself, or comment here.

And the one I’m hoping to take forward? The Composer Bookmarklet – I’ll blog progress here if/when I make any (although don’t let that stop you if you want to develop one as well!)