VuFind Virtual Bootcamp

As part of the Lucero project I’m currently working on at the Open University, I’m looking at lots of library catalogue records. While exploring the first set of data (around 25,000 records in MARC format), it struck me that one of the more recent library ‘search’ products might be helpful. These new products (sometimes known as ‘next gen’ or ‘NG’ discovery platforms) are being taken up by libraries to replace their (often aging, rarely pretty) OPACs (online public access catalogues). An OPAC is typically a web interface onto what is, at heart, a ‘business’ system: one that administers books, users, serials, and other library stuff.

These discovery platforms typically take a regular import of data from the library catalogue and specialise in indexing that data, rather than in the many other administrative tasks that the library catalogue handles behind the scenes. Because they use dedicated software that doesn’t need to worry about any other functionality, these platforms tend to return search results much faster, and give a lot of flexibility in how indexes are built on the data.

While many of the available products are commercial software (or, increasingly, services), there are a couple of relatively high-profile open source solutions: VuFind and Blacklight. If you are interested in a comparison of the two, keep an eye on CREDAUL at the University of Sussex (http://credaul.wordpress.com), which is looking at both.

So I decided I’d try installing VuFind and use that to explore the data. VuFind is PHP-based, but also makes use of the Apache Solr search platform, which runs on Java. It took me a couple of hours of fiddling to get the whole thing working, which I thought was pretty good going: by the end of it, I had my 25k records fully indexed and was ready to use the system to explore the data.

All of this gave me an idea. This is something you can run on a laptop, and it is a great way of looking at your library catalogue data, often exposing issues with the data that you can then correct in the catalogue if you want to. So I thought that at the next Mashed Library event (Mashspa in Bath) we could run a VuFind ‘bootcamp’, helping delegates get VuFind installations up and running.
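Once the records are indexed, one cheap way to spot data issues is to ask Solr directly how often each value of a field occurs; misspelt or unexpected values stand out in the counts. The sketch below only builds such a query URL using standard Solr faceting parameters (`q`, `rows`, `facet`, `facet.field`, `facet.limit`, `facet.mincount`, `wt`). The host, port, the core name `biblio` and the field name `format` are assumptions about a local VuFind install rather than details from this post, so check them against your own Solr configuration.

```python
from urllib.parse import urlencode

# Assumptions (not from the post): VuFind's Solr is at localhost:8080 and the
# bibliographic index is a core called "biblio" with a "format" field.
SOLR_SELECT = "http://localhost:8080/solr/biblio/select"


def facet_query_url(field, query="*:*", limit=-1):
    """Build a Solr /select URL that returns value counts (facets) for one field."""
    params = {
        "q": query,            # "*:*" matches every record
        "rows": 0,             # only the counts are wanted, not the records
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,  # -1 asks for every value, not just the default top 100
        "facet.mincount": 1,   # skip values that occur in no record
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)


print(facet_query_url("format"))
```

With Solr running, opening the printed URL in a browser should return JSON facet counts; swapping in another indexed field name lists that field’s values instead.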

Being an impatient sort, I found 29th October far too long to wait to get started, so I thought that maybe I could run a ‘virtual’ version of the bootcamp beforehand (which would also make sure I was prepared on the day!). The idea is that I’ll publish weekly blog posts dealing with the installation of VuFind step by step. I’ll focus on Windows, but some people are already interested in doing an install on Linux and Mac OS X. Alongside these posts, I’ll run weekly ‘support sessions’ where I’ll be online to help work through problems people are having. The idea is that these will be live sessions, although I don’t yet know whether that will be via chat, voice or something else.

Anyway, the starting point is this blog post, and this forum on the Mashed Library site. If you are interested in joining in, sign up to http://www.mashedlibrary.com/groups/vufind-virtual-bootcamp/ and follow along – I’m intending to post the first set of instructions within the week, with a support session to follow shortly after.

Finally, if you are interested in the various ‘next gen’ discovery interfaces for libraries, I’d recommend having a look at this list of JISC projects (http://code.google.com/p/jisclms/w/list), all of which deal with improving or experimenting with the library discovery interface and experience.

Sir Louie

I’m working with the University of Oxford on a new project called ‘Sir Louie’ (which has a website and a blog) to integrate Reading Lists with their online learning environment, WebLearn (which is Sakai under the bonnet). This project has some similarities to the JISC-funded TELSTAR project I recently finished at the Open University, but with different angles, approaches and systems involved.

Sakai already has ‘resource list’ functionality called the ‘Citation Helper’. This came out of the Sakaibrary project, which I first heard about at IGeLU 2006 (not 2008 as I originally stated; thanks to Lukas for the correction to the date).

With the Sir Louie Project, the hope is we can further enhance the Citation Helper through some quite ‘loose coupling’ of various systems. In essence we want to enable:

  • The addition of citations to a Citation Helper resource list from the ‘resource discovery’ system run by the library service at Oxford University called SOLO (actually Primo by Ex Libris under the bonnet)
  • The addition of holdings/availability information to resources in a Citation Helper resource list so that students (or staff) can see at a glance what is available (and where)

For the first part, we are hoping to emulate the existing functionality the Citation Helper has for Google Scholar (described in this blog post), which adds an extra button to Google Scholar search results to import the reference into a Citation Helper resource list. However, whereas Google Scholar seems to push the metadata across in a fairly arbitrary format, we want to enable the Citation Helper to translate any citation formatted as an OpenURL. This should mean that the Citation Helper can then import citations from any database or search interface that provides OpenURLs for its results.
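Translating an OpenURL 1.0 citation mostly means unpacking its key/encoded-value (KEV) pairs, where the referent metadata uses keys such as `rft.atitle`, `rft.jtitle`, `rft.issn` and `rft.au`, and `rft_val_fmt` names the metadata format. The minimal Python sketch below is not the Citation Helper’s own code; it just illustrates the mapping, assuming the OpenURL query string has already been extracted. The function name `parse_openurl` and the example citation are made up.

```python
from urllib.parse import parse_qs

REPEATABLE = {"au"}  # authors can legitimately appear more than once


def parse_openurl(query_string):
    """Split an OpenURL 1.0 (KEV) query string into a simple citation record.

    A simplified sketch: only referent ("rft.") keys, the metadata format and
    any rft_id identifiers are kept; everything else is ignored.
    """
    params = parse_qs(query_string)
    fields = {}
    for key, values in params.items():
        if key.startswith("rft."):
            name = key[len("rft."):]  # e.g. "rft.atitle" -> "atitle"
            fields[name] = values if name in REPEATABLE else values[0]
    # e.g. "info:ofi/fmt:kev:mtx:journal" -> "journal"
    fmt = params.get("rft_val_fmt", [""])[0].rsplit(":", 1)[-1]
    return {"format": fmt, "fields": fields, "ids": params.get("rft_id", [])}


# Made-up example citation
EXAMPLE = ("ctx_ver=Z39.88-2004"
           "&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal"
           "&rft.atitle=An+example+article&rft.jtitle=Journal+of+Examples"
           "&rft.issn=1234-5678&rft.au=Smith%2C+J&rft.au=Jones%2C+K&rft.date=2010")

print(parse_openurl(EXAMPLE))
```

Because the keys are standardised, the same function handles an OpenURL from any source; only the metadata format (journal, book, and so on) changes which `rft.` keys appear.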

For the second part, we are planning to use the Juice framework, which in turn is built on jQuery. Juice is designed to add functionality, generally to library systems, using relatively simple JavaScript. It has two main components:

  • Metadefs
  • Extensions

Metadefs enable Juice to grab relevant pieces of metadata from a web page. Essentially, a metadef tells Juice where specific pieces of information are stored on a page; a typical example is defining where an ISBN appears. So we will be creating a new metadef for the Citation Helper screens. However, rather than creating a metadef that only works with the Citation Helper, we intend to create one that understands COinS, a way of embedding an OpenURL in an HTML ‘span’ tag.
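COinS (ContextObjects in Spans) put the OpenURL KEV string, HTML-escaped, in the `title` attribute of an empty `<span class="Z3988">`. A Juice metadef would do this in JavaScript; purely to illustrate what ‘understanding COinS’ involves, here is a Python sketch using the standard library’s `html.parser`. The class and function names and the sample page are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs


class CoinsFinder(HTMLParser):
    """Collect every COinS ContextObject on a page as a dict of lists."""

    def __init__(self):
        super().__init__()
        self.context_objects = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        title = attrs.get("title")
        # COinS: <span class="Z3988" title="...KEV string...">; HTMLParser has
        # already turned "&amp;" in the attribute value back into "&"
        if tag == "span" and "Z3988" in classes and title:
            self.context_objects.append(parse_qs(title))


def find_coins(html_text):
    finder = CoinsFinder()
    finder.feed(html_text)
    finder.close()
    return finder.context_objects


# Hypothetical fragment of a resource list page with one COinS span
PAGE = """<ul><li>Smith, J. (2010) <i>An example book</i>
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&amp;rft.btitle=An+example+book&amp;rft.isbn=9780000000000"></span></li></ul>"""

for co in find_coins(PAGE):
    print(co["rft.btitle"], co.get("rft.isbn"))
```

The point of targeting COinS rather than the Citation Helper’s own markup is visible here: nothing in the parser is specific to one system, so any page carrying COinS spans can be read the same way.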

COinS are already used by a variety of systems, including the Zotero reference management software, and the LibX library browser plugin/toolbar – so if we add COinS to the Citation Helper lists (it already supports OpenURL), not only can we use it for our own purposes, but we are also enabling these existing applications to work with Citation Helper.
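Going the other way, adding COinS to a Citation Helper list means writing one such span per citation. The sketch below uses a hypothetical `make_coins` helper, written in Python for illustration only, and shows the two details that matter: the values are URL-encoded into a KEV string, and that string is then HTML-escaped because it sits inside an attribute.

```python
from html import escape
from urllib.parse import urlencode


def make_coins(fields, fmt="book"):
    """Return a COinS <span> for one citation (hypothetical helper).

    `fields` maps KEV keys without the "rft." prefix (e.g. "btitle", "isbn")
    to a value, or to a list of values for repeatable keys such as "au".
    """
    pairs = [("ctx_ver", "Z39.88-2004"),
             ("rft_val_fmt", "info:ofi/fmt:kev:mtx:" + fmt)]
    for key, value in fields.items():
        values = value if isinstance(value, list) else [value]
        pairs.extend(("rft." + key, v) for v in values)
    kev = urlencode(pairs)  # URL-encode each key and value
    # The KEV string lives in an attribute, so "&" must become "&amp;"
    return '<span class="Z3988" title="%s"></span>' % escape(kev, quote=True)


print(make_coins({"btitle": "An example book", "au": ["Smith, J", "Jones, K"]}))
```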

There has already been some work done in making Juice work with COinS as part of the VuFind metadef, so I’m hoping that it won’t be too much of a stretch to get this working with Citation Helper.

Once COinS have been added to the Citation Helper and the metadef is working, we can look at the Juice ‘extension’ we need to build. This will need to use metadata from the Citation Helper page (probably an identifier, or set of identifiers, such as an ISBN, DOI or ISSN) and then query appropriate systems to get holdings/availability data back. Rather than have the extension query each relevant system directly (and deal with any cross-domain scripting issues that may arise), we are planning to write an additional piece of software to mediate these requests.
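The extension’s first job would be to pull usable identifiers out of each citation’s metadata. Assuming a citation has already been read from COinS into a dict of lists (the shape `urllib.parse.parse_qs` returns), a sketch might look like this. The function name, the normalisation (stripping hyphens and spaces, upper-casing a trailing `x` check character) and the choice of schemes are assumptions; the mediating service itself is not shown.

```python
def identifiers_from_context(ctx):
    """Pick (scheme, value) identifiers for an availability lookup.

    `ctx` is a parsed ContextObject: a dict mapping KEV keys to lists of values.
    """
    ids = []
    for scheme in ("isbn", "issn", "eissn"):
        for value in ctx.get("rft." + scheme, []):
            # Normalise so "0-000-00000-x" and "000000000X" match the same record
            ids.append((scheme, value.replace("-", "").replace(" ", "").upper()))
    for uri in ctx.get("rft_id", []):
        # rft_id carries URIs; DOIs use the info URI form "info:doi/..."
        if uri.startswith("info:doi/"):
            ids.append(("doi", uri[len("info:doi/"):]))
    return ids


example = {"rft.isbn": ["0-000-00000-x"],
           "rft_id": ["info:doi/10.1000/example", "http://example.org/x"]}
print(identifiers_from_context(example))
```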

We hope to use a standard format for holdings/availability data, based on the DLF-ILS ‘GetAvailability’ specification, and we may also look at DAIA (Document Availability Information API), developed by Jakob Voss. We know Ex Libris (who provide the software for SOLO, as well as the core library management system in use at Oxford) are committed to this approach (see the Ex Librian newsletter from 2009), and the DLF specification is also being used by other JISC-funded projects, such as Summon4HN.
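To give a feel for the kind of data involved, here is a sketch that summarises a response shaped loosely like DAIA’s JSON form: a list of `document`s, each with `item`s listing the services (such as `loan` or `presentation`) under which they are `available` or `unavailable`. The response shape here is a simplified assumption rather than a validated DAIA client, the sample data is invented, and a DLF-ILS GetAvailability response would look different.

```python
import json


def summarise_availability(daia_json, service="loan"):
    """Count items available / unavailable for one service in a DAIA-style response."""
    data = json.loads(daia_json)
    counts = {"available": 0, "unavailable": 0}
    for document in data.get("document", []):
        for item in document.get("item", []):
            if any(s.get("service") == service for s in item.get("available", [])):
                counts["available"] += 1
            elif any(s.get("service") == service for s in item.get("unavailable", [])):
                counts["unavailable"] += 1
            # items that mention the service in neither list are not counted
    return counts


# Invented example response for one title with three copies
SAMPLE = """{"document": [{"id": "example:1", "item": [
  {"label": "Copy 1", "available": [{"service": "loan"}]},
  {"label": "Copy 2", "unavailable": [{"service": "loan", "expected": "2010-12-01"}]},
  {"label": "Copy 3", "available": [{"service": "presentation"}],
   "unavailable": [{"service": "loan"}]}
]}]}"""

print(summarise_availability(SAMPLE))
```

A standard response format like this is what lets one extension show ‘1 of 3 copies on loan shelf’ style summaries regardless of which system answered.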

We are very interested in feedback on this approach: any issues people can spot, questions, or suggestions are very welcome. Just leave a comment below.

Managing classical music in iTunes

[UPDATE 25/02/2013: iTunes 11 was released towards the end of 2012. It features the ‘Column Browser’ (as Paul points out in the comments, this already existed in iTunes before version 11), which allows you to navigate your music collection by multiple ‘facets’. The Column Browser can only be used when a list is displayed in ‘List’ view (as opposed to ‘Grid’ or ‘Artist’ view). To display it, go to the ‘List’ view for a playlist, then (on a Mac at least) choose ‘Show Column Browser’ from the ‘View’ menu. Once the Column Browser is active, you can decide which fields from the item records it displays.

iTunes 11 Classical Music

The Column Browser seems to be visible in the ‘Classical’ smart playlist by default, with the three columns chosen as ‘Composers’, ‘Artists’ and ‘Albums’. However, you can also display ‘Groupings’ (which were displayed by default for Classical music in iTunes 10 – see note below).

A final thing to note is that there are a couple of options with the Column Browser which are turned on by default and (in my opinion) work better turned off. These are ‘Group Compilations’ (which means when you have a variety of music on a single disc marked as a compilation, you don’t see these broken down by artist) and ‘Use Album Artists’.]

[UPDATE 29/07/2011: I’ve just noticed that in iTunes 10, if you use the ‘Classical’ smart playlist that comes pre-setup on the software, the display is different from all other iTunes screens, with the Composer appearing automatically on the left, followed by a ‘Grouping’ column, and finally the track listing. I need to have a play around to find what works best, but I think it pushes towards using ‘Groupings’ to define the ‘Piece’ – the iTunes release notes (at time of writing at http://www.apple.com/itunes/features/) say “You can also use iTunes Groupings to specify Works” (for library geeks, interesting use of FRBR terminology there)]

A quick post inspired by Chris Keene who recently asked:

On itunes, should classical music ‘Artist’ be the composer or conductor?

Since I had similar questions around entering classical music into iTunes I thought I’d just note down quickly the method I’ve settled on, and why.

iTunes isn’t really well designed (some would say I could stop right there) for handling music metadata beyond the basic stuff you might need for a collection of popular music. The data entry, and the browse interface, tends to focus on:

  • Name (of track)
  • Artist (single field)
  • Album

While this seems to work relatively well for my pop and jazz collections (although my jazz collection is small and I’m not so bothered about detailed metadata for it), it doesn’t work so well for my classical collection. I’m not sure the problem is confined to classical music; I suspect it affects other specific forms of music as well.

The type of thing I found didn’t work well was an album containing several pieces of music with multiple movements. So, for example, I had a CD of Anne-Sophie Mutter playing Mozart’s 3rd and 5th Violin Concertos, with the Berlin Philharmonic, conducted by Herbert von Karajan. The track listing on the CD looks something like this:

Konzert Für Violine Und Orchester Nr. 3 G-dur KV 216 (Violin Concerto No. 3 in G major, K. 216)
1. Allegro
2. Adagio
3. Rondeau. Allegro

Konzert Für Violine Und Orchester Nr. 5 A-dur KV 219 (Violin Concerto No. 5 in A major, K. 219)
4. Allegro Aperto
5. Adagio
6. Rondeau. Tempo di Menuetto

Unlike a typical pop album, this track listing is not useful when thinking in terms of the ‘Album’ (CD). Here we can immediately see that the physical item (the CD) is an artificial way of bundling two pieces of music together. So the first thing I do is treat the CD as two albums: one for Violin Concerto No. 3 and one for Violin Concerto No. 5. Once I’ve separated the music from the physical constraints of the CD, there seems little benefit in treating it as a single ‘album’. This is not always the case: ‘The Kreisler Album’, on which Joshua Bell plays a variety of pieces by Fritz Kreisler, is probably still worth treating as an album, so I tend to make these decisions on a piece-by-piece basis.

Back to Mozart and Anne-Sophie Mutter:

Having now got an iTunes Album of Mozart Violin Concerto No. 3, with the tracks:

  1. Allegro
  2. Adagio
  3. Rondeau. Allegro

I could enter the details with “Violin Concerto No. 3” as the Album title, and the movements as the track titles. I use the ‘Track Number’ to make sure the movements are going to play in the right order if I play the ‘album’.
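iTunes has no feature that does this splitting for you; the tags are edited by hand. Still, the approach described above (one ‘album’ per work, with track numbers restarting at 1 so the movements play in order) can be modelled as a small data transformation. The `Track` class and `split_by_work` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Track:
    work: str          # the piece the track belongs to
    movement: str      # track title as it will appear in iTunes
    cd_position: int   # position on the physical CD


def split_by_work(cd_tracks):
    """Group a CD's tracks into one 'album' per work, renumbering from 1."""
    albums = {}
    for track in sorted(cd_tracks, key=lambda t: t.cd_position):
        movements = albums.setdefault(track.work, [])
        # The new track number is the movement's position within its work
        movements.append((len(movements) + 1, track.movement))
    return albums


CD = [
    Track("Violin Concerto No. 3", "Allegro", 1),
    Track("Violin Concerto No. 3", "Adagio", 2),
    Track("Violin Concerto No. 3", "Rondeau. Allegro", 3),
    Track("Violin Concerto No. 5", "Allegro Aperto", 4),
    Track("Violin Concerto No. 5", "Adagio", 5),
    Track("Violin Concerto No. 5", "Rondeau. Tempo di Menuetto", 6),
]

for album, movements in split_by_work(CD).items():
    print(album, movements)
```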

I can add Mozart as the ‘Composer’, and I have to make a decision about ‘artist’ – in this case I’m most interested in the fact the soloist is Anne-Sophie Mutter, so this is what I go for, but equally well I could have entered Karajan or even the Berlin Philharmonic – again these decisions can only be made on an individual basis – iTunes just isn’t up to anything more here 🙁 If I want to, I dump the rest of the ‘artist’ information in the ‘Comments’ field. My general rules would be to use the soloist if there is one, and the conductor for orchestral pieces, and if it is a chamber group I’d probably use their name – but these aren’t hard and fast rules – it’s a personal collection, not a library catalogue 🙂
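Those rules of thumb can be written down as a tiny function, mostly to show how little of the performer information fits in the single ‘Artist’ field. It is only a sketch of the rules as stated (in practice these are judgement calls, not a mechanical process); `artist_and_comments` is a hypothetical name and the quartet name in the test data is invented.

```python
def artist_and_comments(soloist=None, conductor=None, ensemble=None):
    """Choose the single iTunes 'Artist' using rough rules of thumb.

    Preference order: soloist, then conductor, then the ensemble (orchestra
    or chamber group). Any remaining performers are returned for 'Comments'.
    """
    performers = [p for p in (soloist, conductor, ensemble) if p]
    if not performers:
        return "", ""
    return performers[0], "; ".join(performers[1:])


print(artist_and_comments("Anne-Sophie Mutter", "Herbert von Karajan",
                          "Berlin Philharmonic"))
```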

However, even with all this information entered, I still found some irritations when using iTunes to browse my music collection. Although I could add the Composer to the browse interface, it is empty for almost all my non-classical music (and an empty column taking up screen real estate much of the time is a pain). And when sorting by Album, all the symphonies, concertos etc. bunch together (as the ‘Album’ is just called something like “Symphony No. 1”), so I end up having to flick between two columns to know what the piece actually is (i.e. to see that the composer is Beethoven).

So I decided to add the composer information (abbreviated) to the Album title: rather than just “Violin Concerto No. 3”, I now enter “Mozart: Violin Concerto No. 3”.
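A rough model of why this helps: sorting plain work titles bunches identically named pieces together with nothing to tell them apart, while the composer prefix keeps them distinct and grouped by composer. Python’s `sorted` is only a stand-in here, since iTunes applies its own sorting rules, so treat this as an illustration of the naming convention rather than of iTunes itself. The `album_title` helper is a hypothetical name.

```python
def album_title(composer, work):
    """Prefix the work with the composer, e.g. 'Mozart: Violin Concerto No. 3'."""
    return f"{composer}: {work}"


WORKS = [
    ("Mozart", "Violin Concerto No. 3"),
    ("Beethoven", "Symphony No. 1"),
    ("Mozart", "Violin Concerto No. 5"),
    ("Brahms", "Symphony No. 1"),
]

plain = sorted(work for _, work in WORKS)
prefixed = sorted(album_title(composer, work) for composer, work in WORKS)
print(plain)     # the two "Symphony No. 1" albums are indistinguishable
print(prefixed)  # each album now says whose piece it is
```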

This is what the entry for the 1st movement of the violin concerto looks like:

And this is how it looks in the iTunes browsing interface:

I have experimented with adding numbers at the start of the movement names (in some cases the movements of a piece are explicitly numbered anyway). Overall, this approach is far from perfect, but it works pretty well for me.