Contemporaneous part two

Following on from my previous post about BNB and SPARQL, in this post I’m going to briefly describe building a Chrome browser extension that uses the SPARQL query described in that post. Given a VIAF URI for an author, the query tries to find authors with the same birth year (i.e. contemporaries of the given author).

Why this particular query? I like it because it exposes data created and stored by libraries that wouldn’t normally be easy to query – the ‘birth year’ for people is usually treated as a field for display, but not for querying. The author dates are also interesting in that they give a range for when a book was actually written, rather than when it was published, which is the date used in most library catalogue searching.

The other reason for choosing this was that it nicely demonstrates how using ‘authoritative’ URIs for things such as people makes the process of bringing together data across multiple sources much easier. Of course whether a URI is ‘authoritative’ is a pretty subjective judgement – based on things like how much trust you have in the issuing body, how widely it is used across multiple sources, how useful it is. In this case I’m treating VIAF URIs as ‘authoritative’ in that I trust them to be around for a while, and they are already integrated into some external web resources – notably Wikipedia.

The plan was to create something that would work in a browser: from a page with a VIAF URI in it (with the main focus being Wikipedia pages), allow the user to find a list of ‘contemporaries’ for the person based on BNB data. I could have done this with a bookmarklet (similar to other projects I’ve done), but a recent conversation with @orangeaurochs on Twitter had put me in mind of writing a browser extension/plugin instead. That seemed to make sense here, especially as a bookmarklet would require the user to already know there was a VIAF URI in the page.

I decided to write a Chrome extension on a vague notion that it probably had a larger installed base than any browser except Internet Explorer. Checking Wikipedia stats on browser use later showed that Chrome was the most used browser on Wikipedia at the moment anyway, and Wikipedia is my main use case.

I started to look at the Chrome extension documentation. The ‘Getting Started’ tutorial got me up and running pretty quickly, and soon I had an extension running that worked pretty much like a bookmarklet and displayed a list of names from BNB based on a hardcoded VIAF URI. The extensions are basically a set of JavaScript files (with some HTML/CSS for display), so if you are familiar with JavaScript then once you’ve understood the specific Chrome API you should find building an extension quite straightforward.

I then started to look at how I could grab a VIAF URI from the current page in the browser, and only show the extension action when one was found. The documentation suggested this is best handled using the ‘pageAction’ call. A couple of examples (Mappy (zip file with source code) and Page Action by content (zip file with source code)) and the documentation got me started on this.
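As a rough illustration of the detection step, here is a minimal sketch of pulling VIAF URIs out of a page's HTML. It is not the code from the extension: the function name `findViafUris` and the regular expression are assumptions made for this example, and it only assumes that VIAF URIs look like `http://viaf.org/viaf/` followed by digits, as in the queries in the previous post.

```javascript
// Sketch only: findViafUris is a hypothetical helper, not taken from the extension.
// It scans a string of page HTML for VIAF URIs (http://viaf.org/viaf/<digits>).
function findViafUris(html) {
  const pattern = /https?:\/\/(?:www\.)?viaf\.org\/viaf\/(\d+)/g;
  const seen = new Set();
  for (const match of html.matchAll(pattern)) {
    // Normalise to the plain http form used in the SPARQL queries,
    // and let the Set drop duplicates.
    seen.add('http://viaf.org/viaf/' + match[1]);
  }
  return Array.from(seen);
}

// Hand-written sample markup, standing in for a real page.
const samplePage =
  '<a href="http://viaf.org/viaf/71388025">VIAF</a> ' +
  '<a href="https://viaf.org/viaf/71388025/">VIAF again</a>';
console.log(findViafUris(samplePage));
```

In an extension, something like this could run in a content script against the current page's markup. An empty array would mean there is nothing to show an icon for.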

Initially I struggled to understand the way different parts of the extension communicate with each other – partly because the code examples above don’t use the simplest (or most up to date) approaches (in general there seems to be an issue with the sample extensions sometimes using deprecated approaches). However the ‘Messaging’ documentation is much clearer and up to date.
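To make the messaging idea concrete, here is a minimal sketch of a content script telling the background page that a VIAF URI was found, so that the page action icon can be shown for that tab. The message shape (`type`, `viafUri`) and the function names are hypothetical. `chrome.runtime.sendMessage`, `chrome.runtime.onMessage` and `chrome.pageAction.show` are real extension APIs, but this is not necessarily how the extension itself is wired up.

```javascript
// Sketch only: message shape and function names are assumptions for illustration.

// Content script side: report the VIAF URI found in the page.
// `runtime` is expected to be chrome.runtime when run inside an extension.
function reportViafUri(runtime, viafUri) {
  runtime.sendMessage({ type: 'viaf-found', viafUri: viafUri });
}

// Background side: show the page action icon for the tab that sent the message.
// `pageAction` is expected to be chrome.pageAction when run inside an extension.
function handleMessage(message, sender, pageAction) {
  if (message && message.type === 'viaf-found' && sender && sender.tab) {
    pageAction.show(sender.tab.id);
    return true;
  }
  return false;
}

// Register the listener only when actually running inside Chrome.
if (typeof chrome !== 'undefined' && chrome.runtime && chrome.pageAction) {
  chrome.runtime.onMessage.addListener(function (message, sender) {
    handleMessage(message, sender, chrome.pageAction);
  });
}
```

Passing `runtime` and `pageAction` in as arguments is only a design choice for this sketch, so that the logic can be exercised outside the browser with stand-in objects.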

The other challenge is parsing the XML returned from the SPARQL query. This would be much easier if I used some additional JavaScript libraries, but I didn’t really want to add a lot of baggage/dependencies to the extension (although I guess many extensions must include libraries like jQuery to simplify specific tasks). While writing this I’ve realised that the BNB SPARQL endpoint supports content negotiation, so it is possible to specify JSON as a response format (using Accept: application/sparql-results+json, the media type defined for SPARQL 1.1 JSON results), which would probably be simpler and faster. I suspect I’ll re-write shortly to do exactly this.
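For comparison with XML parsing, here is a minimal sketch of what handling the JSON results format could look like. The function names are hypothetical and the sample response is hand-written in the shape defined by the SPARQL 1.1 Query Results JSON Format (it is not real BNB output). The `?name` variable matches the query from the previous post.

```javascript
// Sketch only: function names are assumptions, not code from the extension.

// Request JSON results from a SPARQL endpoint via the Accept header.
// Defined here for illustration but not called, so the sketch runs offline.
async function fetchSparqlJson(requestUrl) {
  const response = await fetch(requestUrl, {
    headers: { Accept: 'application/sparql-results+json' }
  });
  if (!response.ok) {
    throw new Error('SPARQL request failed with status ' + response.status);
  }
  return response.json();
}

// Pull the ?name values out of a SPARQL JSON results object.
function namesFromSparqlJson(json) {
  const bindings = (json && json.results && json.results.bindings) || [];
  return bindings
    .filter(function (b) { return b.name && typeof b.name.value === 'string'; })
    .map(function (b) { return b.name.value; });
}

// Hand-written sample in the standard results shape (placeholder values).
const sampleResponse = {
  head: { vars: ['dob', 'name'] },
  results: {
    bindings: [
      { dob: { type: 'literal', value: '1816' }, name: { type: 'literal', value: 'Example Person A' } },
      { dob: { type: 'literal', value: '1816' }, name: { type: 'literal', value: 'Example Person B' } }
    ]
  }
};
console.log(namesFromSparqlJson(sampleResponse));
```

Because the results arrive as a plain object, no extra parsing library is needed, which fits with wanting to keep dependencies out of the extension.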

The result so far is a Chrome extension that displays an icon in the address bar when it detects a VIAF URI in the current page. The extension then tries to retrieve results from the BNB. At the moment failure (which can occur for a variety of reasons) just means a blank display. The speed of the extension leaves something to be desired, which means that sometimes you have to wait quite a while for results to display, and that can look like ‘failure’. I need to add something to show a ‘working’ status and a definite message on ‘failure’, whatever the reason.

A working example looks like this:

Demonstration of Contemporaneous browser extension

Each name in the list links to the BNB URI for the person (which results in a readable HTML display in a browser, but often not a huge amount of data). It might be better to link to something else, but I’m not sure what. I could also display more information in the popup – I don’t think the overhead of retrieving additional basic information from the BNB would be that high. I could also do with just generally prettying up the display and putting some information at the top about what is actually being displayed and the common ‘year of birth’ (this latter would be nice as it would allow easy comparison of the BNB data to any date of birth in Wikipedia).

As mentioned, the extension looks for VIAF URIs in the page – so it works with other sources which do this – like WorldCat:

Demonstration of Contemporaneous extension working with WorldCat.org

While not doing anything incredibly complicated, I think that it gives one example which starts to answer the question “What to do with Linked Data?” which I posed and discussed in a previous post, with particular reference to the inclusion of schema.org markup in WorldCat.

You can download the extension ready for installation, or look at/copy the source code from https://github.com/ostephens/contemporaneous

Contemporaneous part one

I recently did a couple of workshops for the British Library about data on the web. As part of these workshops I did some work with the BNB data using both the API and the SPARQL endpoint. Having a look and play with the data got me thinking about possible uses. One of the interesting things about using the SPARQL endpoint directly in place of the API is that you have a huge amount of flexibility about the data you can extract, and the way SPARQL works lets you do in a single query something that might take repeated calls to an API.

So, start with a query like:

SELECT *
WHERE {
<http://bnb.data.bl.uk/id/person/Bront%C3%ABCharlotte1816-1855> ?p ?o
}

This query finds triples about “Charlotte Brontë”. The next query does the same thing, but uses the fact that the BNB makes (where possible) ‘sameAs’ statements about BNB URIs to the equivalent VIAF URIs:

PREFIX owl:  <http://www.w3.org/2002/07/owl#>
SELECT ?p ?o
WHERE {
?person owl:sameAs <http://viaf.org/viaf/71388025> .
?person ?p ?o
}

This query first finds the BNB resource which is ‘sameAs’ the VIAF URI for Charlotte Brontë (which is http://bnb.data.bl.uk/id/person/Bront%C3%ABCharlotte1816-1855) – this is done by:

?person owl:sameAs <http://viaf.org/viaf/71388025>

The result of this part of the query is one URI (or potentially more than one, although not in this particular case), which is then used in the next part of the query:

?person ?p ?o

Compared with the first query, this one is slightly wider, in that it would also return triples for any other BNB resources identified as being the ‘sameAs’ the VIAF URI for Charlotte Brontë (although in actual fact there aren’t any in this case).

Taking the query a bit further, we can find the date of birth for Charlotte Brontë:

PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX bio:  <http://purl.org/vocab/bio/0.1/>
SELECT ?dob
WHERE {
?person owl:sameAs <http://viaf.org/viaf/71388025> .
?person bio:event ?event .
?event rdf:type bio:Birth .
?event bio:date ?dob
}

The ‘prefix’ statements are just to set up a shorthand for the query – rather than having to type out the whole URI each time I can use the specified ‘prefix’ as an equivalent to the full URI. That is:

PREFIX bio:  <http://purl.org/vocab/bio/0.1/>
?person bio:event ?event

is equivalent to

?person <http://purl.org/vocab/bio/0.1/event> ?event
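The same expansion can be sketched in a few lines of JavaScript. This is purely illustrative: a SPARQL engine does this itself, and `expandPrefixedName` is a hypothetical helper written for this example using the prefixes from the queries above.

```javascript
// Sketch only: expandPrefixedName is a hypothetical helper for illustration.
const prefixes = {
  rdf: 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
  owl: 'http://www.w3.org/2002/07/owl#',
  bio: 'http://purl.org/vocab/bio/0.1/'
};

// Turn a prefixed name such as "bio:event" into its full <URI> form.
function expandPrefixedName(name, prefixMap) {
  const idx = name.indexOf(':');
  if (idx === -1) {
    throw new Error('Not a prefixed name: ' + name);
  }
  const prefix = name.slice(0, idx);
  const local = name.slice(idx + 1);
  if (!Object.prototype.hasOwnProperty.call(prefixMap, prefix)) {
    throw new Error('Unknown prefix: ' + prefix);
  }
  return '<' + prefixMap[prefix] + local + '>';
}

console.log(expandPrefixedName('bio:event', prefixes));
console.log(expandPrefixedName('rdf:type', prefixes));
```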

Having got to this stage – the year of birth based on a VIAF URI – we can use this to extend the query to find other people in BNB with the same birth year – the eventual query being:

PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX bio:  <http://purl.org/vocab/bio/0.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?dob ?name
WHERE {
?persona owl:sameAs <http://viaf.org/viaf/71388025> .
?persona bio:event ?eventa .
?eventa rdf:type bio:Birth .
?eventa bio:date ?dob .
?eventb bio:date ?dob . 
?eventb rdf:type bio:Birth .
?personb bio:event ?eventb .
?personb foaf:name ?name
}

I have to admit I’m not sure if this is the most efficient way of getting the result I want, but it does work – as you can see from the results. What is great about this query is that the only input is the VIAF URI for an author. We can substitute the one used here for any VIAF URI to find people born in the same year as the identified person (as long as they are in the BNB).
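Since the VIAF URI is the only input, the query is easy to turn into a template. Here is a minimal sketch of that substitution in JavaScript. The function names are hypothetical, and the endpoint URL `http://bnb.data.bl.uk/sparql` is an assumption for this example; the `query` URL parameter is the standard way to send a query to a SPARQL endpoint over GET.

```javascript
// Sketch only: names and the endpoint URL are assumptions for illustration.
const ASSUMED_ENDPOINT = 'http://bnb.data.bl.uk/sparql';

// Build the 'same birth year' query for a given VIAF URI.
function buildContemporariesQuery(viafUri) {
  // Reject anything that isn't a plain VIAF URI, so that arbitrary
  // text can't be spliced into the query.
  if (!/^http:\/\/viaf\.org\/viaf\/\d+$/.test(viafUri)) {
    throw new Error('Not a VIAF URI: ' + viafUri);
  }
  return [
    'PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>',
    'PREFIX owl:  <http://www.w3.org/2002/07/owl#>',
    'PREFIX bio:  <http://purl.org/vocab/bio/0.1/>',
    'PREFIX foaf: <http://xmlns.com/foaf/0.1/>',
    'SELECT ?dob ?name',
    'WHERE {',
    '?persona owl:sameAs <' + viafUri + '> .',
    '?persona bio:event ?eventa .',
    '?eventa rdf:type bio:Birth .',
    '?eventa bio:date ?dob .',
    '?eventb bio:date ?dob .',
    '?eventb rdf:type bio:Birth .',
    '?personb bio:event ?eventb .',
    '?personb foaf:name ?name',
    '}'
  ].join('\n');
}

// Build a GET request URL for the query.
function buildRequestUrl(endpoint, query) {
  return endpoint + '?query=' + encodeURIComponent(query);
}

const query = buildContemporariesQuery('http://viaf.org/viaf/71388025');
console.log(buildRequestUrl(ASSUMED_ENDPOINT, query));
```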

Since VIAF URIs are now included in many relevant Wikipedia articles, I thought it might be fun to build a browser extension that would display a list of ‘contemporaries’ for a person using the BNB data – partly to show how the use of common identifiers can make these things just fit together, partly to try building a browser extension and partly because I think it is a nice demonstration of the potential uses of data which we have in libraries but often don’t exploit (like the years of birth/death for people).

But since this post has gone on long enough I’ll do a follow up post on building the extension – but if you are interested the code is available at https://github.com/ostephens/contemporaneous