Mashing and Mapping

Middlemash, the third Mashed Library event, took place on 30th November. Hosted by Damyanti Patel (my other half) and her team (Mark, Robin, Chris and John) at Birmingham City University, the day was once again split between talks and hands-on mashing. In some ways I think it may have been the most 'twitter active' event I've been at so far – there were around 560 tweets tagged with #middlemash on the day itself. Although possibly some of the bigger conferences I've been at had more volume, I don't think any has had the density of 'tweets per delegate' 🙂 There was even an 'official tweeter' in the form of @joeyanne. There is an archive of all the tweets at http://twapperkeeper.com/middlemash.

The day started with Tamar Sadeh from Ex Libris (who also sponsored the day) talking about a variety of things including the Ex Libris Code Share wiki – I was really pleased to see that this is accessible to everyone, although only Ex Libris customers can post code.

Following this Mark van Harmelen from HedTek Ltd introduced concepts of rapid prototyping and working with users – stressing the flexibility of paper, pens and post-it notes in the design process, and also the importance of making development a collaborative process.

Then we had three ‘case studies’ from Edith Speller, Paul Stainthorp and Chris Keene – it was great to see some examples of mashing in action from real situations, solving practical problems.

In the afternoon I'd already decided I wanted to pick up something I'd played with briefly at the first Mashed Library event (#mashlib08), which was using the Google Maps interface. I'd sort of volunteered to 'lead' a session – which I'm afraid I didn't do a brilliant job of, due to not enough preparation – so if you came along, I'm sorry about that.

We started with (I think) a good discussion of how Google Maps (and similar systems like OpenMap) work (more on this in a minute), and what the practical issues of maintaining floorplans for the library were – especially where you wanted to be able to indicate where a specific book is. The truth is that locating an item on a specific piece of shelving has not been something that most libraries have bothered to do in the past (certainly on open shelving) – relying instead on a set of ‘rules’ you can follow to work out where a specific book will be – at least, relative to the other books in the library. In theory the item record on the catalogue will give you enough information to find the item – typically the information will include:

  • Library site (for multi-site libraries)
  • Collection (sometimes based on discrete sets of material, but sometimes general geographic locations like ‘First floor’)
  • Loan period (this is sometimes, but not always, linked to a physical location)
  • Classmark or Shelfmark
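
As a rough sketch (my illustration, not from the original post, with invented field names rather than a real library system schema), you can think of this location information as a small record plus the 'rules' a reader applies to it:

```python
# A rough sketch of the location information in a catalogue item record.
# Field names and values are illustrative assumptions, not a real LMS schema.
item_location = {
    "site": "Main Library",        # library site (for multi-site libraries)
    "collection": "First floor",   # discrete collection, or a general area
    "loan_period": "4 weeks",      # sometimes, but not always, tied to a place
    "classmark": "025.04 STE",     # the point to find in the shelf sequence
}

def describe_location(item: dict) -> str:
    """Turn the record into the kind of 'rules' a reader has to follow."""
    return (f"Go to {item['site']}, find the {item['collection']} "
            f"collection, then follow the classmark sequence to "
            f"{item['classmark']}.")

print(describe_location(item_location))
```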

In well designed modern libraries, you can usually use this information to work out where a book is relatively easily. However, when you have shelfmarks like "Cupboard S" (a real example), there is basically no way of working out where the book is – you just have to ask where "Cupboard S" is.

Of course, books are relatively easy – tracking down a journal volume or item almost always relies on simply knowing how the alphabetical sequence of titles winds its way around a set of shelving (and sometimes where older materials have been shelved in separate, less accessible, shelving).

What is perhaps slightly odd is that most libraries do keep some kind of signing up to date – usually in the form of 'shelf ends' which indicate which range of classmarks (or journal titles) is on a specific shelf. However, it seems that these are not usually linked into the library systems at all (although at least one library in the group did record these in an Access database). One of the issues with this kind of signing, and the general idea of linking an item to a specific shelf, is that for items close to the start or end of a shelf unit, there is a relatively high likelihood they will be slipped onto the previous or next unit as they are reshelved and the amount of stock on the unit changes.

We had some discussion of how libraries might keep track of what books were on which shelf unit more closely – either by scanning the first and last book on a shelf each time, or looking to RFID to help – and Dave Pattern reminded us (over Twitter) that he had blogged an idea of using RFID for this purpose a couple of years ago.

At this point I wanted to see if we could get something done with Google Maps and a library floorplan during the afternoon, so I wanted to move on with this. While I settled down to it with Rob Styles from Talis, others started to look at what the various requirements were for a 'library map' application – which Graham Seaman gathered together and posted on the mashed library wiki – there are some great ideas, and it feels like there is a real application waiting to be specified there.

Back to the maps. Essentially the way the various mapping systems work is to have 'tiles' which each represent a section of the map. With Google Maps (and I think this is common to other platforms) the tiles are 256 x 256 pixels. This concept of tiling works in conjunction with the ability to zoom in and out of the map. The basic idea is that at maximum zoom out, you fit the entire map on a single 256 x 256 tile. As you zoom in, you double the number of tiles along both the width and height of the map (i.e. the x and y axes). For Google Maps zoom starts at '0' (zero) – a single 256 x 256 tile. This means a zoom of '1' is 2 x 2 tiles (i.e. 4 tiles), zoom '2' is 4 x 4 (16 tiles) etc. Much of the documentation I found suggested that Google currently supported zoom up to 17 – but on the day we actually found that it supported zoom up to a value of 21 – and I guess if they ever get more detailed maps or satellite imagery they will support higher levels of zoom. There is more on how tiles work at http://code.google.com/apis/maps/documentation/overlays.html#Google_Maps_Coordinates
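
To make the arithmetic concrete, here is a quick sketch (my illustration, not from the Google documentation) of how tile counts and map size grow with zoom:

```python
# Tile arithmetic: at zoom level z the map is 2**z tiles wide and
# 2**z tiles high, each tile being 256 x 256 pixels.
TILE_SIZE = 256

def tiles_at_zoom(zoom: int) -> int:
    """Total number of tiles covering the whole map at this zoom level."""
    return (2 ** zoom) * (2 ** zoom)  # i.e. 4 ** zoom

def map_width_px(zoom: int) -> int:
    """Pixel width (and height) of the full map at this zoom level."""
    return TILE_SIZE * (2 ** zoom)

for z in range(4):
    print(f"zoom {z}: {2 ** z} x {2 ** z} tiles "
          f"({tiles_at_zoom(z)} in total), {map_width_px(z)} px across")
# zoom 0: 1 x 1 tiles (1 in total), 256 px across
# zoom 1: 2 x 2 tiles (4 in total), 512 px across
# zoom 2: 4 x 4 tiles (16 in total), 1024 px across
# zoom 3: 8 x 8 tiles (64 in total), 2048 px across
```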

Lyn Parker from the University of Sheffield ‘volunteered’ their floorplans (http://library.shef.ac.uk/open/floorplan/plans.html) to be used in our project. So the first job was to create the tiles we needed. I have to admit that I’d thought of this stage as the ‘boring but necessary’ bit – however, looking back on this it is in some ways the most complicated bit as for each level of zoom you want, you need to resize the graphic and cut it into appropriate tiles. Luckily there are already some scripts available to do all this work for you. Even better, Rob had Photoshop on his Mac, and we got a Photoshop ’tiling’ script from Mapki – a wiki about the Google Maps API.
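
For anyone without Photoshop, a rough Python equivalent of what the tiling script does might look like the sketch below – this uses the Pillow imaging library, and the file naming convention is my assumption rather than what the Mapki script produces:

```python
# Resize the floorplan for each zoom level, then slice it into 256 x 256 tiles.
from PIL import Image

TILE_SIZE = 256

def cut_tiles(source_path: str, max_zoom: int) -> None:
    source = Image.open(source_path)
    for zoom in range(max_zoom + 1):
        # At zoom z the full map is 256 * 2**z pixels square, so resize
        # the image to fit that canvas before slicing it up.
        size = TILE_SIZE * (2 ** zoom)
        resized = source.resize((size, size))
        for x in range(2 ** zoom):
            for y in range(2 ** zoom):
                box = (x * TILE_SIZE, y * TILE_SIZE,
                       (x + 1) * TILE_SIZE, (y + 1) * TILE_SIZE)
                resized.crop(box).save(f"tile_{zoom}_{x}_{y}.png")

cut_tiles("floorplan.png", max_zoom=3)
```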

Our original idea had been to create a ‘custom map’ to essentially present the floorplan within the Google Maps interface. However, the tools available generally seemed to be aimed at overlaying information on the ‘real world’ as represented in Google Maps. So, we got slightly diverted at this point, and decided to see if we could insert the Sheffield floorplan over the real building in Google Maps. With some help from Lyn, we found the building on Google Maps and Rob started to manipulate the floorplan image so we could align it with the building on the map.

Although this took us away from the initial idea, we were quite excited by the idea that if we got this right, we would be able to assign real world latitude and longitude to items marked on the floorplan – including shelf-units. There is definitely something satisfying about this idea, although whether it would turn out to be of practical benefit is less clear to me.

As well as re-orienting the floorplan image, we also had to work out where it should display on the Google Map. This, rather frustratingly, involves knowing the numerical identifiers of the actual Google Maps tiles – after some hunting around, the best tool for this turned out to be one provided by Google at http://code.google.com/apis/maps/documentation/examples/tile-detector.html – this allows you to identify both the tile identifiers and the latitude and longitude (which you also need) – although frustratingly you can't just type in a postcode or lat/long value to get to the location you want. The tool also gives you the 'zoom' level, the final piece of information required.
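
The tile detector does this lookup for you, but the underlying maths is the standard Web Mercator calculation, which (as far as I understand it) looks like this – the Sheffield coordinates below are my rough approximation, not values from the day:

```python
# Convert a latitude/longitude to the (x, y) tile identifiers at a given
# zoom, using the standard Web Mercator tiling scheme Google Maps uses.
import math

def latlng_to_tile(lat: float, lng: float, zoom: int) -> tuple[int, int]:
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    x = int((lng + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad))
             / math.pi) / 2.0 * n)
    return x, y

# Approximate location of the University of Sheffield's Western Bank Library.
print(latlng_to_tile(53.3811, -1.4883, 17))
```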

Once you’ve gathered all the relevant information, you can feed it into the tile cutter – including the number of ‘zoom’ levels you want to produce tiles for. Having done this we finally needed to write a web page to display the Google Map, with our new tiles integrated into the display. This involves using the Google Maps API, and I cannibalised the example script at http://econym.org.uk/gmap/example_custommap3.htm by Mike Williams (whose tutorial at http://econym.org.uk/gmap/ I found reasonably useful throughout the exercise).

With various adjustments as we reached the end of the afternoon (moving from using jpg images to png for the tiles, so we could create a transparency effect), and some minor adjustments by me after the event, we got a map up and working at http://www.meanboyfriend.com/mashedlib/mapping/maps.html

It's pretty obvious we didn't quite manage to align the map properly 🙂 We did find some tools that are meant to help with this – but they didn't always seem to work, and some links were just dead. I guess a little more investigation – or some trial and error – would get this solved. However, I'm pretty pleased with what we got done in limited time – thanks to Rob for working with me on this.

Actually, I think we picked one of the hardest things we could have attempted. Looking at it again now, and perhaps understanding it a bit more in retrospect, I think we could have assigned arbitrary tile numbers if we had simply wanted to achieve a Google Maps interface to the floorplan – and it looks to me like doing overlays on this would then have been pretty straightforward as well – when I get a chance I'll try and test this theory! I really like the idea of the 'heatmap' for library stock usage (first suggested by Amy Hadfield at Mash Oop North) and would like to get a demonstration of this running.

So – a great day’s mashing – thanks to all at Birmingham City University who organised and ran the day, and everyone who came along and made it such fun.

Middlemash, Middlemarch, Middlemap

The next Mashed Library event was announced a few months ago, but now more details are available. Middlemash is happening at Birmingham City University on 30th November 2009. I hope to see you there.

In discussion with Damyanti Patel, who is organising Middlemash, we thought it would be nice to do a little project in advance of Middlemash. When we brainstormed what we could do I originally suggested that maybe someone had drawn a map of the fictional geography of Middlemarch, and if we could find one, we could make it interactive in some way. Unfortunately a quick search turned up no such map. However, what it did turn up was something equally interesting – this map of relationships between characters in Middlemarch on LibraryThing.

This inspired a new idea – whether this could be represented in RDF somehow. My first thought was FOAF, but on its own this seemed limited, as it doesn't allow for the expression of different types of relationship. However, I then came across this post from Ian Davis (the first in a series of 3), which used the Relationship vocabulary in addition to FOAF to express more of the kind of thing I was looking for.
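
A minimal sketch of the approach (I actually wrote the RDF by hand, but this shows the same idea using Python's rdflib library; the example.org URIs are placeholders I've invented):

```python
# Combine FOAF with the Relationship vocabulary to type a relationship
# between two Middlemarch characters.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, RDF

REL = Namespace("http://purl.org/vocab/relationship/")

g = Graph()
g.bind("foaf", FOAF)
g.bind("rel", REL)

dorothea = URIRef("http://example.org/middlemarch#dorothea-brooke")
casaubon = URIRef("http://example.org/middlemarch#edward-casaubon")

for person, name in [(dorothea, "Dorothea Brooke"),
                     (casaubon, "Edward Casaubon")]:
    g.add((person, RDF.type, FOAF.Person))
    g.add((person, FOAF.name, Literal(name)))

# FOAF alone only gives us the generic foaf:knows; the Relationship
# vocabulary lets us say what kind of relationship it actually is.
g.add((dorothea, REL.spouseOf, casaubon))

print(g.serialize(format="xml"))
```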

The resulting RDF is at http://www.meanboyfriend.com/overdue_ideas/middlemash.rdf. However, if you want to explore this in a more user-friendly manner, you probably want to use an RDF viewer. Although there are several you could use, the one I found easiest as a starting point was the Zitgist dataviewer. You should be able to browse the file directly with Zitgist via this link. There are however a couple of issues:

  • Zitgist doesn't seem to display the whole file, although if you browse through relationships you can view all records eventually
  • At time of posting I’m having some problems with Zitgist response times, but hopefully these are temporary

This is the first time I’d written any RDF, and I did it by hand, and I was learning as I went along. So I’d be very glad to know what I’ve done wrong, and how to improve it – leave comments on this post please.

I did find some problems with the Relationship vocabulary. It still only expresses a specific range of relationships. It also seems to rely on inferred relationships in some cases. The relationships uncle/aunt/nephew/niece aren't expressed directly in the Relationship vocabulary – presumably on the basis that they could be inferred through other relationships of 'parentOf', 'childOf' and 'siblingOf' (i.e. your uncle is your father's brother etc.). However, in Middlemarch there are a few characters who are described as related in this manner, but to my knowledge no mention of the intermediary relationships is made. So we know that Edward Casaubon has an Aunt Julia, but it is not stated whether she is his father's or mother's sister, and further his parents are not mentioned (this is as far as I know; I haven't read Middlemarch for many years, and I went from SparkNotes and the relationship map on LibraryThing).
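
To show what the vocabulary seems to assume, here is a continuation of the rdflib sketch above – note that the intermediate facts here are hypothetical, precisely because the novel doesn't state them:

```python
# An aunt/uncle can only be inferred if the intermediate links exist:
# an aunt is a sibling of a parent. 'casaubon-senior' is entirely invented.
from rdflib import Graph, Namespace, URIRef

REL = Namespace("http://purl.org/vocab/relationship/")

g = Graph()
casaubon = URIRef("http://example.org/middlemarch#edward-casaubon")
father = URIRef("http://example.org/middlemarch#casaubon-senior")  # invented
julia = URIRef("http://example.org/middlemarch#aunt-julia")

g.add((father, REL.parentOf, casaubon))
g.add((julia, REL.siblingOf, father))

# The inference itself, expressed as a SPARQL query over the graph.
query = """
PREFIX rel: <http://purl.org/vocab/relationship/>
SELECT ?auntOrUncle ?person WHERE {
    ?parent rel:parentOf ?person .
    ?auntOrUncle rel:siblingOf ?parent .
}"""
for row in g.query(query):
    print(f"{row.auntOrUncle} is an aunt/uncle of {row.person}")
```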

Something that seemed odd is that the Relationship vocabulary does allow you explicitly to relate grandparents to grandchildren, without relying on the inference from two parentOf relationships.

Another problem, which is one that Ian Davis explores at length in his posts on representing Einstein's biography in RDF, is the time element. The relationships I express here aren't linked to time – so where someone has remarried, it is impossible to say from the work I have done here whether they are polygamous or not! I suspect that at least some of this could have been dealt with by adding details like dates of marriages via the Bio vocabulary Ian uses, but I think this would be a problem in terms of the details available from Middlemarch itself (I'm not confident that dates would necessarily be given). It also looked like hard work 🙂

So – there you have it, my first foray into RDF – a nice experiment, and potentially an interesting way of developing representations of literary works in the future?

Scraping, scripting and hacking your way to API-less data

Mike Ellis from eduserv talking about getting data out of web pages.

Scraping – basically allows you to extract data from web pages – and then you can do stuff with it! Some helpful tools for scraping:

  • Yahoo!Pipes
  • Google Docs – use of the importHTML() function to bring in data, and then manipulate it
  • dapper.net (also mentioned by Brendan Dawes)
  • YQL
  • httrack – copy an entire website so you can do local processing
  • hacked search – use Yahoo! search to search within a domain – essentially allows you to crawl a single domain and then extract data via search
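
None of these tools is compulsory, of course – a minimal hand-rolled scrape looks something like the sketch below (requests and BeautifulSoup are my choice of libraries for the illustration, and the URL and markup are invented):

```python
# Fetch a page and pull the text out of every cell in a hypothetical table.
import requests
from bs4 import BeautifulSoup

page = requests.get("http://example.org/newbooks.html")
soup = BeautifulSoup(page.text, "html.parser")

for cell in soup.select("table.books td"):
    print(cell.get_text(strip=True))
```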

So, once you've scraped your data, you need some tools to 'mung' it (i.e. manipulate it):

  • regex – regular expressions are hugely powerful, although they can be complex – see some examples at http://mashedlibrary.ning.com/forum/topics/extracting-isbns-from-rss (and a short sketch after this list)
  • find/replace – can use any scripting language, but you can even use Word (I like to use Textpad)
  • mail merge (!) – if you have data in Excel, Access, CSV, etc. you can use mail merge to output it with other information – e.g. as HTML
  • html removal – various functions available
  • html tidy – http://tidy.sourceforge.net – can chuck in 'dirty' HTML – e.g. cut and pasted from Word – and tidy it up
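
As a small example of the regex approach from the ISBN thread linked above, something like this pulls ISBN-like strings out of a blob of text (the pattern is deliberately rough – proper ISBN validation would also check the check digit, and the sample text is invented):

```python
# Find anything that looks like an ISBN-10 or ISBN-13 in a string.
import re

text = "New this week: 9780141439549 and 014143955X."
isbn_pattern = re.compile(r"\b(?:97[89])?\d{9}[\dX]\b")

print(isbn_pattern.findall(text))
# ['9780141439549', '014143955X']
```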

Processing data:

  • Open Calais – service from Reuters that analyses a block of text for 'meaning' – e.g. if it recognises the name of a city it can give information about the city such as latitude/longitude etc.
  • Yahoo!Term Extraction – similar to Open Calais – submit text/data and get back various terms – also allows tuning so that you can get back more relevant results
  • Yahoo!geo – a set of Yahoo tools for processing geographic data – http://developer.yahoo.com/geo

The ugly sisters:

  • Access and Excel – don’t dismiss these! They are actually pretty powerful

Last resorts:

Somewhere I have never travelled

This presentation by Brendan Dawes – http://www.brendandawes.com/ (powered by WordPress)

Brendan quite into data – “data porn” – visualising data. Saying that much of the web is still designed as if it’s in print.

Making 'weird creatures' out of keywords http://www.brendandawes.com/?s=redux – a 'creature's' size indicates popularity, and the speed it moves depends on age – but this stuff doesn't come with an instruction manual – there is nowhere that these links between data and behaviour are documented for the 'end user' – it's just a case of putting it out there and trying it out.

'Interfaces' are important – Brendan likes to collect ideas in 'Field Notes' books – http://fieldnotesbrand.com/. He also has a firewire drive full of 'doodles' as his 'digital notebook' – just bits and pieces of stuff that may do one thing – e.g. a drawing app that allows you to draw things in black ink – that sat there for ages while he did nothing with it. Then he had an idea that he wanted to be able to put stuff on lines he had drawn – found something that someone else had done online – and put that in his digital notebook too.

Brendan wanted to do something with http://www.daylife.com/

(Aside – when you design stuff for people, avoid colours, as people can dump a perfectly good idea if you've done it in the wrong colour! Use black and white, because it doesn't upset anyone 🙂)

What would happen if we removed interfaces completely? Allowed people to build their own interface?

So – all of these bits and pieces came together as http://doodlebuzz.com/ – allows you to do a search – then you draw a line to see the results displayed.

Memoryshare – a BBC project to share memories. Original version had a rather dull interface – didn’t engage people, so not very good usage – although the content is very compelling when you start reading. Brendan and team did a range of prototypes – very open brief – basically do anything you want.

Took ideas done with the Daylife example – displaying time-based events on a spiral line – a great 'wow' moment when you see the spiral on the screen, and then as you zoom in it becomes obvious that it is a 3D environment – very, very pretty! The original demo was in Flash, which couldn't cope with the amount of data in Memoryshare – but the BBC really liked the design, so they figured out how to do it – see the results at http://www.bbc.co.uk/dna/memoryshare/ – compare this to the old design at the Internet Archive Wayback Machine.

Brendan is now moving onto using data to produce physical objects – mentioned a site I didn't get (Update: thanks to @nicoleharris got this now http://www.ponoko.com/make-and-sell/how-to-make) that allows you to upload a design and get it made – so for example Brendan has had some wooden luggage tags made with data displayed on them. Moo.com has an API – you can pump data in and get physical objects out. Brendan has written something that takes data from wefeelfine.org and pushes it to moo.com to make cards – transferring transient digital data into less transient physical form.

Visualisation

Iman Moradi is talking about how we organise library stock and spaces – he’s going through at quite a pace, so very brief notes again.

Finding things is complex

It's a cliché that library users often remember the colour of a book more than the title – but why don't we respond to this? Organise books by colour – example from Huddersfield town library.

Iman did a demonstrator – building a 'quotes' base for a book – use a pen scanner to scan a chunk of text from the book, and associate it with the book via ISBN – this starts to build a set of quotes from the book that people found 'of interest'

Think about libraries in terms of games – users are ‘players’, the library is the ‘game environment’. Using libraries is like a game:

  • Activities = Finding, discovery, collection
  • Points/levels = acquiring knowledge

Mash Oop North

Today I’m at Mash Oop North aka #mashlib09 – and kicking off with a presentation from Dave Pattern – some very brief notes:

Making Library Data Work Harder

Dave Pattern – www.slideshare.net/daveyp/

Keyword suggestions – about 25% of keyword searches on the Huddersfield OPAC give zero results.
Look at what people are typing in the keyword search – Huddersfield found 'renew' was a common search term – so you can pop up an information box about renewing your books.

Looking at common keyword combinations can help people refine their searches

Borrowing suggestions – people who borrowed this item also borrowed …
Tesco collects and exploits this kind of data. Libraries sometimes assume we know what is best for our users – but perhaps we need to look at the data to prove or disprove our assumptions

Because borrowing is driven by reading lists, this perhaps helps suggestions stay on-topic
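
As a toy illustration of the 'also borrowed' idea (my sketch – the data and structure are invented, not Huddersfield's actual implementation), you can count how often pairs of items appear in the same borrower's history:

```python
# Count co-occurrences of items across borrowers' loan histories, then
# recommend the items most often borrowed alongside a given item.
from collections import Counter
from itertools import combinations

loans = {
    "borrower1": {"book-a", "book-b", "book-c"},
    "borrower2": {"book-a", "book-b"},
    "borrower3": {"book-b", "book-c"},
}

pair_counts = Counter()
for items in loans.values():
    for a, b in combinations(sorted(items), 2):
        pair_counts[(a, b)] += 1

def also_borrowed(item: str, top_n: int = 5) -> list[str]:
    """Items most often borrowed by people who also borrowed `item`."""
    scored = Counter()
    for (a, b), count in pair_counts.items():
        if a == item:
            scored[b] += count
        elif b == item:
            scored[a] += count
    return [other for other, _ in scored.most_common(top_n)]

print(also_borrowed("book-a"))  # ['book-b', 'book-c']
```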

Course-specific 'new books' list – based on what people on specific courses borrow
Able to do amazon-y type personalised suggestions

Borrowing profile for Huddersfield – the average number of books borrowed shows a very high peak in October, with a lull during the summer – and you can now see the use of the suggestions following this, with a peak in November.

Seems to be a correlation between introduction of suggestions/recommendations with increase in borrowing – how could this be investigated further?

Started collecting e-journal data via SFX – starting to do journal recommendations based on usage.

Suggested scenario – you can start seeding new students' experience: the first time a student accesses the website, you can use the 'average' behaviour of students on the same course – so it's highly personalised from the start. Also, if information is delivered via widgets, it could be dragged and dropped to other environments.

JISC Mosaic project, looking at usage data (at a national level, I think?)

So – some ideas of stuff that you might do with usage data:

#1 Basic library account info:
Just your bog standard library options
– view items on loan, hold requests, etc.
– renew items
Configure alerting options
– SMS, Facebook, Google Telepathy
Convert Karma
– rewards for sharing information/contributing to pool of data – perhaps swap karma points for free services/waiving fines etc.

#2 Discovery service
Single box for search

#3 Book recommendations
Students like book covers
Primarily a ‘we think you might be interested in’ service
Uses database of circulation transactions, augmented with Mosaic data
Time-relevant to the modules the student is taking
Adapts to choices the student makes over time

#4 New books
Data-mining of books borrowed by students on a course
Provide new books lists based on this information (already doing this at Huddersfield I think)

#5 Relevant Journals

#6 Relevant articles
– Whenever a student interacts with library services (e.g. keyword searches etc.) it refines their profile