Dutch Culture Link

This session by Lukas Koster.

Lukas works for the Library of the University of Amsterdam – he was ‘system librarian’, then Head of the Library Systems Department, and is now Library Systems Coordinator – meaning he is responsible for MetaLib, SFX … and new innovative stuff – including the mobile web and the Dutch Culture Link project – which is what he is going to talk about today.

Lukas is a ‘shambrarian’ – someone who pretends they know stuff!

Blogs at http://commonplace.net

Lukas described the situation in the Netherlands regarding libraries, technology and innovation. How much leeway there is to get involved in innovation and mashups depends very much on the individual institution and local situation. There is a large Library 2.0 community – but much more in public libraries – especially user-facing widgets and UI stuff – via a Ning network. They have ‘Happe.Nings’ – but these look more at social media etc. rather than data mashups. Lukas blogged the last one at http://www.lukaskoster.net/2010/06/happe-ning-in-haarlem. The next Happe.Ning is about streaming music services.

Lukas talking about Linked Open Data project – partners are:

  • DEN – Digital Heritage Foundation of the Netherlands – digital standards for heritage institutions, promoting linked open data – museums etc. – simple guidelines on how to publish linked open data
  • UBA – library of the University of Amsterdam
  • TIN – Theater Institute of the Netherlands

Objectives of project:

  • Set example
  • Proof of concept
  • Pilot
  • Convince heritage institutions
  • Convince TIN, UBA management

Project called “Dutch Culture Link” – aim to link cultural data and institutions through semantic web

Linked data projects – 2 viewpoints – publishing and use – no point publishing without use. Lukas keen that project includes examples of how the data can be used.

So – the initial idea is that the UBA (Aleph) OPAC will be used to consume data published from the TIN collection and enhance the OPAC

TIN use AdLib library system (AdLib also used for museums, archives etc.) – TIN contains objects and audio-visual material as well as bibliographic items

Started by modelling TIN collection data model – entities:

  • Person
  • Part (person plays part in play)
  • Appearance
  • Production
  • Performance
  • Location
  • Play

Images, text files and a-v material are related to these entities – e.g. images from a performance
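As a sketch of how these entities might hang together – the relationships and sample values below are my assumptions for illustration, the talk only listed the entity names:

```javascript
// Sketch of the TIN entity model as plain JavaScript objects.
// All relationships and sample values are illustrative assumptions;
// the talk only listed the entity names.
const person = { type: 'Person', name: 'Beckett, Samuel' };
const play = { type: 'Play', title: 'Waiting for Godot', author: person };
const production = { type: 'Production', play, opening: '1955-08-03' };
const performance = {
  type: 'Performance',                  // a one-time event
  production,
  location: { type: 'Location', name: 'Arts Theatre, London' },
  date: '1955-08-03',
};
const appearance = {
  type: 'Appearance',                   // a person playing a part in a production
  person: { type: 'Person', name: 'A. N. Actor' }, // hypothetical
  part: { type: 'Part', name: 'Estragon' },
  production,
};
// Images, text files and a-v material would then link to these
// entities, e.g. a photograph linked to a specific performance.
```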

Lukas talking about FRBR – “a library inward vision” – deals with bibliographic materials – but can perhaps be mapped to plays…

  • Work = Play
  • Expression = Production?
  • Manifestation = Production?
  • Item = Performance (one time event)

FRBR interesting model, but needs to be extended to the real world! (not just inward looking for library materials)

Questions that arise:

  • Which vocabulary/ontology to use?
  • How to implement RDF?
  • How to format URIs?
  • Which tool, techniques, languages?
  • How to find/get published linked data?
  • How to process retrieved linked data?

Needed training – but no money! Luckily they were able to attend a free DANS Linked Open Data workshop

Decided to start with a quick and dirty approach:

  • Produced URIs for data entities in TIN data – expressed data as JSON (not RDF)
  • At the OPAC end:
    • Javascript: construct the TIN URI
    • Process the JSON
    • Present the data in the record

URI:

  • <base-url>/person/<personname>
  • <base-url>/play/<personname>/<title>
  • <base-url>/production/<personname>/<title>/<opening>

e.g. the URI <base-url>/person/Beckett, Samuel returns a JSON record

So, in the OPAC: find the author name, form the URI from information in the MARC record – stripping out any extraneous information – then get the JSON, parse it with JavaScript, and display the result in the OPAC.
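A minimal sketch of that flow (the base URL and the shape of the returned JSON are assumptions – the talk didn’t show the actual response format):

```javascript
// Sketch of the OPAC-side mashup: build a TIN URI from the MARC
// author heading, fetch the JSON and render it into the record
// display. BASE_URL and the JSON field names are assumptions.
const BASE_URL = 'http://example.org/tin'; // placeholder for <base-url>

// Strip extraneous MARC subfield content such as dates,
// e.g. "Beckett, Samuel, 1906-1989." -> "Beckett, Samuel"
function cleanAuthorHeading(heading) {
  return heading.replace(/,?\s*\d{4}-(\d{4})?\.?\s*$/, '').trim();
}

async function enhanceRecord(authorHeading, container) {
  const name = cleanAuthorHeading(authorHeading);
  const uri = BASE_URL + '/person/' + encodeURIComponent(name);
  const response = await fetch(uri);
  if (!response.ok) return; // no TIN data for this person
  const person = await response.json(); // assumed: { name, productions: [...] }
  const list = document.createElement('ul');
  (person.productions || []).forEach(p => {
    const li = document.createElement('li');
    li.textContent = p.title + ' (' + p.opening + ')';
    list.appendChild(li);
  });
  container.appendChild(list);
}
```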

But – this is not Linked Data yet – no formal ontology, no RDF. But this is the approach – quick and dirty, with tangible results

Next steps: at ‘Publishing’ end

  • Vocabulary for Production/Performance subject area
  • Vocabulary for Person (FOAF?), Subject (SKOS?)
  • RDF in JSON (internal relationships)
  • Publish RDF/XML
  • More URIs – for performances etc.
  • External links
  • Content negotiation (see the sketch after this list)
  • Links to a-v objects etc.
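The content negotiation step could look something like this – a minimal sketch using Node.js/Express (my choice of stack, not mentioned in the talk), serving HTML to browsers and RDF/XML to linked data clients for the same person URI:

```javascript
// Minimal content-negotiation sketch (Node.js/Express assumed).
const express = require('express');
const app = express();

app.get('/person/:name', (req, res) => {
  const name = req.params.name;
  res.format({
    // Browsers get a human-readable page...
    'text/html': () => res.send('<h1>' + name + '</h1>'),
    // ...linked data clients get RDF/XML for the same URI.
    'application/rdf+xml': () => res.type('application/rdf+xml').send(
      `<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="http://example.org/tin/person/${name}">
    <foaf:name>${name}</foaf:name>
  </foaf:Person>
</rdf:RDF>`),
  });
});

app.listen(3000);
```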

At ‘use’ end:

  • More ‘search’ fields (e.g. Title)
  • Extend presentation
  • Include relations
  • Clickable
  • More information – e.g. could list multiple productions of same play from script in library catalogue

Issues:

  • Need to use generic, really unique URIs
  • Person: ids (VIAF?)
  • Plays: ids

What the internet can learn from libraries

This session by Dan W (http://www.iamdanw.com/) who is the resident Creative Technology Research Associate at Pervasive Media Studio. Apologies for the bitty nature of these notes…

Example: Georges Méliès – using film to create ‘magic’ – an early form of special effects.

Example from Paper Camp – a newspaper created from blog posts. Cheap to print because the decline in newspaper printing means lots of spare capacity at printers.

The Nature of Technology – what it is and how it evolves – by W. Brian Arthur. Argues that this happens through transferring technologies between domains (re-domaining) – applying things that are created in one domain to a different area.

Most of the time when you write software and find a problem, someone else has already solved it – it’s about taking tech from one place and applying it in another.

Dan tells the story of how Twitter grew out of a hack week – the company was actually originally set up to build a list of all podcasts.

In biology this repurposing happens as well – feathers probably developed for warmth, not flight.

Dan doesn’t know about libraries – so he is interested in finding things from libraries and applying them to the Internet. Libraries are used to loaning stuff.

Dan loves ‘tangible browsing experience’ in library – different to internet…

Papercamp and Bookcamp – about bringing ‘internet people’ to look at paper/books etc.

Example of calcwars – a book created from a twitter-stream debate between Newton and Leibniz over who invented calculus

Libraries good at preservation – Internet rubbish at preservation – e.g. Geocities

RFID – becoming more and more common in libraries – but for incredibly boring purposes! Dan thinks we can do more, better, possibly sillier, things with it. Example: a visualisation of RFID tags moving around a space

RFID tags tend to store identifiers – so you need to dig into the data stored against those identifiers in other systems
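In other words the tag is just a key – a toy sketch of the indirection (the catalogue lookup here is invented):

```javascript
// Toy sketch: an RFID tag stores only an identifier; the interesting
// data lives in other systems keyed on that identifier.
const catalogue = new Map([
  ['04:A2:19:B3', { title: 'The Nature of Technology', author: 'W. Brian Arthur' }],
]);

function onTagRead(tagUid) {
  const record = catalogue.get(tagUid); // dig into the other system
  console.log(record ? record.title : 'Unknown tag: ' + tagUid);
}

onTagRead('04:A2:19:B3'); // -> "The Nature of Technology"
```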

Example of using Oyster Card to show percentage of tube stations visited.

RFID radios – physical representation of an album!

@GusAndPenny – cats with RFID tags – tweets when cats go through catflap

Experiment in Holland of allowing people to classify books by putting on specific shelves – doesn’t work because of people – turns out they just put books back anywhere!

Open Bibliography (and why it shouldn’t have to exist)

Today I’m at Mashspa – another Mashed Library event.

Ben O’Steen is talking about a JISC project he is currently involved with. The project is about getting bibliographic information into the open. For Ben, ‘open’ means “publishing bibliographic information under a permissive license to encourage indexing, re-use and re-purposing”. Ben believes that some aspects – such as attribution – should be part of a ‘community norm’, not written into a license.

In essence an open bibliography is all about Advertising! Telling other people what you have.

Bibliographic information allows you to:

  • Identify and find an item you know you want
  • Discover related items or items you believe you want
  • Serendipitously discover items you would like without knowing they might exist
  • …other stuff

This list (from top to bottom) requires increasing investment. Advertising isn’t about spending money – it’s about investment.

To maximise returns you maximise the audience

Ben asks “Should the advertising target ‘b2b’ or ‘consumers’?”

Ben acknowledges that it may not be necessary to completely open up the data set – but believes that in the long term open is the way forward.

Some people ask “Can’t I just scrape sites and use the data – it’s just facts isn’t it?”. However, Directive 96/9/EC of the European Parliament codifies a protection based on “sui generis” rights – rights earned by the “sweat of the brow”. So far this law seems only to have solidified existing monopolies – not generated new economic growth (which was apparently the intention of the law)

When the project asked UK PubMedCentral if they could reproduce the bibliographic data shared through its OAI-PMH service, the answer was ‘Generally, no’ – paraphrasing, UK PubMedCentral said they didn’t have the rights to give away the data (except the stuff from Open Access journals). NOTE – this is the metadata, not the full-text articles, we are talking about – they said they could not grant the right to reuse the metadata [would this, for example, mean that you could not use this metadata in a reference management package to then produce a bibliography?]

Principles:

  • Assign a license when you publish data
  • Use a recognised license
  • If you want your data to be effectively used and added to by others it should be open – in particular non-commercial and other restrictive licenses should be avoided
  • Strongly recommend using CC0 or PDDL (latter in the EU only)
  • Strongly encourage release of bibliographic data into the ‘Open’

Sliding scale:

  • Identify – e.g. for an author the simplest identifier could just be the name – cheap; more expensive identifiers – e.g. URIs or ORCIDs
  • Discover –
  • Serendipity –

If you increase investment you get more use – difficult to reuse data without identifiers for example.

1. Where there is human input, there is interpretation – people may interpret standards in different ways, use fields in different ways

Ben found a lot of variation across the PubMed data set – different journals or publishers interpret where information should go in different ways – “Standards don’t bring interoperability, people do”

2. Data has been entered and curated without large-scale sharing as a focus – lots of implicit, contextual information is left out – e.g. if you are working in a specialist Social Science library, perhaps you don’t mention that an item is about Social Sciences as that is implicit in the (original) context

3. Data quality is generally poor – e.g. a record from the BL with ISBN = £2.50!

In a closed data set you may not discover errors – when you have lots of people looking at data (with different uses in mind) you pick up different types of error.

The data clean-up process is going to be PROBABILISTIC – we cannot be sure – by definition – that we are accurate when we deduplicate or disambiguate. Typical methods:

  • Natural Language Processing
  • Machine learning techniques
  • String metrics and old school record deduplication – easiest of the 3 (for Ben)

Not just about matching uniquely – looking at level of similarity and making decisions

List of string metrics at http://staffwww.dcs.shef.ac.uk/people/s.chapman/stringmetrics.html

Fellegi-Sunter method for old school deduplication – not great, but works OK.
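As a flavour of the string-metric approach, a minimal sketch using normalised Levenshtein similarity with an arbitrary threshold (far cruder than Fellegi-Sunter, which weighs evidence from several fields):

```javascript
// Simple record-matching sketch: normalised Levenshtein similarity
// with a threshold. The 0.8 cut-off is arbitrary - the point is that
// matching is probabilistic, not certain.
function levenshtein(a, b) {
  const d = Array.from({ length: a.length + 1 },
    (_, i) => [i, ...Array(b.length).fill(0)]);
  for (let j = 1; j <= b.length; j++) d[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      d[i][j] = Math.min(
        d[i - 1][j] + 1,                                   // deletion
        d[i][j - 1] + 1,                                   // insertion
        d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
    }
  }
  return d[a.length][b.length];
}

function similarity(a, b) {
  const maxLen = Math.max(a.length, b.length) || 1;
  return 1 - levenshtein(a.toLowerCase(), b.toLowerCase()) / maxLen;
}

console.log(similarity('Beckett, Samuel', 'Becket, Samuel') > 0.8); // true
```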

Can now take a map-reduce approach (distributing processing across servers)
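The map-reduce angle is essentially about blocking – a sketch of the shape (the blocking key here is my invention):

```javascript
// Sketch of the map-reduce shape for large-scale deduplication:
// map each record to a cheap blocking key, then only run expensive
// comparisons (e.g. similarity() above) within each block - and each
// block can be processed on a separate server.
const records = [
  { id: 1, author: 'Beckett, Samuel' },
  { id: 2, author: 'Becket, S.' },
  { id: 3, author: 'Murphy, M. Lynne' },
];

// Map: blocking key = first four letters of the normalised author.
const blocks = new Map();
for (const r of records) {
  const key = r.author.toLowerCase().replace(/[^a-z]/g, '').slice(0, 4);
  if (!blocks.has(key)) blocks.set(key, []);
  blocks.get(key).push(r);
}

// Reduce: compare records within each block only.
for (const [key, group] of blocks) {
  console.log(key, '->', group.map(r => r.id)); // e.g. "beck -> [1, 2]"
}
```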

Do it yourself:

When de-duping you need to be able to unmerge, so you can correct mistakes if necessary – keep the canonical data you hold separate from the data you publish to the public
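One way of keeping merges reversible (a sketch of the idea, not Ben’s actual implementation): publish a merged record but keep the untouched source records and the merge decision alongside it:

```javascript
// Sketch: keep canonical source records untouched and log every merge
// decision, so a bad probabilistic match can be unmerged later.
const sources = {
  'rec:1': { title: 'Waiting for Godot', author: 'Beckett, Samuel' },
  'rec:2': { title: 'Waiting for Godot.', author: 'Becket, S.' },
};
const merges = [];

function merge(ids, confidence) {
  const published = { ...sources[ids[0]], mergedFrom: ids };
  merges.push({ ids, confidence, published });
  return published; // what the public sees
}

function unmerge(published) {
  const i = merges.findIndex(m => m.published === published);
  if (i >= 0) merges.splice(i, 1);                    // drop the bad decision...
  return published.mergedFrom.map(id => sources[id]); // ...originals intact
}

const rec = merge(['rec:1', 'rec:2'], 0.93); // probabilistic match
console.log(unmerge(rec));                   // both source records, unchanged
```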

Directions with Bibliographic data: So far much effort has been directed at ‘Works’ – we need to put much more effort into their ‘Networks’ – starts to help (for example) disambiguate people

Network examples:

  • A cites B
  • Works by a given author
  • Works cited by a given author
  • Works citing articles that have since been disproved, redacted or withdrawn
  • Co-authors
  • …other connections that we’ve not even thought of yet

Ben says – Don’t get hung up on standards …

A Rose by any other name

Amanda Hill talking about ‘a new approach to name authority’

The ‘Names’ project – JISC funded project based at Mimas to investigate and build a prototype/pilot system to deal with name authority.

What’s the problem? Amanda talking about how her father bought books from an Amazon Wishlist – thinking it was hers, but it actually turned out to belong to another Amanda Hill who happened to live in the same town. More serious examples – someone operated on in error because of a name mix-up.

Amanda gives a (fictitious) example of how many different variations of a name can appear in academic publications, followed by a real example: [edit: for clarity – Amanda gave a fictitious example which I missed, and then followed that with a real example, taken from this blog post; the real example is the one given here]

  • M. Lynne Murphy
  • Lynne M. Murphy
  • L. M. Murphy

All same person…

So names can be confusing. You may want to retrieve all articles by a particular individual – may have to think of all forms of the name, and exclude people with the same name.

Zetoc is a service from the British Library and Mimas which contains information about journal articles – and so millions of names of active researchers – but not always complete names, because generally only initials are used in academic publishing.

So used Zetoc to pre-populate the Names system – assign unique identifiers – then start to enhance from other places.

Currently project is testing matching algorithms, reviewing data structure, updating data mappings for other standards (MARC, CERIF, etc.), collaborating with potential data providers and Names users.

You can search the pilot system at http://names.mimas.ac.uk/advanced-search.php – Amanda doing a demonstration – e.g. Abbott, S – note that the identifiers currently assigned are not persistent, but there will be persistent identifiers once they have finished tweaking the data/algorithms/etc. You can also see relationships to others (based on co-publication I think) – e.g. http://names.mimas.ac.uk/individual/60767.html?outputFields=collaborativeRelationships

In the long term they want to allow researchers to edit their own information (e.g. specify a preferred name).

Names project not the only activity in this area – also:

Q: How far are names matched automatically – is there any human intervention?

A: Currently completely automated – but checking how accurate this is. Also erring on side of caution – so if any doubt they don’t merge identities. Matching will improve as they get more data.

Q: What about dealing with non-English script names?

A: Names project focussed on the UK research community, but Zetoc data not limited to this, and system designed to cope with different versions of names (and name changes)

Connecting with Scholars

Terence Huwe from the University of California (Berkeley) opening this session – going to talk about how ‘faculty’ (academic staff) see the research library, and how we (the librarians) see faculty.

Terence is director of the library at the Institute for Research on Labor and Employment – the library is the ‘digital publisher’ for the institute. A faculty study by Ithaka in 2009 – at http://www.ithaka.org/ithaka-s-r/research/faculty-surveys-2000-2009/faculty-survey-2009 – tells us about faculty and their attitudes to information/libraries

More faculty members are conducting research at the ‘network level’ – using internet resources as a place to start research, with a growing preference for online/full-text. However, faculty members are slower to adopt these things than librarians, and science faculty lead the way – ahead of social science/humanities/arts scholars.

A complex world requires new approaches…

Scholars are responding by increasing the amount they work in groups and consult with each other, and are more aware of grey literature – pre-publication work of high interest, working papers etc. They are increasingly aware that others have the skills to search an increasingly complex information environment (i.e. turning to librarians) – but may still have trouble admitting this to themselves and others! We need to help them with this 🙂

What does this mean for libraries? This is an opportunity to reframe our image strategically:

  • save time by assisting discovery
  • educate and form research partnerships
  • to offer interpretive services

[this sounds in some ways very much like a shift from librarian -> information scientist to me? More like information scientists might work in a business environment?]

Move the library away from ‘a place you go’ to a service.

New learning spaces – at UC Berkeley now have bSpace – scalable campus teaching portal – based on Library a la Carte – open source, Sakai platform. As Ithaka research suggests, adoption by faculty is slow, but it is happening.

Terence stresses need to act upon locally acquired wisdom about user behaviour. Outreach is powerful (that means talking to people! Often 1-1), and now you can do effective outreach online.

It is important to monitor the environment for new roles – digital publisher etc.

Taking the Library to the Learner pt 2 – NTNU Library

Rurik Greenall (@brinxmat) talking about work done at NTNU library (a technical university)

Starting off with information gathered from various surveys – found high use of articles, and that books were not so interesting to the library users – especially when not available in electronic format.

Found from statistics that 35% of library purchases were not borrowed at all. Worked out it would be cheaper to get students to buy each book they needed and have the library pay for it, than to go to the expense of purchasing, cataloguing and shelving the book.

So – they decided to look at how the library’s subscribed resources appeared in Google – out of the 333 services examined, only 6 had content that was not discoverable via Google. So in general you can use Google, rather than relying on these subscribed services (for search – not necessarily content). One student comment on the library blog was:

What is the use of teaching students to use databases that are closed to them when they finish their studies?

So – rather than try to replicate Google, look at the gaps between Google and the library and just fill those gaps. For example they created a Linked Data version of MetaLib information and a browser toolbar. They also created a Linked Data representation of SFX (and a display that includes Sherpa/RoMEO status – great idea)
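The Sherpa/RoMEO part could be as simple as a journal-level lookup. A sketch against the RoMEO API of the time, which (as I understand it) returned XML including a romeocolour element – treat the URL and response format as assumptions to verify:

```javascript
// Sketch: look up a journal's RoMEO colour by ISSN and surface it in
// a link-resolver display. URL and response format are assumptions
// based on the RoMEO v2.9 API of the time - verify before use.
async function romeoColour(issn) {
  const url = 'http://www.sherpa.ac.uk/romeo/api29.php?issn=' +
    encodeURIComponent(issn);
  const xml = await (await fetch(url)).text();
  const match = xml.match(/<romeocolour>([^<]+)<\/romeocolour>/);
  return match ? match[1] : 'unknown';
}

romeoColour('0028-0836').then(colour =>
  console.log('Self-archiving status: ' + colour)); // e.g. "green"
```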

Taking the Library to the Learner pt 1 – Summon

This session from Hannah Whaley and Dave Pattern – talking about their selection and implementation of Summon at Dundee and Huddersfield respectively.

Hannah describes how the amount of information being published has increased at a phenomenal rate – big figures, but hard to really get your head round how much information is now being published on a daily basis.

At Dundee they tried to model the problem. Looked at their learners – thinking in terms of Marc Prensky’s ‘Digital Natives’ and ‘Twitch Speed’ – a complete shift in attention span; the internet becomes an extension of self, storing knowledge and thoughts.

In HE we have responded to these challenges with the use of ‘elearning’ – via VLEs. In the library, we adopted ‘federated search’ – but it had real problems – slow; inaccurate; substantial hardware requirements – it felt like an old system forced to be electronic – time for a ground-up redesign, says Hannah.

However there are challenges – complexity of eresources (many sources via many platforms), information literacy and accuracy (do students know what they are looking for? Is the information accurate?) – and the need to do all this while keeping close contact with the students.

Summon – a new product from Serials Solutions – offers webscale discovery [what does this mean in this context?]. A single search box – which they have used to integrate search into many environments – in library web pages, in the VLE, on mobile platforms.
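Embedding such a box can be as small as a script that forwards the query to the Summon instance – a sketch, where the hostname and the s.q parameter follow the classic Summon UI and should be checked against your own instance:

```javascript
// Sketch of a 'single search box' embed: forward the user's query to
// a Summon instance. Hostname and the s.q parameter are assumptions
// based on the classic Summon interface - check your own instance.
function summonSearch(query) {
  const base = 'http://example.summon.serialssolutions.com/search';
  window.location.href = base + '?s.q=' + encodeURIComponent(query);
}

// Wire it to a plain <input id="summon-box"> on any page
// (library site, VLE, mobile view).
document.getElementById('summon-box').addEventListener('keydown', e => {
  if (e.key === 'Enter') summonSearch(e.target.value);
});
```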

Practical example – took 140 Environmental Science 1st year undergraduates:

  • 2 weeks of 2-hour practical labs
  • Introduced to Summon and asked to research topics both on Summon and on the open web and keep notes on what they found where
  • Results showed that ‘general’ information better on open web, but ‘academic’ content on Summon (students were asked to rate)
  • While they found that a higher proportion of searches on Summon needed ‘refining’, refinement on Summon resulted in more improvement in the results set than refining on open web search interfaces

About finding a balance between easy/accurate information, with appropriate support.

Still challenging to ‘take the library to the learner’ – technologically and culturally – but making progress.

Now Dave Pattern – talking about implementation of Summon at the University of Huddersfield.

Huddersfield implemented the ‘MetaLib’ federated search solution in 2006 – but it didn’t fulfil its promise. They noted a huge increase in the use of Google Scholar in place of library systems, so decided they needed to do something. Drew up a ‘wish list’ of what they really wanted:

  • Single search box for all (really all) library content
  • Very fast results (<1 second, no federated search)
  • Clean and simple interface
  • Easy to maintain

Invited vendors to come to pitch products/solutions against this list.

Summon was the product that they were impressed with (and could deliver immediately, whereas some other products still in development)

Implemented gradually – had to transition from existing systems

Actually quite easy to implement – some stuff to deal with – e.g. MARC21 mapping, working out daily uploads of information from local sources (in this case the Library catalogue), dealing with deletions

“Summon (or other similar products – Aquabrowser, VuFind etc.) will highlight all your crappy cataloguing” – either through bad or inconsistent practice or copy-cataloguing errors. This will apply to any library – it doesn’t matter how good your cataloguing is, these products tend to expose the problems.

Generally found students liked it – although some criticisms as well (e.g. ‘too much like Google’)

Implementation of Summon at Huddersfield and at Northumbria has been documented as part of a JISC project – see http://library.hud.ac.uk/blogs/summon4hn/ for more details.

Getting real about social media

Start of the second day of Internet Librarian International, and Hazel Hall (Director of the Centre for Social Informatics at Edinburgh Napier University) is going to talk about “relevance of social tools for information professionals” (slides at http://www.dcs.napier.ac.uk/~hazelh/esis/Hall_ILI.ppt)

Hazel says that we need to recognise that what we aren’t going to do is as important as what we are going to do.

Hazel relates how social media informs her – how she knows that her sister caught a salmon last Friday, meets up with people on trains, learns important dates, what’s happening in various organisations, the birth of a friend’s baby and the death of a colleague.

Social media are aggregators of people (not data or information alone) – their lives and experiences – and this is true for librarians and information professionals of course, as well as everyone else. Hazel says that librarians and information professionals are good at organising and communicating information – and so it is natural for us to use social media as an extension of the ways we have done this traditionally. However, we may not be so good at engaging with library stakeholder communities in a participatory, collaborative fashion – and understanding how we can use social media to help with this.

Hazel talking about how it was possible to use a blog post to propose a physical meeting – impromptu coffee mornings http://www.facebook.com/EdCMers – and this draws from an extremely wide set of people – who might not otherwise usually meet or interact. Hazel says new knowledge happens at boundaries – and by bringing these people together you explore the boundaries.

Hazel says that even after 500 years we really don’t fully understand the impact of the introduction of the printing press. She mentions the ‘Gutenberg parenthesis’ – the idea that the limited amount of printed information available made us believe that it was normal to codify information – but that looking back on this in the future we will see it as the anomaly in human history.

Hazel mentions the 5 stages of twitter acceptance:

  • Denial – ‘twitter is stupid’
  • Presence – ‘I don’t get it but I feel I ought to do it’
  • Dumping – ‘I’ll just advertise stuff like blog posts etc.’
  • Conversing – authentic 1-1 conversations
  • Microblogging – publishing useful information and conversing authentically

“I’m using Twitter to publish useful information that people read, and to converse 1×1 authentically” – this is where the true value of microblogging lies.

People use Twitter in different ways – e.g. Phil Bradley uses his account very differently to Hazel. Hazel keeps her truly ‘personal’ interaction for Facebook, as Twitter (for her) always has an aspect of her professional face. (Hazel feels this is perhaps because Phil is an independent consultant, and so always representing himself – I’m not sure I agree; I suspect it is just more about what you are happy to publish to certain audiences – Phil is simply ‘more public’ than Hazel, would be my conclusion.)

Hazel relates how librarians and information professionals have been using social media for staff development, professional communication (e.g. in place of email lists), profile raising – tweeting and blogging, peer-review work.

Hazel moving on to the use of social media for the delivery of library services. A research snapshot shows that in this case social media is used largely to deliver the same services in a different way – and noting that libraries are often ahead of the game in comparison to the rest of the organisation – however it is still very much a ‘we publish’ and ‘you consume’ model – Hazel asking where the user participation element is.

There are examples – using a blog to build on traditional services and engage library members through discussion in comments, recommendations etc.

“We are all part of the reality: develop our users, develop ourselves”

In his PhD Umar Ruhi (http://www.umar.biz) describes how users move through stages of use:

Consume -> connect -> canvas -> communicate -> comment -> commentate -> contribute -> collaborate

We need to develop stakeholder participation – lead communities. Hazel relates how at Napier they’ve used Yammer to do student support – students used it because it was useful and because humans like making and sharing things.

Hazel believes libraries should be following ‘end users’ (library members) via social media – this isn’t to say you’ll see every single tweet, but that you have that direct contact. She also notes that this isn’t necessarily about Twitter – your users are more likely to be on Facebook.

Q: Thinking about appropriateness – is it possible to manage personal/professional identity?

A: Can’t necessarily manage others – but need to be aware of what we do ourselves – no real answers at the moment

The Library laboratory

This session from Nils Pharo from the Faculty of Journalism, Library and Information Science at Oslo University College. Nils is not ‘hardcore technical’!

The Library laboratory is a project – website at http://www.biblab.no/ – in Norwegian. It has been running for 4 years, funded by the Norwegian Archive, Library and Museum Authority (ALM). Initiated by Thomas Brevik – about connecting people in Norway who had an interest in libraries and technology.

Library laboratory supports:

  • net-based meeting places
    • blog
    • wiki (including terminology explanations etc.)
  • physical meetings – annual workshops attended by people from across library sector (public, academic, specialist)
  • projects/prototype development
    • guidance
    • “micro-funding”

Results:

  • Established network of enthusiasts
  • Initiated other projects financed by ALM (including ‘Pode’)
  • Now building infrastructure to support existing projects
  • ‘Open Library’ provides a solution for merging data from system providers, librarians and library users

The ‘open library’ system is now up and running and being tested with data from the Oslo Public Library (Pode Project)

SWOT analysis of the library laboratory

Strengths:

  • Engaged a passionate group
  • Technological infrastructure

Weaknesses:

  • Small community scattered across many environments
  • Lack of resources to dedicate time to system development

Opportunities:

  • National library invests in new union catalogue system – will require a rethink of how library data is managed – may be able to build on work already done for ‘open library’ – opportunity to work with National Library

Threats:

  • Many library systems with different interests in Norway
  • Lack of identifiers – and each community/system has its own identifiers

Q: What are you lacking when it comes to identifiers?

A: e.g. no identifiers for authors; identifiers for records differ on each system. There is current work at the National Library to improve authority identifiers

Use of Microsoft LiveLabs Pivot in a Library

This session by David Kane from the Waterford Institute of Technology.

LiveLabs Pivot is a product designed to “interact with massive amounts of data in ways that are powerful, informative and fun” (quote from Microsoft) or “making sense out of mountains of data” (MIT Technology Review). Initially worked via a dedicated viewer, but now can be viewed in a browser.

Pivot allows you to drill down to specific items using faceted browsing – David demonstrates this with a collection of cars in Pivot provided by Microsoft (you need Silverlight installed to view it).

At the Waterford Institute they used Pivot as a data analysis tool:

  • Downloaded data about the 5,000 most ‘in demand’ books from the library system (i.e. most under pressure rather than necessarily most borrowed – a popular book with many copies in the library would not be included)
  • Then allocated colours to sets of books depending on level of demand (green through to red)

Gives a way of visualising demand – and making the case for additional copies etc.
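The colour allocation itself can be very simple – e.g. interpolating green to red by demand (a sketch; the actual bucketing used at Waterford wasn’t described):

```javascript
// Sketch: map a normalised demand score (0 = low, 1 = high) onto a
// green-to-red colour for a Pivot item. The actual bucketing used at
// Waterford was not described in the talk.
function demandColour(score) {
  const s = Math.max(0, Math.min(1, score));
  const red = Math.round(255 * s);
  const green = Math.round(255 * (1 - s));
  return '#' + [red, green, 0]
    .map(c => c.toString(16).padStart(2, '0'))
    .join('');
}

console.log(demandColour(0));   // "#00ff00" - low demand, green
console.log(demandColour(0.5)); // "#808000" - mid demand
console.log(demandColour(1));   // "#ff0000" - high demand, red
```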

David mentions ‘Google Refine’ which also allows visualisations of data.

David shows another Pivot visualisation – looking at how books are distributed across different locations and library branches.

What did he use to create the visualisations?

  • Microsoft PivotViewer
  • Pivot Collection tool for Excel
  • Microsoft Visual Studio 2010 (for Silverlight)
  • Silverlight 4 tools for visual studio
  • Silverlight 4 toolkit

David shows online example at http://researchscope.net/pivot/beastquest/index.php – created using a Z39.50 connection to the catalogue and some PHP processing.

Q: How could it deal with multiple languages?

A: Can generate text in the appropriate language ‘on the fly’ and overlay it on images. Also Silverlight has some ability to provide text labels in appropriate languages

Q: Interested in work on ‘high demand’ items – does it have an application to ‘reading lists’?

A: Good question – reading lists are something they are looking at currently in general, but haven’t considered how Pivot would be used.

Q: Can you use Pivot to incorporate ebooks?

A: Absolutely no reason why not. But if you wanted to merge collections, that problem would need solving first – Pivot does not do this for you. But you could use colour coding to differentiate ebooks from print books and so easily visualise where you have ebook coverage etc.