Discovering Babel

(Martin Wynne presenting) Discovering Babel is based on the Oxford Text Archive – looking at c.1400 metadata records and c.1400 electronic literary and linguistic datasets – electronic texts, text corpora, lexicons, audio data, etc.

Also including the British National Corpus and an archive of Central and Eastern European language resources (TRACTOR).

Metadata will be made available via TEI (Text Encoding Initiative) XML headers; Dublin Core – but will use DC extensions from the Open Language Archives Community (OLAC); CLARIN's Component MetaData Infrastructure (CMDI); RDF linked data (‘may as well’!)

Will provide an OAI-PMH target, and will be harvested by OLAC, CLARIN, etc.
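As a rough illustration of what a harvester such as OLAC or CLARIN would send to an OAI-PMH target, here is a minimal sketch that builds a `ListRecords` request URL. The verb and the `metadataPrefix`, `set` and `from` parameters come from the OAI-PMH protocol itself; the endpoint address and the `list_records_url` helper are assumptions for the example, since the project's actual target address is not given in these notes.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real address of the project's OAI-PMH target
# is not stated in the notes.
BASE_URL = "https://example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL (illustrative helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional: restrict the harvest to one set
    if from_date:
        params["from"] = from_date  # optional: only records changed since this date
    return f"{base_url}?{urlencode(params)}"


url = list_records_url(BASE_URL, from_date="2011-03-01")
print(url)
```

`oai_dc` (unqualified Dublin Core) is the one metadata format every OAI-PMH repository has to support; a target offering OLAC or CMDI records would advertise additional prefixes, which a harvester can discover with the `ListMetadataFormats` verb.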

Example use cases … want to provide data/metadata and allow others to build services on top of this…
Aim to make it easier for end-users to find and access resources; will also produce a ‘How to make your language resources discoverable’ manual.

Key technical challenges – establishing a sensible and standards-conformant architecture for resource file locations (persistent URLs) …

OpenART

(presented by Julie Allinson) OpenArt is based at the University of York, working in partnership with the Tate and Acuity Unlimited.

Taking ‘The London Art World 1660-1735’ – an AHRC-funded project which has so far produced data in a series of inter-linked spreadsheets – about people, places and sales (of art works) – and also about the art works themselves (where they are now – expected to be mainly in Tate Britain) and the sources of the information on the sales etc. (bibliographic information)

Can see a rich set of inter-relationships – e.g. between people there can be relations like ‘was patron of’ – so can see all artists with the same patron etc.
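To make the ‘same patronage’ idea concrete, here is a small sketch that treats each relationship as a (subject, predicate, object) triple, roughly the shape the data would take in RDF. The names, the extra ‘sold work at’ predicate and both helper functions are placeholders invented for the example, not taken from the project's spreadsheets.

```python
from collections import defaultdict

# Placeholder triples (subject, predicate, object); not real project data.
triples = [
    ("Patron X", "was patron of", "Artist A"),
    ("Patron X", "was patron of", "Artist B"),
    ("Patron Y", "was patron of", "Artist C"),
    ("Artist A", "sold work at", "Sale 1"),
]


def artists_by_patron(triples):
    """Group artists under each patron, ignoring other kinds of relation."""
    grouped = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "was patron of":
            grouped[subject].add(obj)
    return grouped


def shared_patronage(triples, artist):
    """Return the other artists who share at least one patron with `artist`."""
    peers = set()
    for artists in artists_by_patron(triples).values():
        if artist in artists:
            peers |= artists - {artist}
    return peers


print(shared_patronage(triples, "Artist A"))
```

Once the data is in a triple store, the same question would more likely be asked as a query against the store than in application code like this.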

Data will be made available via Fedora – will be normalising formats and vocabs. Will probably be released in some flavour of RDF. OpenArt will focus on data, but funding is already in place to produce ‘end user’ focussed interfaces – which will follow on from the OpenArt project.

Example use cases:
Answering research questions like ‘how many paintings were imported annually into England during this period’
Looking at where art works are now
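The first use case is essentially a count grouped by year. A minimal sketch, assuming the normalised data ends up with one row per event carrying a year and an event type (that layout, the sample rows and the `imports_per_year` helper are all assumptions made for illustration, not real figures or the project's schema):

```python
from collections import Counter

# Placeholder rows of (year, event type); invented for the example.
sales = [
    (1700, "import"),
    (1700, "import"),
    (1700, "auction"),
    (1701, "import"),
]


def imports_per_year(rows):
    """Count the rows marked as imports, grouped by year."""
    return Counter(year for year, kind in rows if kind == "import")


for year, count in sorted(imports_per_year(sales).items()):
    print(year, count)
```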

For the institution, hope to gain a mechanism for releasing open data.

Blog at http://yorkdl.wordpress.com/

Open Metadata Pathfinder

‘Open Metadata Pathfinder: optimising and enhancing access to AIM25’ – project based on AIM25 (descriptions of Archives in the M25 area) – 16,000 collection-level descriptions – 2 million hits per month – enquiries from around the world.

AIM25 covers a very wide range of material, of interest to a very wide audience.

Use UKAT (UK Archival Thesaurus) for searching – many users come in via Thesaurus terms rather than keyword search.

Going to take a subset of data from five institutions that are new members of AIM25, so starting from a clean slate. Will use a Linked Data approach for this subset, and can therefore compare it against the approach taken to date.

Jerome

I’m at the ‘startup’ meeting for JISC ‘Resource Discovery Infrastructure’ projects funded under the JISC 15/10 call

Quick descriptions of the projects follow, starting with …

Project at the University of Lincoln funded by JISC (Paul Stainthorp presenting)

Jerome is a project to ‘liberate data’ – in essence to build a ‘quick and dirty’ index of data that can be made available openly from the University of Lincoln library. Feel confident they can release catalogue records, as overall volumes are small and they have traditionally invested in local cataloguing.

Where full record can’t be released, can still release fact that something belongs to collection, and can then enhance with data from elsewhere that is open.

Then including repository data and archives data, and where possible e-journal holdings from KnowledgeBase (again, use ISSN to draw in descriptive data from open sources due to issues around releasing descriptive metadata from KnowledgeBase).
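The ISSN-based enrichment described here amounts to a join: keep only the fact of the holding locally, then pull descriptive fields from an open source keyed on the same ISSN. A minimal sketch of that idea follows. The record layout, the placeholder ISSNs and titles, and the `enrich` helper are assumptions for illustration and say nothing about how Jerome actually stores or merges its data.

```python
# Placeholder holdings: only the fact that the library holds each title.
holdings = [
    {"issn": "0000-0001", "collection": "e-journals"},
    {"issn": "0000-0002", "collection": "e-journals"},
]

# Placeholder descriptive metadata from some open source, keyed by ISSN.
open_metadata = {
    "0000-0001": {"title": "Example Journal", "publisher": "Example Press"},
}


def enrich(holdings, open_metadata):
    """Merge open descriptive metadata into each holding, matching on ISSN."""
    enriched = []
    for holding in holdings:
        record = dict(holding)  # copy, so the original holding is untouched
        record.update(open_metadata.get(holding["issn"], {}))
        enriched.append(record)
    return enriched


for record in enrich(holdings, open_metadata):
    print(record)
```

A holding with no match in the open source simply passes through unchanged, so the fact that the item belongs to the collection can still be released.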

Technical architecture is a ‘NoSQL’ index – very very fast. First and foremost point of access for the data is an API. Any services created by the University of Lincoln will use this API (‘eat our own dog food’)

Aiming for ‘radical personalisation’ – what do we know about users – their past activity, their location, the weather(!) etc. – use that to deliver relevant services.

Going for ‘incredibly fast’ – “if we can measure it, it is too slow” – literally looking for response times that they can’t measure

Example service – allow users to produce their own library discovery tools.

Presentation available at http://paulstainthorp.com/2011/03/01/jisc-rdtf-meeting-birmingham-jerome/