CORE (COnnecting REpositories) (Presented by Petr Knoth from the Open University)
Working with content and metadata from Open Access Institutional Repositories – approx 167 repositories in the UK. Mainly interested in Full-text items (approx 10 percent of metadata records in repositories have full-text items attached).
Will use OAI-PMH to harvest metadata, and then use it to fetch the PDF (or other full-text) representation of each resource. Will then analyse content, and find ‘similarities’ between items – and then express these as RDF. Will then make available via triple store.
Have started working with the Open University repository (ORO) – finding about 30% have full-text. Will focus on extracting relationships – specifically ‘semantic similarity’ based on content… (rather than on metadata)
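The notes don't say how CORE measures ‘semantic similarity’, so the following is only a minimal sketch of one common content-based approach (cosine similarity over word counts), not the project's actual method. The function names, the 0.5 threshold and the sample texts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-case the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 to 1.0)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_links(papers, threshold=0.5):
    """Return (id_a, id_b, score) for every pair scoring at or above threshold.

    `papers` maps an identifier to extracted full text; the threshold
    is an arbitrary illustrative value.
    """
    vectors = {paper_id: term_vector(text) for paper_id, text in papers.items()}
    ids = sorted(vectors)
    links = []
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            score = cosine_similarity(vectors[first], vectors[second])
            if score >= threshold:
                links.append((first, second, round(score, 3)))
    return links


# Invented sample texts standing in for extracted full text.
sample = {
    "a": "semantic similarity of papers",
    "b": "semantic similarity of articles",
    "c": "weather report",
}
links = similarity_links(sample)
```

Each resulting pair could then be written out as an RDF statement linking the two items, which is the step the project describes as expressing similarities as RDF.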
Use cases – demonstrator client that can be integrated into any repository – which will provide links to papers in other repositories based on similarity relationships – will be open to any institution to use.
(Ed Chamberlain presenting) – COMET (Cambridge Open Metadata) – follow on from the Open Bibliography project – collaborating with CARET at Cambridge, with support from OCLC.
Want to publish lots of Open Bibliographic data – engaging with RDF, enriching records with FAST and VIAF linking – and will document experiences
Will be taking MARC21 encoded records from library catalogue – data taken from main University Library catalogue – historical mix of quality and origin built up over time – some mixture of standards as cataloguing practice has changed over time
Data will be made available as MARC21 (bulk download) and also RDF/XML (bulk download and triple-store with SPARQL endpoint etc.) Will publish where possible under PDDL, but will need discussions with OCLC where it includes data from OCLC derived records…
(Presented by David Scruton) Contextual Wrappers is a project from the Fitzwilliam Museum at the University of Cambridge – working with Knowledge Integration Ltd and Culture Grid.
Looking at how ‘Collection Level Descriptions’ interact/add value to ‘item level descriptions’
Will be providing metadata for 160k items and 50 collection level descriptions (think I got that right)
Will be producing draft update for collection level descriptions (and related APIs) in the Culture Grid
(Martin Wynne presenting) Discovering Babel is based on the Oxford Text Archive – looking at c.1400 metadata records and c.1400 electronic literary and linguistic datasets – electronic texts, text corpora, lexicons, audio data, etc.
Also including British National Corpus and an archive of central and East European language resources (TRACTOR)
Metadata will be made available via TEI (Text Encoding Initiative) XML headers; Dublin Core – but will use DC extensions from the Open Language Archives Community (OLAC); CLARIN Metadata Initiative (CMDI); RDF linked data (‘may as well’!)
Will provide an OAI-PMH target, and will be harvested by OLAC, CLARIN, etc.
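For context, a harvester talks to an OAI-PMH target with plain HTTP requests. A minimal sketch of building a ListRecords request URL follows; the base URL is a placeholder rather than the project's real endpoint, and the helper name is invented for illustration. The `verb`, `metadataPrefix`, `set` and `resumptionToken` arguments and the `oai_dc` prefix come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    When a resumption token is supplied it must be the only argument
    besides the verb, so the other parameters are left out.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Placeholder endpoint, not a real repository address.
first_page = list_records_url("http://example.org/oai")
next_page = list_records_url("http://example.org/oai", resumption_token="abc")
```

A harvester would fetch the first URL, read any resumptionToken from the response, and keep requesting further pages until no token is returned.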
Example use cases … want to provide data/metadata and allow others to build services on top of this…
Aim to make it easier for end-users to find and access resources; will also produce a ‘How to make your language resources discoverable’ manual.
Key technical challenges – establishing sensible and standards conformant architecture for resource file locations (persistent URLs) …
(presented by Julie Allinson) OpenArt is based at the University of York, working in partnership with the Tate and Acuity Unlimited.
Taking ‘The London Art World 1660-1735’ – an AHRC funded project which has currently produced data in a series of inter-linked spreadsheets – about people, places and sales (of art works) – and also about the art works (where they are currently – expect mainly in Tate Britain) and the source of the information of the sales etc. (bibliographic information)
Can see rich set of inter-relationships – e.g. even between people can have relations like ‘was patron of’ – see all artists with same patronage etc.
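To make the ‘same patronage’ idea concrete, here is a small sketch that treats each relationship as a (subject, relation, object) statement and groups artists by patron. The names and rows are invented placeholders, not data from the project, and the function is hypothetical.

```python
from collections import defaultdict

# Invented placeholder statements in (subject, relation, object) form.
relations = [
    ("Patron A", "was patron of", "Artist X"),
    ("Patron A", "was patron of", "Artist Y"),
    ("Patron B", "was patron of", "Artist Z"),
    ("Artist X", "sold work to", "Patron B"),
]


def artists_by_patron(statements):
    """Group artists under each patron, ignoring other relation types."""
    grouped = defaultdict(set)
    for subject, relation, obj in statements:
        if relation == "was patron of":
            grouped[subject].add(obj)
    return {patron: sorted(artists) for patron, artists in grouped.items()}


shared = artists_by_patron(relations)
```

Once the spreadsheets are expressed as RDF, the same question becomes a query over the graph rather than a hand-written loop.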
Data will be made available via Fedora – will be normalising formats and vocabs. Probably be released in some flavour of RDF. OpenArt will focus on data, but already funding in place to produce ‘end user’ focussed interfaces – which will follow on from OpenArt project.
Example use cases:
Answering research questions like ‘how many paintings were imported annually into England during this period?’
Looking at where art works are now
For the institution, hope to gain a mechanism for releasing open data
Blog at http://yorkdl.wordpress.com/
‘Open Metadata Pathfinder: optimising and enhancing access to AIM25’ – project based on AIM25 (descriptions of Archives in the M25 area) – 16000 Collection Level descriptions – 2 million hits per month – enquiries from around the world.
AIM25 covers a very wide range of material, of interest to a very wide audience.
Use UKAT (UK Archival Thesaurus) for searching – many users come in via Thesaurus terms rather than keyword search.
Going to take a subset of data – from five institutions that are new members of AIM25 – so starting from clean slate – will be using Linked Data approach – and can therefore test against the existing approach – and compare the approach they have taken to date with Linked Data approach.
SALDA is going to release 23,000 records from the Mass Observation Archive at the University of Sussex. Based on transforming EAD (exported from CALM) to RDF – using methodology developed for Archives Hub as part of LOCAH project.
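As a rough illustration of what an EAD to RDF transformation involves, the sketch below reads three fields from a tiny EAD fragment and writes them as N-Triples. It is not the LOCAH methodology. The sample record, the base URI and the choice of Dublin Core terms are assumptions made for illustration, and the fragment assumes un-namespaced EAD.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Invented sample record; real exports carry far more detail.
SAMPLE_EAD = """<ead><archdesc level="fonds"><did>
<unitid>EX/1</unitid>
<unittitle>Example diaries</unittitle>
<unitdate>1937-1949</unitdate>
</did></archdesc></ead>"""

DCTERMS = "http://purl.org/dc/terms/"
# Illustrative mapping only; a real model would be much richer.
FIELD_TO_PROPERTY = {
    "unitid": DCTERMS + "identifier",
    "unittitle": DCTERMS + "title",
    "unitdate": DCTERMS + "date",
}


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def ead_to_ntriples(ead_xml, base_uri="http://example.org/archive/"):
    """Turn the top-level <did> of an EAD document into N-Triples lines."""
    root = ET.fromstring(ead_xml)
    did = root.find("./archdesc/did")
    if did is None:
        return []
    unitid = did.findtext("unitid", default="").strip()
    # The subject URI is minted from a placeholder base plus the unit id.
    subject = "<" + base_uri + quote(unitid, safe="") + ">"
    lines = []
    for tag, prop in FIELD_TO_PROPERTY.items():
        value = did.findtext(tag)
        if value and value.strip():
            lines.append(f'{subject} <{prop}> "{escape_literal(value.strip())}" .')
    return lines


triples = ead_to_ntriples(SAMPLE_EAD)
```

The resulting lines can be loaded into any triple store; deciding on stable URIs and an appropriate vocabulary is where most of the real modelling effort goes.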
I’m at the ‘startup’ meeting for JISC ‘Resource Discovery Infrastructure’ projects funded under the JISC 15/10 call
Quick descriptions of projects follow, starting with …
Project at the University of Lincoln funded by JISC (Paul Stainthorp presenting)
Jerome is a project to ‘liberate data’ – in essence to build a ‘quick and dirty’ index of data that can be made available openly from the University of Lincoln library. Feel confident can release catalogue records as overall volumes small, and have traditionally invested in local cataloguing.
Where full record can’t be released, can still release fact that something belongs to collection, and can then enhance with data from elsewhere that is open.
Then including repository data and archives data, and where possible e-journal holdings from KnowledgeBase (again, use ISSN to draw in descriptive data from open sources due to issues around releasing descriptive metadata from KnowledgeBase).
Technical architecture is a ‘NoSQL’ index – very very fast. First and foremost point of access for the data is an API. Any services created by the University of Lincoln will use this API (‘eat our own dog food’)
Aiming for ‘radical personalisation’ – what do we know about users – their past activity, their location, the weather(!) etc. – use that to deliver relevant services.
Going for ‘incredibly fast’ – “if we can measure it, it is too slow” – literally looking for response times that they can’t measure
Example service – allow users to produce their own library discovery tools.
Presentation available at http://paulstainthorp.com/2011/03/01/jisc-rdtf-meeting-birmingham-jerome/
This blog post is written on behalf of JISC.
Projects to release open metadata about the collections and resources of HE libraries, museums and archives – details in Appendix E of the call at http://infrastructurecalloct2010.jiscpress.org/appendix-e-infrastructure-for-resource-discovery/ – Andy McGregor (giving this briefing) suggests this is a good place to ask questions via the commenting system, and also may be a way of finding possible partners for bids through the comments. Also see the briefing paper at http://inf11briefingoct2010.jiscpress.org/infrastructure-for-resource-discovery/
Projects in this strand should take into account the fact that they are part of a wider vision, and consider how they contribute to this (and that they have the time/resource to do it).
Look very carefully at the strict methodology in place – bids that don’t adhere to this won’t get funded. ‘Linked data’ is encouraged but not compulsory – see http://infrastructurecalloct2010.jiscpress.org/appendix-e-infrastructure-for-resource-discovery/?paragraph=15#15 and http://infrastructurecalloct2010.jiscpress.org/appendix-e-infrastructure-for-resource-discovery/?paragraph=18#18
Funding is focussed on HE institutions – but partnerships with institutions outside HE are welcome.
Projects are about establishing practices that can be adopted by other institutions to spread the benefits around the sector – looking for projects that have ways of doing this embedded into them – not just lip-service to concept.
Data and process must be sustainable – looking for more than just a simple declaration in the bid here but clear ideas of how projects will tackle this.