Triples to Trenches – Linked Data in Archives

Lianne Smith from King’s College London Archives

The Archives hold records/papers from senior military personnel (I think I got that right?) [update 10th March 2013: thanks to David Underdown in the comments below for clarifying that Lianne was referring to the Liddell Hart Centre for Military Archives]

Lianne sees the Archives at KCL as innovative, and the latest project is in the Linked Data space.

Lianne is an archivist, not a technical specialist. 18 months ago she hadn't heard of linked data, so this is a beginner's view.

Context: Preparation for centenary of the start of the First World War – other institutions also doing work:

KCL had already contributed to the latter of these.

The report highlighted the problem of locating resources and searching across information sources, even within institutions. It particularly noted that content not surfaced in Google searches was a lot less 'discoverable'. There was also a lack of clear vocabulary for WWI materials, so the project wanted to establish a standard approach building on existing thesauri etc.

Triples to Trenches also built on previous projects funded by JISC, particularly the "Open Metadata Pathway" and "Step Change" projects, which created easy-to-use tools enabling archivists to publish linked open data within normal workflows and without specialist knowledge.

Aims of Triples to Trenches included:

  • Creation of an API to share data created using the Alicat tool (output of the Step Change project)
  • Adaptation of the catalogue front end for visualisation of linked data entities
  • Creation of a Linked Data WWI vocabulary of personal names, corporate names, place and subject terms – available for reuse by the archives sector
  • Also act as a case study of Linked Data in archives
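To make "a Linked Data vocabulary" a bit more concrete, here is a minimal sketch of how one WWI subject term might be expressed as RDF triples and written out in N-Triples syntax. It assumes SKOS is used to model the vocabulary (the talk didn't say which ontology Triples to Trenches actually uses), and the example.org URIs and the `concept_triples` helper are hypothetical.

```python
# Minimal sketch: a WWI vocabulary term as RDF triples, serialised as N-Triples.
# ASSUMPTION: SKOS is used for the vocabulary; all example.org URIs are hypothetical.

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/wwi/"  # hypothetical namespace for the vocabulary


def uri(value: str) -> str:
    """Render an IRI term in N-Triples syntax."""
    return f"<{value}>"


def literal(text: str, lang: str = "en") -> str:
    """Render a language-tagged literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}'


def concept_triples(slug, pref_label, alt_labels=(), broader=None):
    """Build (subject, predicate, object) triples for one hypothetical concept."""
    subject = uri(BASE + slug)
    triples = [
        (subject, uri(RDF_TYPE), uri(SKOS + "Concept")),
        (subject, uri(SKOS + "prefLabel"), literal(pref_label)),
    ]
    for alt in alt_labels:
        triples.append((subject, uri(SKOS + "altLabel"), literal(alt)))
    if broader:
        # Link the term to a broader concept, e.g. a generic "battles" term.
        triples.append((subject, uri(SKOS + "broader"), uri(BASE + broader)))
    return triples


def to_ntriples(triples):
    """One statement per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(concept_triples(
        "battle-of-the-somme",
        "Battle of the Somme",
        alt_labels=["Somme Offensive"],
        broader="battles",
    )))
```

Each output line is one subject, predicate, object statement, which is what lets other archives link to (and reuse) the same term by its URI rather than copying a text label.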

Within Alicat, places are based on Google Maps data, which 'makes it simpler' [although there is a problem with using contemporary maps when the world has changed…] [also note that looking at a Place 'record' on data.aim25.ac.uk there is a GeoNames link – wonder if this is where the data actually comes from? E.g. http://data.aim25.ac.uk/id/place/scapafloworkneyscotland]

Outcomes of project:

  • Creation of WWI dataset and integration into AIM25-UKAT
    • Lessons learned in the creation of the dataset concerning identification of the level of granularity required and the amount of staff time which needs to be invested in preparation
    • Different users have very different requirements in terms of granularity of data
    • Team included a WWI specialist academic, who identified a good resource for battles, so existing data could be reused
    • Different users also use variant terms in their research
  • More work on the front-end (User facing UI) presentation of additional data
    • Being able to integrate things like maps into UI is great
    • Need to work more on what sort of information you want to communicate about entities – especially things like Names – unlike location where a map is obvious addition
  • Need to increase the availability of resources as linked data
  • Need to increase understanding and training in the archives sector
    • This approach is hugely reliant on understanding of the data – need archivists involved
  • Need ongoing collaboration from the LODLAM community in agreeing standards for Linked Data adoption
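The point about users searching with different terms for the same thing is one a shared vocabulary can help with. A minimal sketch, assuming SKOS-style preferred and alternative labels: every label is indexed to a concept identifier, so a variant term resolves to the preferred one. The concept identifiers and label lists here are hypothetical and only for illustration.

```python
# Minimal sketch: resolving variant search terms to one preferred concept,
# in the spirit of SKOS prefLabel/altLabel.
# ASSUMPTION: the vocabulary entries and identifiers below are hypothetical.

from typing import Optional

VOCABULARY = {
    "wwi:somme": {
        "pref": "Battle of the Somme",
        "alt": ["Somme Offensive"],
    },
    "wwi:gallipoli": {
        "pref": "Gallipoli Campaign",
        "alt": ["Dardanelles Campaign", "Gallipoli"],
    },
}


def normalise(term: str) -> str:
    """Case-fold and collapse whitespace so trivial variations still match."""
    return " ".join(term.casefold().split())


def build_index(vocab: dict) -> dict:
    """Map every preferred and alternative label to its concept identifier."""
    index = {}
    for concept_id, labels in vocab.items():
        for label in [labels["pref"], *labels["alt"]]:
            index[normalise(label)] = concept_id
    return index


def resolve(term: str, index: dict) -> Optional[str]:
    """Return the concept identifier for a term, or None if it is unknown."""
    return index.get(normalise(term))


if __name__ == "__main__":
    index = build_index(VOCABULARY)
    for query in ["somme offensive", "Dardanelles  Campaign", "Ypres"]:
        concept = resolve(query, index)
        label = VOCABULARY[concept]["pref"] if concept else "no match"
        print(f"{query!r} -> {label}")
```

Exact-match lookup is deliberately simple; deciding which variants belong under which concept (and at what granularity) is the part that, as the talk stressed, needs archivists and subject specialists rather than code.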


Discovery API at The National Archives

Aleks Drozdov, enterprise architect for the Discovery system at The National Archives (TNA), is going to speak about APIs and data and how they are implemented in the Discovery system at TNA.

My Introduction to APIs post is relevant to this talk.

API and Data

An API = Application Programming Interface. Web API – in a web context, an API is typically defined as a set of messages exchanged over HTTP. Response messages are usually in XML or JSON format.
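Because the same message can come back as either XML or JSON, a client typically parses whichever format it asked for into the same in-memory structure. A minimal sketch using only the Python standard library; the message and its field names are hypothetical and are not the Discovery API's actual response schema.

```python
# Minimal sketch: one (hypothetical) response message in JSON and in XML,
# parsed into the same Python dictionary.
# ASSUMPTION: field names are illustrative, not any real API's schema.

import json
import xml.etree.ElementTree as ET

JSON_MESSAGE = '{"id": "A123", "title": "Letters home", "date": "1916"}'
XML_MESSAGE = "<record><id>A123</id><title>Letters home</title><date>1916</date></record>"


def parse_json(message: str) -> dict:
    """JSON objects map directly onto Python dictionaries."""
    return json.loads(message)


def parse_xml(message: str) -> dict:
    """Each child element of the root becomes a key/value pair."""
    root = ET.fromstring(message)
    return {child.tag: child.text for child in root}


if __name__ == "__main__":
    print(parse_json(JSON_MESSAGE))
    print(parse_xml(XML_MESSAGE))
```

This flat mapping only works for simple, one-level messages; real XML responses with attributes or nested elements need a more deliberate mapping.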

Data – explosion in the amount of data available. Common to 'mashup' (combine) data from a number of sources. Also user-contributed data.

Discovery Architecture

At the base is an 'Object Data Store' – a NoSQL, document-oriented database (MongoDB)

Getting data into Discovery

Vast number of different formats feeding into Discovery:

XML, RDBMS, text, spreadsheets etc. These go through a complex/sophisticated data normalisation process and are then fed into MongoDB, the Object Data Store.
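The talk didn't go into the details of TNA's normalisation process, but the general idea can be sketched: records arriving in different layouts are mapped onto one common document shape before being stored. In this sketch the CSV and XML layouts, field names and "HYP" references are all hypothetical. In Discovery the normalised documents go into MongoDB; with the pymongo driver that last step could be something like `collection.insert_many(docs)`, which is left out here so the sketch runs without a database.

```python
# Minimal sketch: normalising records from different source formats into one
# common document shape before loading them into a document store.
# ASSUMPTION: source layouts, field names and references are hypothetical.

import csv
import io
import xml.etree.ElementTree as ET

CSV_SOURCE = """ref,title,covering_dates
HYP 1/1,War diary , 1914-1918
"""

XML_SOURCE = """<items>
  <item reference="HYP 2/1"><name>Ship's log</name><dates>1915</dates></item>
</items>"""


def clean(value):
    """Trim stray whitespace; empty or missing values become None."""
    if isinstance(value, str) and value.strip():
        return value.strip()
    return None


def from_csv(text: str) -> list[dict]:
    """Map CSV column names onto the common field names."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {"reference": clean(r["ref"]), "title": clean(r["title"]),
         "dates": clean(r["covering_dates"]), "source_format": "csv"}
        for r in rows
    ]


def from_xml(text: str) -> list[dict]:
    """Map XML attributes and child elements onto the common field names."""
    root = ET.fromstring(text)
    return [
        {"reference": clean(item.get("reference")), "title": clean(item.findtext("name")),
         "dates": clean(item.findtext("dates")), "source_format": "xml"}
        for item in root.iter("item")
    ]


def normalise_all() -> list[dict]:
    """Every document now has the same keys, whatever format it came from."""
    return from_csv(CSV_SOURCE) + from_xml(XML_SOURCE)


if __name__ == "__main__":
    for doc in normalise_all():
        print(doc)
```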

Discovery data structure

Discovery treats all things as an 'information asset' – you can build hierarchies through links between assets

http://discovery.nationalarchives.gov.uk/SearchUI/details?Uri=C10127419

The last number here is a unique and persistent identifier for an information asset [not clear what level this is]
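A rough sketch of what "hierarchies built from links between assets" can mean in practice: every asset is the same kind of object with its own identifier, and the only structure is a link to its parent. The `parent_id` field, the `C1`/`C2`/`C3` identifiers and the titles are hypothetical; the talk didn't describe Discovery's actual data model at this level.

```python
# Minimal sketch: every catalogue entry is an "information asset"; hierarchy
# exists only as links from an asset to its parent.
# ASSUMPTION: parent_id, identifiers and titles are hypothetical.

from dataclasses import dataclass
from typing import Optional


@dataclass
class Asset:
    asset_id: str
    title: str
    parent_id: Optional[str] = None


ASSETS = {
    a.asset_id: a
    for a in [
        Asset("C1", "Department"),
        Asset("C2", "Series of war diaries", parent_id="C1"),
        Asset("C3", "Diary of one battalion", parent_id="C2"),
    ]
}


def path_to_root(asset_id: str, assets: dict) -> list[str]:
    """Follow parent links upward; return identifiers from the top level down."""
    path = []
    current = assets.get(asset_id)
    while current is not None:
        if current.asset_id in path:  # guard against accidental cycles in the links
            raise ValueError(f"cycle detected at {current.asset_id}")
        path.append(current.asset_id)
        current = assets.get(current.parent_id) if current.parent_id else None
    return list(reversed(path))


def children(asset_id: str, assets: dict) -> list[str]:
    """Find the assets that link to the given asset as their parent."""
    return [a.asset_id for a in assets.values() if a.parent_id == asset_id]


if __name__ == "__main__":
    print(path_to_root("C3", ASSETS))
    print(children("C1", ASSETS))
```

Because levels are not baked into the record type, the same structure can hold a department, a series or a single item, which may be why the identifier in the URL above doesn't reveal which level it is.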

Discovery API examples

Documentation at http://discovery.nationalarchives.gov.uk/SearchUI/api.htm

API endpoint at: http://discovery.nationalarchives.gov.uk/DiscoveryAPI

Just 6 calls supported (see http://discovery.nationalarchives.gov.uk/SearchUI/api.htm)

Can specify xml or json as format for response: http://discovery.nationalarchives.gov.uk/DiscoveryAPI/xml/ or http://discovery.nationalarchives.gov.uk/DiscoveryAPI/json

Search: http://discovery.nationalarchives.gov.uk/DiscoveryAPI/xml/search/{page}/query= or http://discovery.nationalarchives.gov.uk/DiscoveryAPI/json/search/{page}/query=

30 results per page

e.g. http://discovery.nationalarchives.gov.uk/DiscoveryAPI/json/search/1/query=C%20203
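The search call above follows a simple URL pattern, so building requests is mostly string assembly plus percent-encoding the query. A minimal sketch, assuming the layout described in the talk (2013): format, then `search`, then a page number starting at 1 (as in the example), then `query=`. The live API may have changed since, so check the URL layout against the current documentation before relying on it. The helper names are hypothetical.

```python
# Minimal sketch: building Discovery API search URLs from the pattern
# described in the talk (2013).
# ASSUMPTION: URL layout and 1-based page numbering as in the talk's example;
# the current live API may differ. Helper names are hypothetical.

import math
from urllib.parse import quote

BASE_URL = "http://discovery.nationalarchives.gov.uk/DiscoveryAPI"
RESULTS_PER_PAGE = 30  # as stated in the talk


def search_url(query: str, page: int = 1, fmt: str = "json") -> str:
    """Return the search URL for one page of results in the chosen format."""
    if fmt not in ("json", "xml"):
        raise ValueError("format must be 'json' or 'xml'")
    if page < 1:
        raise ValueError("pages are numbered from 1")
    # safe="" also encodes '/', so a slash in the query can't alter the path.
    return f"{BASE_URL}/{fmt}/search/{page}/query={quote(query, safe='')}"


def page_count(total_hits: int) -> int:
    """Number of result pages needed at 30 results per page."""
    return math.ceil(total_hits / RESULTS_PER_PAGE)


if __name__ == "__main__":
    print(search_url("C 203"))
    print(search_url("C 203", page=2, fmt="xml"))
    print(page_count(61))
```

Fetching a page would then be a plain HTTP GET (e.g. with `urllib.request.urlopen`), followed by parsing the JSON or XML body.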

See documentation at http://discovery.nationalarchives.gov.uk/SearchUI/api.htm for details of other calls.

Next steps

Now that the Discovery platform is in place and they are getting people to use the API, the next plan is to build a Data Import API so that external data can be brought into the Discovery platform. They also want to build a User Participation API.

Interoperability in Archival descriptions

Jenny Bunn from UCL starting with a summary of the history of archival description standards – from USMARC AMC (1977) to ISAD(G) (1st edition formally published 1994).

Meanwhile WGSAD in the US published 'Standards for Archival Description: A Handbook' – also in 1994. It contains a wide variety of standards relevant to archives, from technical standards to the Chicago Manual of Style.

EAD has its origin in encoding the finding aid, not in modelling archival data per se. EAD v1.0 was released in 1998.

Also a mention of ISO 23081 – metadata for records (records management)

Bunn suggests that ISAD(G) is designed for information exchange, not for archival description. Specifically, ISAD(G) doesn't discuss the authenticity of records. At this point (says Bunn), ISAD(G) is more a straitjacket than an enabler.

Call to action – move to ‘meaning’ vs information exchange in standards.

Jane Stevenson made the point that ISAD(G) isn't that great for information exchange either! But Jenny's point is that as a schema it could serve that purpose; it is the lack of a content standard that is a barrier to information exchange, even within ISAD(G).


Google Cultural Institute

James Davies talking about Google Cultural Institute.

As Google grew in size, it increased in scope, and encouraged employees to follow their passions. If you get a dozen people in a room you'll find at least one is passionate about Art – at Google a group of people interested in Art, Galleries and Museums got together to find a way of making this content available, and this became the Google Art Project.

At the time James was at the Tate, got involved in the Google Art Project, and was impressed by how the Google team listened to expertise in the gallery. Now he has moved to Google and is talking about various projects, e.g. the Nelson Mandela Centre of Memory – http://archives.nelsonmandela.org.

The second project from Google in this area is the Cultural Institute, which aims to work with a variety of organisations including archives. It is finding a way of creating an 'online exhibit': the Nelson Mandela site is an example of this, combining archival material with text from curators/archivists to tell a story. From the exhibit you can then jump into individual items in the archive; the example here is a letter from Nelson Mandela to his daughters, at very high resolution by the look of it.

Forming a digestible narrative is key to exhibit format.

Romanian archive – includes footage from the revolution – TV broadcast at the time when revolutionaries took over TV studios.

James says archives are about people, and made a plea to use stories to assert the value of archives and so protect them.

Into Q&A:

Q: Why not use 'the archive' as a metaphor? 'Unboxing' is the most exciting part of the archive experience, and this is lost in the 'exhibit' format.

A: But that is because you know what you are looking for – the

Q: (from me) Risks of doing this in one place – why build a platform rather than distributing tools so archives can do this work themselves?

A: This is a first step, and part of what follows will be about distributing tools and approaches. Syndicating use of the platform (as with the Nelson Mandela site) is a first step in this direction; future steps could include distributing tools. Emphasised they didn't want to be 'hoarders' of content.

Q: How to make self-navigation of archives easier for novice users?

A: Hope that this will come

Several questions around the approach of ‘exhibits’ and ‘narratives’ – feeling that this ignores the depth of archive. Generally answer is that this is a way of presenting content – enables discovery of some content, and gives a place for people to ‘enter’ the archive – and from there explore more deeply.

Lots of concern from the floor and on Twitter that this selective approach comes at the expense, or to the detriment, of the wider collection.