SORT – Linked Data: avoiding a ‘Break of Gauge’ in your web content

Tom Heath from Talis… slides at http://tomheath.com/slides/2010-06-manchester-inked-data-avoiding-breaks-of-gauge-in-your-web-content.pdf

Tom using his journey to work (Bristol -> Birmingham) as analogy …

  • 25 minute walk to the station
  • Train Bristol Temple Meads -> Birmingham New Street
  • Train Birmingham New Street -> Birmingham International
  • Bike from Birmingham International to Talis offices

The rail network allows this to happen. If we look back to the 1800s, though, the same journey would have taken around 4 days before the rail link was built (Birmingham to Gloucester railway – see http://en.wikipedia.org/wiki/Birmingham_and_Gloucester_Railway). Even once the rail link was built, you had to change at Gloucester to get a train to Bristol, because the Bristol->Gloucester track was a different gauge from the Gloucester->Birmingham track.

The situation for data in HE is currently like the picture before the national rail network was developed – lots of isolated nodes of data. While it is possible to build custom links between various datasets, it is difficult to answer questions that might require links across many datasets.

At the moment we might be able to mashup data from several sources – but it costs us each time we do it (as with changing trains at Gloucester). Linked Data makes it possible to combine the data without this ‘each time’ cost.
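As a rough illustration of the 'no each-time cost' point (this is not from Tom's slides): if two datasets both describe things as subject/predicate/object triples and identify them with shared URIs, combining them is just a union, with no custom join code per pair of sources. The sketch below uses plain Python sets rather than a real RDF library, and every URI, name and helper function in it is invented for the example.

```python
# Minimal sketch: two hypothetical datasets as (subject, predicate, object) triples.
# All URIs and values below are invented for illustration.
TAUGHT_BY = "http://example.org/vocab/taughtBy"
NAME = "http://example.org/vocab/name"
COURSE = "http://example.org/course/cs101"
PERSON = "http://example.org/person/42"

# Dataset 1: a course catalogue that points at a person by URI.
courses = {
    (COURSE, TAUGHT_BY, PERSON),
}

# Dataset 2: a staff directory that describes the same URI.
staff = {
    (PERSON, NAME, "A. Lecturer"),
}

# Because both sides use the same URI for the person, merging is a set union.
merged = courses | staff


def objects(graph, subject, predicate):
    """Return all objects for a given subject and predicate, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def lecturer_names(graph, course):
    """Follow course -> person -> name across whatever triples are in the graph."""
    names = []
    for person in objects(graph, course, TAUGHT_BY):
        names.extend(objects(graph, person, NAME))
    return names


print(lecturer_names(merged, COURSE))
```

Run against either dataset alone, the same question has no answer; it only becomes answerable once the two are joined by the shared URI.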

Tom’s ‘take home’ messages:

  • Building physical networks adds value to the places which are connected – the Birmingham<->Bristol railway was built for a reason, not arbitrarily: it allowed transport of goods from a port to an inland city
  • Building virtual networks adds value to the things which are connected
  • Linked Data enables us to build a network or ‘Web’ of data sets

How?

  • No need for a ‘Big Bang’ – exploit existing infrastructure; build a backbone
  • Costs? – As for any infrastructure investment; Bootstrapping cost vs cost savings and value of things that wouldn’t otherwise get done

Q & A

Q: (David Flanders) Are there some examples that people can look at for guidance?

A: Biggest example – data.gov.uk – example of infrastructure that allows devolved ownership of URIs – which separates out the URI namespace from Department names etc. – lots of really good practice. See also Jeni Tennison’s blog – http://www.jenitennison.com/blog/. If the UK Government can do it – any University can do the same.

Q: (Mike Ellis) Conceptually great, but in reality it is too hard – better to do what you can?

A: Don’t agree 🙂 Anyone can get the idea of a network of things connected – just draw a spider diagram and you’ve got the idea. The technical challenge is new, but there are always technical challenges and we all need to learn new things to deal with them. Whatever happens next, this will be true.

Q: (Peter Burnhill) Machine readable is key. In the past we got hung up on ‘channels’ as opposed to data models. Need to move to the place where publication of schemas is a great thing to do.

A: Agree. Any institution publishing ontologies or vocabularies that is then reused – gets ‘credit’ by reuse of their URIs …

Q: (David Kay) I’m with Mike – this is closer to solving the ‘authority file’ problem rather than data model problem. If we’ve continually failed to solve this problem aren’t we bound to fail with this attempt as well?

A: Need to stop thinking of an ‘authority’ answer – there may be lots of answers, and they may be contradictory. But this is what will allow you to scale – you will use the one that is most useful to you.

Q: (Liz Lyon) Just to mention ‘Concept Web Alliance’ in Bioinformatics is looking at describing concepts using RDF http://conceptweblog.wordpress.com/
