JISCExpo: Notes from the Linked Data Workshop at Stanford

This session is being presented by Jerry Persons.

This workshop spent a week looking at Linked Data – ‘be part of the web, not just on it’. It was sponsored by a variety of research and national libraries, research groups, companies and others.

The workshop focused on “crafting fund-able plans for creating tools, processes and vehicles to expedite a disruptive paradigm shift in the workflows, data stores and interfaces used for managing, discovery and navigating…”

The workshop was deliberately ‘library focussed’ but recognised much wider issues – especially synergies across GLAM (galleries, libraries, archives, museums).

“I’ve liked to characterize the current moment as a circle of libraries, museums, archives, universities, journalists, publishers, broadcasters and a number of others in the culture industries standing around, eyeing one another and at the space between them while wondering how they need to reconfigure for a world of digitally networked knowledge” – Josh Greenberg

“The biggest problem we face right now is a way to ‘link’ information that comes from different sources that can scale to hundreds of millions of statements” – Stefano Mazzocchi

22 issues were identified by the mid-point of the workshop; just a few here:

  • co-referencing, reconciliation
  • use of extant metadata
  • killer apps
  • user seduction and training
  • workflow
  • scalability
  • licensing

Jerry says … “The elevator pitch for linked data does not yet exist”

Thinking about ‘novice’ (apprentice), ‘journeyman’, ‘master’ stages of engaging with Linked Data:

  • Value statement use cases
  • Publishing data
  • etc.

At each stage we should be looking for model implementations that people can examine and follow.

Elephants in the room:
URIs not strings – don’t underestimate the amount of effort required to transform large subsets of GLAM metadata into linked data with URIs as identifiers
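
To make the ‘URIs not strings’ point concrete, here is a minimal sketch of the very first step: normalising name strings and looking them up in an authority table to get URIs, with anything unmatched set aside for a human to check. The authority table, its example.org URIs and the record fields are all hypothetical; reconciling against a real authority file is far messier, which is rather the point.

```python
import unicodedata

# Hypothetical authority file: normalised label -> URI (example.org URIs are placeholders)
AUTHORITY = {
    "austen, jane": "http://example.org/person/austen-jane",
    "dickens, charles": "http://example.org/person/dickens-charles",
}


def normalise(label: str) -> str:
    """Lower-case, strip accents and collapse whitespace so trivial variants match."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def strings_to_uris(records):
    """Attach a creator URI to each record; collect labels that need human review."""
    linked, unresolved = [], []
    for record in records:
        uri = AUTHORITY.get(normalise(record["creator"]))
        if uri is None:
            unresolved.append(record["creator"])
        # Keep the original string alongside the URI (None when unresolved)
        linked.append({**record, "creator_uri": uri})
    return linked, unresolved
```

Every label that lands in `unresolved` is a small piece of the manual effort the workshop warned about.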

Caveats…

  • Management of co-references needs to be a bottom-up process
  • Build systems that accept the way the world is, not the way you would like it to be
  • Focus on changing current practices (in the long run), not only on reconciling data (in the short run): preventing problems is better than solving them!
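
One way to picture bottom-up co-reference management: each contributor publishes its own owl:sameAs links, and equivalence clusters are computed from whatever links exist rather than imposed centrally. The sketch below uses union-find for that merge; that is my choice of technique, not something proposed at the workshop, and the URIs are placeholders. Because sameAs is transitive, a single bad link can merge two large clusters, so in practice the links need review too.

```python
def coreference_clusters(same_as_pairs):
    """Group URIs linked directly or transitively by sameAs assertions, via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in same_as_pairs:
        union(a, b)

    # Collect every URI under its cluster root, then return sorted lists for stable output
    clusters = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])
```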

Some documents from the workshop will be coming out very soon, along with proposals for work over the next few months.

JISCExpo: Community and Linked Data

I’m at the #jiscexpo programme meeting today and tomorrow…

Ben O’Steen is giving the first formal talk of the day, on ‘community’…

Ben notes that SPARQL has a very bad reputation: people don’t like it and don’t want to use it. Taking a step back, SQL is the standard way of interacting with databases, but in general you don’t write SQL queries against someone else’s database, and it is very unusual to do so without permission, documentation etc. (I guess unless you are really hacking into it!)

In general SQL databases are hidden from ‘remote’ users via APIs or other interfaces which present you with views on the data, not the raw data structure.
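
As an illustration of the ‘API as a view’ pattern Ben describes, a minimal sketch using Python’s built-in sqlite3: the internal tables (hypothetical here) stay hidden, and the public function returns one fixed JSON shape.

```python
import json
import sqlite3


def build_db() -> sqlite3.Connection:
    """Internal schema the remote user never sees (hypothetical tables and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE agent (id INTEGER PRIMARY KEY, sort_name TEXT);
        CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT, agent_id INTEGER,
                           internal_notes TEXT);
        INSERT INTO agent VALUES (1, 'Austen, Jane');
        INSERT INTO item VALUES (10, 'Emma', 1, 'shelf 4B, damaged spine');
    """)
    return conn


def get_item(conn: sqlite3.Connection, item_id: int) -> str:
    """Public API: a fixed JSON view that hides the join, the keys and internal fields."""
    row = conn.execute(
        "SELECT item.title, agent.sort_name FROM item "
        "JOIN agent ON agent.id = item.agent_id WHERE item.id = ?",
        (item_id,),
    ).fetchone()
    if row is None:
        return json.dumps({"error": "not found"})
    return json.dumps({"title": row[0], "author": row[1]})
```

A caller of `get_item` only ever sees `title` and `author`; the join and the `internal_notes` column can change without breaking anyone.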

So what does this tell us about what we need to do with Linked Data?

The interaction feedback loop is fundamental: if you can get this right, you get engagement. Example: ‘mouse presses button, mouse gets cheese’ encourages a behaviour in the mouse. Ben uses World of Warcraft as an example of an interaction feedback loop that works incredibly well: people write their own programs and interfaces for WoW.

Ben notes this is not about Gamification… this is about getting pay-off for interaction.

Ben sets some homework: go read http://jboriss.wordpress.com/2011/07/06/user-testing-in-the-wild-joes-first-computer-encounter/ – a blog post about user testing of web browsers and the experience of ‘Joe’, a 60-year-old who had never used a computer before, and what happened when he tried to find a local restaurant to eat in via three major web browsers. “There is little modern applications do to guide people who have never used a computer”.

Sliding scale of interaction

  • googling and finding a website;
  • hunting and clicking round the website for information;
  • using a well-documented or cookie-cutter API (such as an Atom feed or a search interface);
  • using Boolean searching or other simple API ‘tricks’ –
    • WITHOUT requiring understanding of the true data model
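
For comparison with SPARQL, this is roughly what consuming a cookie-cutter API like an Atom feed involves: one standard namespace and a couple of element names, with no knowledge of the publisher’s underlying data model. A minimal sketch using the standard library; the feed content is made up.

```python
import xml.etree.ElementTree as ET

# The standard Atom namespace; this is all the "model" a consumer needs to know
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed content for illustration
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>New acquisitions</title>
  <entry><title>Emma</title><link href="http://example.org/item/10"/></entry>
  <entry><title>Persuasion</title><link href="http://example.org/item/11"/></entry>
</feed>"""


def entry_titles_and_links(feed_xml: str):
    """Return (title, href) for each entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    results = []
    for entry in root.findall("atom:entry", ATOM):
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        link = entry.find("atom:link", ATOM)
        href = link.get("href") if link is not None else None
        results.append((title, href))
    return results
```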

Ben now goes back to SPARQL: when interacting with an unknown SPARQL endpoint it is common to become frustrated…

What do you need to understand to craft a successful SPARQL query? You need to understand:

  • RDF and triple/quad model
  • RDF types and namespaces
  • structures in an endpoint
  • SPARQL syntaxes
  • SPARQL return formats
  • libraries for RDF responses
  • libraries for XML responses
  • … and more
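
A sketch of how much of that list even a trivial query touches. It assumes the endpoint (a hypothetical example.org URL) describes things with the Dublin Core `dcterms:title` property, which is exactly the kind of thing you can’t know about an unfamiliar endpoint without exploring it. The request puts the query in the SPARQL Protocol’s `query` parameter (you would also send an `Accept: application/sparql-results+json` header), and the parser reads the standard SPARQL 1.1 JSON results format; the network call itself is left out.

```python
import json
from urllib.parse import urlencode

# Assumes prefixes, the triple model and this endpoint's choice of vocabulary
QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?book ?title WHERE {
  ?book dcterms:title ?title .
} LIMIT 10
"""

# A hand-written response in the SPARQL 1.1 JSON results format (made-up data)
SAMPLE_RESULTS = json.dumps({
    "head": {"vars": ["book", "title"]},
    "results": {"bindings": [{
        "book": {"type": "uri", "value": "http://example.org/item/10"},
        "title": {"type": "literal", "value": "Emma"},
    }]},
})


def query_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL: the query goes in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def rows_from_results(results_json: str):
    """Flatten SPARQL JSON result bindings into plain dicts of variable -> value."""
    data = json.loads(results_json)
    variables = data["head"]["vars"]
    return [
        {v: b[v]["value"] for v in variables if v in b}  # unbound variables are skipped
        for b in data["results"]["bindings"]
    ]
```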

Developers are clamouring for APIs

  • Every new social/web service is seen as lacking if it doesn’t offer an API, driven by the desire to build mobile applications
  • If SPARQL is seen as the ultimate API, then by the same logic the ultimate Twitter API would be direct access via its Scala/Java libraries
  • Many developers need to see the benefits of something simple before they get hooked into learning something more complex

Taking an ‘opinionated view’ on information helps adopters by offering a constrained view of the model. You could offer CSV/JSON/HTML views on the data behind a SPARQL endpoint. Ben notes ‘access to the full model is a wonderful thing’, but don’t forget (paraphrase) ‘most average developers want a constrained view’.
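
A sketch of what an ‘opinionated’ CSV view could look like: the publisher picks which predicates become columns, and everything else in the model is dropped. The column mapping, the triples and the example.org URIs are hypothetical; the dcterms URIs are real Dublin Core terms.

```python
import csv
import io

# Hypothetical opinionated mapping: column name -> predicate URI
COLUMNS = {
    "title": "http://purl.org/dc/terms/title",
    "creator": "http://purl.org/dc/terms/creator",
}


def triples_to_csv(triples) -> str:
    """Pivot (subject, predicate, object) triples into one CSV row per subject."""
    wanted = {pred: col for col, pred in COLUMNS.items()}
    rows = {}
    for s, p, o in triples:
        if p in wanted:  # predicates outside the view are simply ignored
            # If a subject has several values for one predicate, the last one wins
            rows.setdefault(s, {"id": s})[wanted[p]] = o
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["id", *COLUMNS], lineterminator="\n")
    writer.writeheader()
    for subject in sorted(rows):
        writer.writerow(rows[subject])  # missing columns are written as empty cells
    return out.getvalue()
```

Losing repeated values and unmapped predicates is the price of the simplicity; the full model is still there for anyone who wants the SPARQL endpoint.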

Ben now talks about schema.org, a new initiative from Google, Bing and Yahoo! Ben notes that schema.org delivers ‘cheese’ immediately: it is clear that the reason you want to do this is to improve search engine results.

Ben notes – schema.org contains very ‘opinionated’ views of the things it can describe – but this gives simplicity and lowers barriers to adoption.
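
For a sense of how small that constrained model feels to a developer, a sketch describing a book with schema.org’s `Book` and `Person` types. Schema.org launched with microdata markup; this uses JSON-LD, which search engines later also came to accept, purely because it is shorter to generate. The example values and ISBN are placeholders.

```python
import json


def book_jsonld(title: str, author: str, isbn: str) -> str:
    """Describe a book using schema.org's fixed, 'opinionated' Book type."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
        "author": {"@type": "Person", "name": author},
        "isbn": isbn,
    }
    # Would be embedded in a page inside <script type="application/ld+json">...</script>
    return json.dumps(doc, indent=2)
```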

Schema.org is going to increase the amount of structured data on the web.

In summary:

  • Be empathetic to those who don’t understand what you are doing
  • Need to provide gamut of views on your data
  • You don’t have to use a triplestore to use RDF
  • Raw dumps of data are often far better than dumps of structured data such as RDF if that structure is not documented
  • “Semantic Web” has garnered such bad PR that ‘we’ (?) are on the back foot: things and attitudes need to change or it will be forgotten in favour of schema.org
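
On ‘you don’t have to use a triplestore to use RDF’: a static N-Triples file produced by a few lines of code is already RDF that anyone can load. A minimal sketch; treating any string that starts with http:// or https:// as a URI, and everything else as a plain literal, is a simplifying assumption that a real exporter shouldn’t make.

```python
def nt_term(term: str) -> str:
    """Serialise one term: http(s) strings become IRIs, anything else a plain literal."""
    if term.startswith(("http://", "https://")):
        return f"<{term}>"
    # Escape backslash first, then quotes and line breaks, as N-Triples requires
    escaped = (term.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples) -> str:
    """One 'subject predicate object .' line per triple, ready to write to a .nt file."""
    return "".join(f"{nt_term(s)} {nt_term(p)} {nt_term(o)} .\n" for s, p, o in triples)
```

Writing the result to a file and documenting the vocabulary it uses covers both the ‘no triplestore’ and the ‘documented dump’ points above.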