{"id":1164,"date":"2011-03-30T11:04:05","date_gmt":"2011-03-30T10:04:05","guid":{"rendered":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/?p=1164"},"modified":"2011-03-31T12:10:33","modified_gmt":"2011-03-31T11:10:33","slug":"provenance-and-linked-open-data","status":"publish","type":"post","link":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/2011\/03\/provenance-and-linked-open-data\/","title":{"rendered":"Provenance and Linked Open Data"},"content":{"rendered":"<p>Today and tomorrow I&#8217;m at a workshop on <a href=\"http:\/\/wiki.esi.ac.uk\/Workshop:_Understanding_Provenance_and_Linked_Open_Data\">Provenance and Linked Open Data in Edinburgh<\/a>. The workshop is linked to the work of the <a href=\"http:\/\/www.w3.org\/2005\/Incubator\/prov\/wiki\/W3C_Provenance_Incubator_Group_Wiki\">W3C Provenance Incubator Group<\/a>.<\/p>\n<p>First up\u00a0Paul Groth (<a href=\"http:\/\/twitter.com\/pgroth\">@pgroth<\/a>) from VU University Amsterdam is going to summarise the work of the incubator group and outline remaining open questions.<\/p>\n<p>Paul says this audience (at this workshop) takes as a given the need for provenance. Provenance is fundamental to the web &#8211; it is a pressing issue in many areas for W3C:<\/p>\n<ul>\n<li>Linked data\/Semantic web<\/li>\n<li>Open government (data.gov, data.gov.uk)<\/li>\n<li>HCLS (Health Care and Life Sciences)<\/li>\n<\/ul>\n<p>Most people do not know how to approach provenance &#8211; people are looking for a standard and methodology that they can use immediately. Existing research\/work on provenance is scattered across computer and library science research &#8211; hard to get an overview. Also, within enterprise\/business systems there is often a concept of provenance, but without using the same terminology.<\/p>\n<p>The provenance group was tasked to &#8216;provide state-of-the-art understanding and develop a roadmap&#8217;. 
About 20 active members worked over about a year and came to:<\/p>\n<ul>\n<li>Common (working) definition of provenance<\/li>\n<\/ul>\n<p>&#8220;Provenance of a resource is a record that describes entities and processes involved in producing and delivering or otherwise influencing that resource&#8221;<\/p>\n<p>Provenance is metadata (but not all metadata is provenance). Provenance provides a &#8216;substrate for deriving different trust metrics&#8217; (but it isn&#8217;t trust)<\/p>\n<p>Provenance records can be used to verify and authenticate among other uses &#8211; but you can have provenance without cryptography\/security<\/p>\n<p>Provenance assertions can have their own provenance!<\/p>\n<p>Inference is useful if provenance records are incomplete. There may be different accounts of provenance for the same data.<\/p>\n<ul>\n<li>Developed set of key dimensions for provenance<\/li>\n<\/ul>\n<p>3 top level dimensions:<\/p>\n<p><strong>Content<\/strong> &#8211; ability to identify things; describe processes; describe who made a statement; to know how a database solved a specific query<\/p>\n<p><strong>Management<\/strong> &#8211; How should provenance be &#8216;exposed&#8217;; How do we deal with the scale of provenance?<\/p>\n<p><strong>Use<\/strong> &#8211; How do we go about using provenance information &#8211; showing trust, ownership, uncovering errors, interoperability&#8230;<\/p>\n<p>Each of these dimensions is broken down into further sub-categories.<\/p>\n<ul>\n<li>Collected use cases<\/li>\n<\/ul>\n<p>Over 30 use cases &#8211; from many domains (at least two from Cultural Heritage &#8216;<a href=\"http:\/\/www.w3.org\/2005\/Incubator\/prov\/wiki\/Use_Case_Collection_vs_Objects_Cultural_Heritage\">Collection vs Objects in Cultural Heritage<\/a>&#8216;, &#8216;<a 
href=\"http:\/\/www.w3.org\/2005\/Incubator\/prov\/wiki\/Use_Case_Different_Levels_Cultural_Heritage\">Different_Levels_Cultural_Heritage<\/a>&#8216;).<\/p>\n<ul>\n<li>Designed 3 flagship scenarios from the use cases<\/li>\n<\/ul>\n<p>The 30+ use cases were boiled down into three &#8216;super use-cases&#8217; &#8211; trying to cover everything:<\/p>\n<ol>\n<li><a href=\"http:\/\/www.w3.org\/2005\/Incubator\/prov\/wiki\/User_Requirements#News_Aggregator_Scenario\">News aggregator<\/a><\/li>\n<li><a href=\"http:\/\/www.w3.org\/2005\/Incubator\/prov\/wiki\/User_Requirements#Disease_Outbreak_Scenario\">Disease Outbreak<\/a><\/li>\n<li><a href=\"http:\/\/www.w3.org\/2005\/Incubator\/prov\/wiki\/User_Requirements#Business_Contract_Scenario\">Business Contracts<\/a><\/li>\n<\/ol>\n<ul>\n<li>Created mappings for existing vocabularies for provenance<\/li>\n<\/ul>\n<ul>\n<li>&#8230; more<\/li>\n<\/ul>\n<p>Group came up with recommendations:<\/p>\n<ul>\n<li>Proposed a Provenance Interchange Working Group &#8211; to define a provenance exchange language &#8211; to enable systems to exchange provenance information, and to make it possible to publish this on the web<\/li>\n<\/ul>\n<p>Timeline:<\/p>\n<p>W3C is in the process of deciding whether the Provenance Interchange Working Group should be approved. If this goes ahead, it will start soon. Two-year working group &#8211; aggressive deliverable target. &#8220;Standards work is hard&#8221; says Paul. 
Will rely on the next version of RDF (no time to cover this now).<\/p>\n<p>Open Questions:<\/p>\n<ul>\n<li>How to deal with Complex Objects\n<ul>\n<li>dealing with multiple levels of granularity<\/li>\n<li>how provenance interacts with Named Graphs<\/li>\n<li>Unification of database provenance and process &#8216;style&#8217; provenance<\/li>\n<li>objects, their versions and their provenance<\/li>\n<li>visualisation and summarization<\/li>\n<\/ul>\n<\/li>\n<li>Imperfections\n<ul>\n<li>What is adequate provenance for proof\/quality?<\/li>\n<li>How do we deal with gaps in provenance?<\/li>\n<li>Repeatability vs. reproduction and how much provenance is enough?<\/li>\n<li>Can provenance help us get around the problem of reasoning over integrated data?<\/li>\n<li>Using provenance as a platform for trust, does it work?<\/li>\n<\/ul>\n<\/li>\n<li>Distribution\n<ul>\n<li>How do we encourage provenance capture?<\/li>\n<li>Multiple disagreeing claims about the origins of data &#8211; which one is right?<\/li>\n<li>SameAs detection through provenance<\/li>\n<li>Distribution often gives us privacy &#8211; once we integrate, how do we preserve privacy?<\/li>\n<li>Scale (way more provenance than data! Has to scale &#8211; to very large)<\/li>\n<li>Hypothesis: distribution is a fundamental property of provenance<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Today and tomorrow I&#8217;m at a workshop on Provenance and Linked Open Data in Edinburgh. The workshop is linked to the work of the W3C Provenance Incubator Group. First up\u00a0Paul Groth (@pgroth) from VU University Amsterdam is going to summarise the work of the incubator group and outline remaining open questions. 
Paul says this audience [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[64],"class_list":["post-1164","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-provenance"],"_links":{"self":[{"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/posts\/1164","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/comments?post=1164"}],"version-history":[{"count":2,"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/posts\/1164\/revisions"}],"predecessor-version":[{"id":1194,"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/posts\/1164\/revisions\/1194"}],"wp:attachment":[{"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/media?parent=1164"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/categories?post=1164"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/tags?post=1164"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}