I’ve been involved in discussions around the licensing of library/museum/archive metadata over the last couple of years, specifically through my work on UK Discovery (http://discovery.ac.uk) – an initiative to enable resource discovery through the publication and aggregation of metadata according to simple, open principles.
In the course of this work I’ve co-authored the JISC Guide to Open Bibliographic Data and become a signatory of the Discovery Open Metadata Principles. Recently I’ve been discussing some of the issues around licensing with Ed Chamberlain and others (see Ed’s thoughts on licensing on the CUL-COMET blog), and over coffee this morning I was trying to untangle the issues and the reasons for specific approaches to licensing. For some reason they formed in my head as a set of Q&A, so I’ve jotted them down in this form. At the moment this is really to help me with my thinking, but I thought I’d share in case it is useful to others.
N.B. These are just some thoughts – not comprehensive and not official views from the Discovery initiative.
Q1: Why apply an explicit license to your metadata?
A1.1: To enable appropriate re-use
Q2: What prevents appropriate re-use?
A2.1: Uncertainty/lack of clarity about what can be done with the data
A2.2: Any barriers that add an overhead – could be technical or legal
Q3: What sort of barriers add overhead?
A3.1: Attribution licensing – where data from many sources are being mixed together this overhead can be considerable.
A3.2: Requiring machine-readable licensing data to be provided with the data – this adds complexity to data processing, potentially increases network traffic, and slows applications
A3.3: Licensing requiring human intervention to determine rights for reuse at a data level – this type of activity effectively stops applications being built on the data, as it isn’t possible for software to decide whether a piece of data can be included or not (N.B. human intervention for whole datasets is less of an issue: building an app on a dataset where all the data is covered by the same license, which a human has interpreted before the software is written, is not a problem)
A3.4: Licensing which is not clear about what type of reuse is allowed. The NC (Non-commercial) licenses exemplify this, as the definition of what amounts to ‘commercial use’ is often unclear.
A3.5: Licensing not generally familiar to the potential consumers of the data (for re-use purposes) – e.g. writing a new license specific to your requirements rather than adopting a Creative Commons or other more widely used license.
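To make the attribution overhead in A3.1 a little more concrete, here is a minimal Python sketch of merging records about the same item from several sources. Everything in it is an assumption made for illustration: the record structure, the source names, the `merge_with_attribution` function and the "first value wins" merge rule are hypothetical and not taken from any real aggregator.

```python
from collections import defaultdict

# Hypothetical records describing the same items, supplied by different sources.
SOURCE_RECORDS = [
    {"id": "item-1", "source": "Library A", "title": "Bleak House"},
    {"id": "item-1", "source": "Library B", "subject": "Fiction"},
    {"id": "item-2", "source": "Library C", "title": "Middlemarch"},
]


def merge_with_attribution(records):
    """Merge records by id, remembering which source supplied each field."""
    merged = defaultdict(lambda: {"fields": {}, "attribution": {}})
    for rec in records:
        entry = merged[rec["id"]]
        for key, value in rec.items():
            if key in ("id", "source"):
                continue
            # Keep the first value seen for each field (an arbitrary rule
            # chosen for this sketch) and record who it must be credited to.
            if key not in entry["fields"]:
                entry["fields"][key] = value
                entry["attribution"][key] = rec["source"]
    return dict(merged)


merged = merge_with_attribution(SOURCE_RECORDS)
print(merged["item-1"]["attribution"])
```

Even in this toy case, each merged record has to carry a second structure alongside the data itself, and that structure has to be stored, passed on, and displayed wherever the data goes.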
Q4: What does this suggest in terms of data licensing decisions?
A4.1: Putting data in the public domain removes all doubt – it can be reused freely, and a consumer doesn’t have to check anything
A4.2: Putting data in the public domain removes the overhead of attribution – where data from many sources are being mixed together this overhead can be considerable
A4.3: Where there is licensing beyond public domain, reuse will be encouraged if it is easy to establish (preferably in an automated way) what licensing is associated with any particular data
A4.4: Where data within a single set is available under different licensing, reuse will be encouraged by making it easy to address only data with a specified license attached. E.g. ‘only give me data that is CC0 or ODC-PDDL’
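As a rough illustration of A4.4, here is a minimal Python sketch of selecting only the data that carries an acceptable license. The record structure, the `license` field, the short license labels and the `filter_by_license` function are all assumptions made for this example; a real dataset might well express licensing differently (for instance as license URIs).

```python
# Hypothetical records: field names and values are illustrative only.
RECORDS = [
    {"id": "rec1", "title": "Bleak House", "license": "CC0"},
    {"id": "rec2", "title": "Middlemarch", "license": "ODC-PDDL"},
    {"id": "rec3", "title": "Jane Eyre", "license": "CC-BY-NC"},
    {"id": "rec4", "title": "Dracula", "license": None},  # no license stated
]

# Licenses the consuming application has decided it can use without checks.
ACCEPTED = {"CC0", "ODC-PDDL"}


def filter_by_license(records, accepted):
    """Return only the records whose license is in the accepted set.

    Records with a missing or unrecognised license are excluded, because
    software cannot decide on its own whether they may be reused.
    """
    return [r for r in records if r.get("license") in accepted]


reusable = filter_by_license(RECORDS, ACCEPTED)
print([r["id"] for r in reusable])
```

The filter only works because every record states its license in a form software can compare. A record with no stated license (like `rec4` here) has to be dropped, which is the uncertainty problem from A2.1 showing up in code.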