Putting Warwickshire Libraries on the map

I was very pleased to see earlier this week that Warwickshire County Council had started to release data sets openly on the web. The data is released under a Creative Commons Attribution-ShareAlike license, and I guess builds on the central government data.gov.uk initiative. While at the moment there are only a few datasets, the blog promises more in the very near future. You can see the data that has been released so far at http://opendata.warwickshire.gov.uk/categories

Today I saw an announcement from their @wccopendata Twitter feed that they had put an XML feed of Warwickshire libraries on the site. I took a quick look and, seeing that it included location coordinates, thought I’d do a really quick map mashup. This is entirely based on something I’d seen Tony Hirst do a while back, which he blogged at http://arcadiamashups.blogspot.com/2009/11/open-library-training-materials-and.html.

The mashup uses Yahoo Pipes, and makes use of the fact that if you include location information in the right place in the Yahoo Pipes output it will automatically show a map of the results.

The first thing you need to do is get the XML data using the Pipes ‘Fetch Data’ module – this just needs the URL of the XML file:

I’ve also had to fill in the ‘Path to item list’ – in this case, if you look at the XML file you can see the structure is something like:

<libraries>
  <library>
    <… name, address, stuff …>
  </library>
  <library>
    <… name of second library, address, stuff …>
  </library>
</libraries>

Since the details of each library are within the ‘library’ tag, and I want each library to appear as an individual item in the list, this is what goes in the ‘Path to item list’.

An important aspect of using Pipes is that to get the output to display as you want, you have to put the relevant information in specific fields (or give the relevant fields the right names). In this case, I want the library name to appear as the main heading in the output – which means it has to be in the ‘title’ field. In the original XML file the name of the library is in a <name> tag, so this needs to be renamed – and Pipes provides a module that does this:

There are two bits of ‘location’ information in the original feed – the address (including postcode) and some ‘coordinates’. I guess these are OS coordinates, but I haven’t really checked – luckily Pipes is cleverer than me, and has a way of automatically understanding some types of ‘location’ information. In this case I can just push the coordinates through a ‘location builder’:

The place you ‘assign results’ to is important – it is putting the location information in this field that makes the output automatically appear as a map. This was a trick copied directly from Tony’s pipe.

Finally, I noticed there was a link to an image for each library in the XML, and thought it would be nice to include this in the output. I knew that this would need to go as an HTML image tag in a ‘description’ field, so I used a loop and ‘string builder’ function to do this:

The first and last lines put the image tag markup in, and item.image pulls the link from the XML file.
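For anyone who prefers code to Pipes modules, here is roughly what the whole pipe does, sketched in Ruby. This is a hedged illustration rather than the pipe itself – the feed URL below is a placeholder, and the element names (name, coordinates, image) are assumptions based on what I describe above:

require 'open-uri'
require 'rexml/document'

# Placeholder URL - use the real feed linked from opendata.warwickshire.gov.uk
url = 'http://opendata.warwickshire.gov.uk/libraries.xml'
doc = REXML::Document.new(URI.open(url).read)

# One item per <library> element (the 'Path to item list')
items = REXML::XPath.match(doc, '//library').map do |library|
  {
    'title'       => library.elements['name'].text,                       # <name> renamed to 'title'
    'description' => "<img src=\"#{library.elements['image'].text}\" />", # image tag in 'description'
    'location'    => library.elements['coordinates'].text                 # fed to the location builder
  }
end

items.each { |item| puts item['title'] }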

That’s it – the whole pipe:

If you look at the results of the pipe you can see that because all the data is in the right fields, the results automatically appear on a map:

I’m hoping that I might get a chance to play a bit more with this data – perhaps at the upcoming Mashed Library event ‘Liver and Mash’. If anyone is interested in a ‘take away’ exercise, you could try to do the same thing with the Warwickshire Museums and Art Galleries data 🙂

Something kind of OO

My first experience of programming was on a BBC B (for those old enough to remember), using BBC Basic. I didn’t do computing/computer science at school, but I was interested enough to move from typing in program listings from books and magazines to writing some basic programs myself – the one I remember particularly would tell you which day of the week any given date fell on.

Some years later, my first ‘real world’ experience of any kind of programming was writing macros in WordPerfect – macros are a way of automating a set of commands or keystrokes (often still very useful – I’d really recommend looking at tools like AppleScript, AutoHotkey and MacroExpress – they can really help simplify tasks, and sometimes deliver significant time savings). Most macro languages also support some kind of ‘logic’, allowing you to only carry out parts of the macro when certain conditions are true.

After another gap, my next step on the ladder was using Perl. I initially picked this up because I was working with applications that were written in Perl, and as I started to use it, it felt very familiar – taking me back to my experience on the BBC. I also found that the active community around Perl meant that when I hit problems there was almost certainly help on hand. Especially when dealing with XML, I found there was a good tool set already available for me to pick up and start using.

By this point all the programming I’d done was procedural. In procedural programming you write a set of ‘procedures’ or ‘routines’ which are (to a large extent) self-contained sets of instructions. As you go through a program you can ‘call’ a procedure whenever it is needed – once all the code in the procedure has run, the program picks up from the point at which you called the procedure.
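For illustration – and jumping ahead to Ruby, which I come to later in this post – here is a minimal sketch of that flow, echoing the day-of-the-week program I mentioned above (the date is arbitrary):

require 'date'

# A 'procedure': a self-contained set of instructions, defined once...
def day_of_week(date_string)
  Date.parse(date_string).strftime('%A')   # e.g. 'Saturday'
end

# ...and 'called' whenever it is needed:
puts day_of_week('2010-05-01')
puts 'carrying on...'   # once the procedure has run, execution picks up here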

It was probably around this time that I thought I ought to really apply myself to learning a programming language properly, and so I picked up a few books on C and C++, and started to read about an alternative to procedural programming – ‘Object Oriented Programming’ (OOP). Although OOP had been around for a while, it was probably the mid-90s when it started to become widely used.

To be honest, I struggled. I couldn’t get my head around the OOP concept, and at the same time C and C++ were much more difficult to get to grips with than the languages I’d previously used. I didn’t get anywhere, eventually gave up, and stuck with Perl.

Although Perl is often used for procedural programming, it can also be used with an object-oriented (OO) approach, and where I was working with code that others had written I did sometimes use an OO approach – but without properly understanding what I was doing, relying on copying examples and practice from others.

As I started to do jobs that were less ‘hands on’, I hardly got time to do any programming, and it wasn’t until last year that I decided I’d find myself some ‘hobby’ projects I could do for fun. Having done a few of these (e.g. Read to Learn and What to Watch) in Perl, I thought it might be time to try something new again. Rather than heading back to C++ or Java, I decided I’d try to take (what I hoped would be) a smallish step – and was left choosing between two languages – Ruby and Python. Both had a reputation for being relatively easy to pick up, and also for enabling you to get stuff done quickly (I really liked this idea).

Having looked at both, and kicked their tyres, I eventually opted for Ruby. I didn’t have very strong feelings about which way to go, but my initial look suggested I’d find Ruby easier – it looked a bit like Perl to me, whereas Python reminded me more of C (not that I’m shallow and go just by looks). A few people also recommended Ruby to me, and there was an active community – including people using it for library type stuff (Blacklight is written in Ruby). I hope that at some point I might have a closer look at Python – one thing that did appeal was the fact that Google App Engine supports Python, making it possible to launch a Python-based app without needing to host it on a server somewhere.

The other thing about Ruby is it is often described as ‘completely Object Oriented’ – I was never entirely clear what was meant by this, but as one of my aims was to get to grips with the concept of OOP, it seemed like this was a good place to start.

Having decided to go with Ruby I found a couple of online tutorials (http://tryruby.org/ lets you actually do some Ruby live online straight away, while Ruby in 20 minutes talks you through the fundamentals) and worked my way through them to get the hang of the basics. I also invested in O’Reilly’s “The Ruby Programming Language” on my iPhone – at £2.99 (compared to an RRP for the print edition of £30.99, and currently on Amazon at £18.99!) I think this is really good value, and although I am limited to using it on the iPhone, in this case I’m generally using it like a reference work, and it’s quite nice to use alongside my laptop.

I’ve always found that the only way I really engage with a programming language is to try to use it in reality – tutorials are fine for basic familiarity, but I’m much happier when I’m trying to solve my own problems – and doing a representative project means I focus on the parts of the language that are really useful to me. So, having recently written What to Watch in Perl, I thought a nice easy exercise would be to rewrite it in Ruby – it’s only a couple of hundred lines of code but does several tasks I’m likely to do elsewhere, such as retrieving data from web services in XML format and outputting RSS.

One of the first things I realised was that although Ruby is an OO language, for a simple script such as the one I was writing it would be perfectly possible to take a very procedural approach. The question of whether you use an OO or procedural approach is really about how you think about what you are doing, and how you model your data.

Up until this point I’d been familiar with two types of ‘data structure’ – ways of storing data within a program. These were Arrays and Hashes. Arrays are simple lists of things, whereas Hashes are lists of pairs – each pair consisting of a key and a value. The idea of a hash is that you can lookup a value for any given key.

Just for illustration, if you wanted to store a list of ISBNs in a program, you could do this as an Array, which would look pretty much as you’d expect – e.g. (9780671746728, 9780671742515, 9780517226957).

On the other hand, if you wanted to describe a book you might do this using a hash – looking something like:

{ 'author' => 'Adams, Douglas', 'title' => "Dirk Gently's Holistic Detective Agency", 'ISBN' => '9780671746728' }

You can create more complex structures by mixing and matching these – for example, you could have an array of hashes to represent a list of books with detailed metadata, and within this you might even have some of the hash values as arrays – e.g. to represent a list of authors. You can imagine that this can quickly get confusing!
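For example, in Ruby an array of hashes – where the ‘authors’ value is itself an array – might look like this (a sketch using the two books that appear later in this post):

books = [
  { 'title'   => "Dirk Gently's Holistic Detective Agency",
    'authors' => ['Adams, Douglas'],
    'ISBN'    => '9780671746728' },
  { 'title'   => 'Long Dark Teatime of the Soul',
    'authors' => ['Adams, Douglas'],
    'ISBN'    => '9780330309554' }
]

puts books[1]['title']            # => Long Dark Teatime of the Soul
puts books[0]['authors'].first    # => Adams, Douglas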

The thing about this approach is that it is easy to evolve these structures as you go along – there is nothing to stop you adding a new ISBN to the list in the Array, or adding a new key/value pair to the hash – if you wanted to record the publisher for example. This is also a problem, as it means you can easily lose track of what you are storing where, or do nonsensical things (e.g. add a ‘director’ key to the hash which is meant to describe books rather than films).

Ruby (like other OO languages) doesn’t abandon the concepts of arrays and hashes – it supports both. However, at the heart of an object-oriented approach is the idea of an ‘object’. The big realisation for me was that an object provides both a new kind of data structure and, tied together with it, various ways of manipulating the data (there is a short paragraph on Wikipedia comparing procedural programming with OOP).

Where I would previously have (for example) used a hash to store the details of a book, I can now define a type of object called a ‘book’, and in that definition I can set up the properties a book has – such as author, title and ISBN. This formalises something that was much more informal when I just used a hash to store this information, as described above.

As well as having a data structure, objects also have ‘methods’ – things that they can do. In practical terms a method is a (generally) self-contained piece of code that does something – not totally unlike a procedure, as I described earlier. However, ‘methods’ are linked specifically to objects – so you can restrict the types of thing you can do to an object by only defining the relevant methods.

The terminology around this can get a bit confusing – a quick summary, with a small Ruby sketch after the list:

  • Class – this is a ‘type of object’ – a general definition which says what properties and methods are linked to an object – so you might define a class of ‘book’
  • Object – a specific instance of a class – that is, if you had a ‘book’ class, any particular book would be described by an object
  • Method – an action tied to a class of object
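Here is that terminology in a minimal Ruby sketch (the class and method names are just my own illustration):

class Book                       # Class: a 'type of object'
  def initialize(title)
    @title = title
  end

  def describe                   # Method: an action tied to the class
    "A book called #{@title}"
  end
end

dirk = Book.new("Dirk Gently's Holistic Detective Agency")   # Object: a specific instance
puts dirk.describe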

Thinking of sensible examples is always difficult (for me) and I’m not sure the following stands up to closer scrutiny, but I hope it demonstrates the ideas OK. Let’s say you have a library with books you can loan, and reference books that you can’t. In a procedural language you might achieve this by having a hash that stores the details of a book, perhaps including a ‘Reference’ value – so you could store loanable and reference books like this:

{ 'author' => 'Adams, Douglas', 'title' => "Dirk Gently's Holistic Detective Agency", 'ISBN' => '9780671746728', 'Reference' => 'no' }

{ 'author' => 'Adams, Douglas', 'title' => 'Long Dark Teatime of the Soul', 'ISBN' => '9780330309554', 'Reference' => 'yes' }

You could then write a procedure that loaned the book by linking the hash describing the book to a description of a library patron. You’d then have to add some kind of test to check the value of the ‘Reference’ key in any book hash before you ran the ‘loan’ procedure. If you forgot to run this check at any point then, since to all other intents and purposes a loanable book and a reference book are the same, running the ‘loan’ procedure on a reference book would simply result in the reference book being loaned – there would be nothing else to stop this happening.
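A hedged Ruby sketch of this procedural version (the names are mine) – nothing but your own discipline enforces the check:

def loan(book, patron)
  puts "#{book['title']} loaned to #{patron}"
end

book = { 'title' => 'Long Dark Teatime of the Soul', 'Reference' => 'yes' }

loan(book, 'a patron') if book['Reference'] == 'no'   # remember the check and all is well...
loan(book, 'a patron')                                # ...forget it, and the reference book goes out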

If we look at an object-oriented approach to this, instead of having a hash to store the information about each book we would have ‘objects’. We could have one type of object (class) for loanable books, and another for reference books. We wouldn’t need the extra ‘Reference’ value from the hash above, because you could easily tell which was a reference book: it would belong to a different class of object. In addition, because any ‘methods’ you can use are linked to the type of object (class), you would simply define the ‘loan’ method (which would do a very similar thing to the ‘loan’ procedure above) on the ‘loanable book’ class only. You would then literally be unable to loan a ‘reference book’ type object – it would simply result in an error.
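The same example with objects – again a sketch with names of my own – where only the loanable class defines ‘loan’, so the error comes for free:

class LoanableBook
  def initialize(title)
    @title = title
  end

  def loan(patron)
    "#{@title} is now on loan to #{patron}"
  end
end

class ReferenceBook
  def initialize(title)
    @title = title
  end
  # no 'loan' method defined at all
end

puts LoanableBook.new("Dirk Gently's Holistic Detective Agency").loan('a patron')
ReferenceBook.new('Long Dark Teatime of the Soul').loan('a patron')   # => NoMethodError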

So taking an object-oriented approach can really help in keeping control of your code, and in making bugs more obvious and easier to track down (or avoid completely). There is an overview of OO thinking in this Ruby user’s guide which I think is useful, and I also found this OOP tutorial really helpful (although the examples are in C++ and Java, rather than Ruby).

As I started to grapple with these issues I quickly realised that using objects formalises what you are doing much more, and makes you think a lot harder, right at the start of a project, about what you are trying to do and how you are going to do it. It also forces you to think through how you are modelling your data much more carefully. Going back to the previous example, you probably don’t want two completely separate classes for loanable and reference books – they are both books after all, and will have a lot in common with each other – the only difference being that you can loan one and not the other. OOP allows for this by supporting the idea of ‘classes’ and ‘subclasses’. You can have a general class – let’s say ‘book’ – with the relevant properties and methods attached, but only properties and methods that would apply to any book – so not (in this example) the ‘loan’ method. You can then have two subclasses – ‘loanable book’ and ‘reference book’ – which ‘inherit’ all the properties and methods from the more general ‘book’ class. You would then add an additional method to the ‘loanable book’ class to enable it to be loaned – and obviously you would not add this to the ‘reference book’ class.
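Sketched in Ruby (names mine again), the shared parts move up into a general ‘Book’ class, and only the loanable subclass adds ‘loan’:

class Book
  attr_reader :author, :title, :isbn        # properties any book has

  def initialize(author, title, isbn)
    @author, @title, @isbn = author, title, isbn
  end
end

class ReferenceBook < Book
  # inherits everything from Book; deliberately adds nothing
end

class LoanableBook < Book
  def loan(patron)                           # the one difference
    "#{title} is now on loan to #{patron}"
  end
end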

This approach forces you to think through, right from the start, exactly what things you might want to do to different classes of object, and what properties you need any particular object to have. I found it made me think more ‘abstractly’ about the types of data I was dealing with. For example, if you were looking at library data, you might start thinking about books, and define classes as I’ve just described. However, then you realise that you also have DVDs you want to loan out – and that while DVDs share some things in common with books (they have titles, they can be loaned), they also have a number of different traits (they have directors). So you might start to model your library with a more abstract class (e.g. ‘Library Item’) which sets up some basic properties and methods (e.g. they have a title, they can be added to a location), and then have more specific classes for ‘books’ and ‘DVDs’ – and if a new item type were added to your stock at a later date (e.g. ‘journals’) you could add a new subclass as appropriate.
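A quick sketch of that more abstract model (the classes and properties here are my own illustration):

class LibraryItem
  attr_reader :title, :location

  def initialize(title)
    @title = title
  end

  def add_to_location(location)    # something any library item supports
    @location = location
  end
end

class Book < LibraryItem
  attr_accessor :authors
end

class DVD < LibraryItem
  attr_accessor :director
end

# A new item type added to stock later just means a new subclass:
class Journal < LibraryItem
  attr_accessor :issn              # hypothetical property, for illustration
end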

This type of modelling is hard work! Even with a relatively simple task I found thinking about this took up a lot of time – and really needed to be done before I could make much of a start on actually coding. This problem of modelling brings me back to my recent post What’s so hard about Linked Data – how data is structured, and how it behaves, is at the heart of this – no matter whether you do this in software or in data schemas.

Finally, where next with my programming journey? I really enjoyed starting to learn Ruby, and I think I’ll try to do some more work with it – I’m especially interested in looking at ‘Rails’, a ‘web application framework’ designed to make it easier to develop web applications quickly and easily (and which brings another new-to-me concept to get my head round – MVC, or Model-View-Controller).

Discussion panel

Panel is:

  • Peter McDonald (PM)
  • Marianne Talbot (MT)
  • David Robertson (DR)
  • Andy Lane (AL)
  • Fred Mednick (FM)

Q: Did you have hopes/thoughts about reuse when you did a podcast?

MT: No! But has since been approached by a US company interested in her doing some lectures for them – so definitely a way of self-promotion

DR: Hoped to see use on iTunesU to compare with high quality content from US sites

PM: Public good – I’m publicly funded, and feel it is right to do it

Q: Business models related to Open Content? (e.g. micropayments)

AL: No different to other industries – e.g. the music industry – ‘freely available’ content (whether legal or not) but still a need to generate income. May be an option for micropayments for ‘value added’ – e.g. providing a printed, bound version of content. At the moment the OU is looking at how much they can afford to spend on the content and how it is classified – is it ‘outreach’, ‘marketing’, ‘recruitment’, ‘teaching’ etc.? OER may provide cost savings – not just about income. If OER is an ‘add-on’ or ‘nice to have’ it will fail – it has to be a central part of the institution.

PM: Personal perspective. Could be part of institutional model to get funding for research etc. – make an OER output a requirement on funding

DR: Would like this type of activity embedded more. e.g. it was discovered that certain ‘reading lists’ were available outside the university Intranet – some academics horrified – DR says he sees his lectures as the property of ‘the world’ (without wanting to be pretentious) – not just for those in Oxford

MT: Perhaps add a request for donations at the end of each podcast [this makes me think of This American Life and public radio in the US]

AL: No reason we shouldn’t charge for some things – but it’s about understanding what people will pay for and who your audience is

Comment from Sarah from Strategic Content Alliance: Have to understand both your audience and your costs – need to get this clear before you think about a revenue model

Q: What is the single greatest challenge for OER?

FM: The plethora of organisations – would be nice if we had interoperable organisations!

AL: Will have succeeded when we stop talking about Open Educational Resources and start talking about Education – OERs are just a means to an end.

PM: How to change thinking. We end up ‘translating’ between the old way and new way of doing things – rather than changing the way we think to deal with the new way of doing things.

MT: Not about challenges – but a worry – will opportunities for new lecturers be curtailed as institutions reuse captured content instead – why have a new lecture when you can re-run an old one?

Q: Publishers don’t know if you use diagrams in lectures – but if you do it on camera you have to clear copyright. Can we get agreement from publishers for non-profit reuse?

AL: Very good point. Early days for publishers as much as it is for HEIs. Some work done – e.g. MIT have agreement with Elsevier that they can include up to 3 diagrams from Elsevier content in a piece of OCW (think I got this right).

Comment from Marion Manton (MM) MOSAIC project (Oxford reuse one, not Library data one): Teachers continually find stuff on the web which they use in their teaching – until OERs are part of this landscape we won’t have success. If you can’t find it via Google, most academics won’t find it – no good locked away in repositories. Are OERs really very different to using books, articles, etc.? Just because it’s a podcast, why should we think about this any differently?

DR: Quality and provenance problems with things on the open web

MM: Yes – but the skills to assess quality and provenance of material don’t change – and these are skills we need to be fostering anyway

AL: This [OERs, reuse, Open Education] is going to take 10-20 years to shakeout – it isn’t going to happen quickly

FM: Nice story – attendance on a ‘Learning Ambassador’ course in Nigeria was rewarded, via an agreement with the driving licensing organisation, by the issue (for a small fee) of a personalised licence plate – which made crossing borders etc. easier. There are ways of making this stuff sustainable.

OER and ICT for development

Tim Unwin asks – why are OERs not more widely used by people in Sub-Saharan Africa (excluding South Africa), when intuitively they would deliver huge value?

I’m afraid I missed documenting much of this talk. Tim challenged the OER model – it isn’t working (in this geographic area) – why not? Is OER essentially ‘imperialist’? Those involved are generally white, male and older. Many OERs are not high quality – even flagship efforts like MIT OCW often make only very basic material available – e.g. just course outlines or basic PowerPoint slides.

Biggest challenges:

  • Changes in personnel
  • Funding mechanism diversity
  • Time commitments
  • Failure to understand ‘meanings’ – ICT4D (ICT for development) more than just computers in labs

Practical Realities

  • Structure and financing of African Universities – and now agendas around new private universities
  • Traditional didactic model of teaching – counter to participatory models
  • Role and ‘income’ of university teachers
  • Intellectual elitism – are African universities really serving their peoples’ development needs?
  • Dependent mentalities – ‘where is the next grant coming from?’
  • Limited human capacity – but some outstanding individuals
  • Dominance of individualism – idea that HE is about individual benefits and gain, not about community

Implications/Questions for ‘us’ (i.e. Europe/US)

  • Fundamental challenge of education as a public or private good
  • How much do we really use OERs in our own work?
  • Can we afford the time to help African academics achieve their ambitions?

OpenSpires

The OpenSpires project (http://openspires.oucs.ox.ac.uk) at Oxford is about making recordings of talks and lectures available for free in a sustainable way.

Now 280 recordings – approximately 160 hours – with over 130 academics contributing lectures and items. OpenSpires built on the success of using iTunesU (http://itunes.ox.ac.uk) to make podcasts available – over 1630 items, with >3 million downloads – licensed for personal use only, so not OERs from an institutional perspective.

Nice quote from a contributor noting with amazement that their lecture on philosophy was being downloaded 18,000 times per week – my paraphrasing: “so I knew being ‘number one’ meant more than 20 downloads a week, but I’d no idea beyond that”

They’ve supported a ‘devolved’ model for contributions – departments can provide audio/video recordings to the central service – who can deal with legal stuff etc. Then the central service can ‘gap fill’.

Creative Commons gave a way of licensing material.

Benefits to the institution:

  • Accessibility
  • Outreach
  • Use of technology that reflects what is unique about Oxford
  • High calibre material of global importance
  • Fits with institutional strategic mission

Tried to make sure that the amount of extra time needed from academic/lecturer is minimal – shouldn’t be more effort than giving the talk in the first place.

Syndication using RSS – makes it very easy to distribute and enables reuse. (potential) types of reuse:

  • Website widget
  • Institutional portal
  • National portal
  • VLE/CLE
  • Subject centres

Communities add value – e.g. translating content into different languages.

Now getting academics to share their experience – interesting to note that the experience is about individuals appreciating it (fanmail etc.), not other institutions/academics reusing it. Does this matter?

One academic suggested a change to iTunesU contract – and got it accepted – the part in brackets below:

2.1 The Content. University hereby grants to Apple a nonexclusive, royalty-free right and license to use, reproduce, modify the format and display of Content (not the substance of any Content) …

He says – read contracts before you sign them, and make amendments if necessary! (parallels to the need for academics to look at the rights they sign away to publishers of research)

Q & A and comments from floor:

Q: What about institutional reuse as opposed to individual consumption – and also the use of non-commercial licensing?

A: Early days – proved interest, excited to see how others may join together content into ‘courses’. Despite the licensing, people don’t really seem to have realised yet that the licenses really mean that you can use this stuff!

Comment: Sustainability will only come as we change our attitudes towards teaching and value it as it should be.

Comment: In medicine a lot of content can’t be published as patients are involved – they are happy for material to be used in medical education but not more generally.

Comment: Making available as a podcast allows students to ‘timeshift’ lectures – some worry that this will lead to students not coming to lectures (although commenter not convinced this is a problem)

Giving Knowledge for Free

Jan Hylen (previously at the OECD) presenting via video link for this session.

Despite a trend of growing competition where learning resources are often considered as key intellectual property, there is still much sharing of content between academics and institutions. There seems to be a new culture of openness in HE – Open Source Software, Open Access, Open Educational Resources – content made available over the internet for free and licensed for reuse.

OECD/CERI study set up to look at 4 main issues:

  • IPR issues
  • How to develop sustainable business models
  • Incentives and barriers to the production, use and delivery of open resources
  • How to improve access to and usefulness of resources

Firstly a definition – what is an OER?

OER are digitized material offered freely and openly for educators, students and self-learners to use and re-use for teaching, learning and research (UNESCO 2002)

Four areas of development driving OER:

  • Technological (improved access, better software)
  • Social (increased IT skills, expectations of ‘free’)
  • Economic (lower costs, new business models)
  • Legal (new licensing – rethinking IP)

Mapping the OER movement is challenging – it’s a global movement with a growing number of initiatives and resources. Also, to remove barriers to access, OER initiatives tend not to require registration – and so usage statistics are poor.

Different types of initiatives:

  • Publicly or institutionally backed programmes – e.g. OpenLearn, OpenSpires, OpenCourseWare (MIT)
  • Community approach – Open Course, Common Content, Free Curricula Center
  • Mixed models – MERLOT, Connexions, ARIADNE

A followup study in 2008 found that the number of resources in 6 major OER initiatives had increased between 30% and 300%; still a large amount in English, but more in other languages; a move from text content to audio-visual and multimedia content (podcasts, video etc.)

A move from the community approach to an institutionally supported approach – most initiatives now have institutional support.

According to MIT and Tufts, users of OpenCourseWare are typically well educated – already holding a degree. Mostly North American based (although this may have changed since) and self-learners (i.e. not people using it within other institutions)

Teachers asked said they tended to use OERs as a supplement to other materials – generally as smaller chunks. Barriers to using OERs were lack of time, skills and reward systems.

Motivations for producing and sharing OERs:

Governments

  • Expanded access to learning
  • Bridge gap between informal and formal learning
  • Promote lifelong learning

Institutions

  • Altruism
  • Leverage on taxpayers’ money
  • “What you give you receive back improved”
  • Good PR and shop window
  • Growing competition – new cost recovery models needed
  • Stimulate internal improvement, innovation and reuse

Individuals

  • Altruistic or community supportive reasons
  • Personal non-monetary gain – ego-boost
  • Commercial reasons
  • It is not worth the effort to keep the resource closed

OECD report: “Giving Knowledge for Free”

During Q & A Andy Lane makes the point that you get waves of interest in specific areas – e.g. Darwin bicentenary – but this interest drops off quickly.

Content, Collaboration and Innovation

Today I’m at the Beyond Borders event in Oxford, in the very nicely equipped Saïd Business School. After a welcome from Melissa Highton, first up is Andy Lane talking about ‘OpenLearn’ at the Open University.

Andy first asks ‘why make educational resources open?’ There was a growing momentum behind OER worldwide (led by MIT), and the emergence of Creative Commons licenses made it possible to clearly state how materials could be used/reused. The idea of Open Educational Resources fitted well with the OU’s commitment to social justice and widening participation – as well as the opportunity to build markets and reputation.

It was hoped that OERs might bridge the divide between formal and informal learning. It costs a lot to create good content – so any opportunity to reuse content allows more time to be spent in areas where more value can be added – e.g. personal support.

OpenLearn is in the process of moving to more ‘short form’ content – bringing in content previously hosted on open2.net. This short form content might be delivered via a number of routes – YouTube, iTunes, etc. At the same time there will be long form content for both learners (in the ‘LearningSpace’) and for educators (‘LabSpace’). This will be complemented by OLNet – focused on researchers.

LearningSpace (long form content) is delivered using the Moodle VLE. It is not just a way of delivering open resources, but also somewhere that some experimentation can take place in terms of content format, content creation tools, delivery methods etc. – some of which will feed back into the OU’s core VLE product.

The OU believes this approach helps bridge informal and formal learning – the learner comes first, content is the hook, and it delivers flexibility with a mix and match approach and self-pacing. Only about 126,000 people registered – many fewer than the number of people browsing the site.

It is a huge challenge to understand how people are using the material. Example of Daniel Conn from the Times. On OpenLearn, seeing both ‘volunteer students’ and ‘social learners’.

Andy now talking about LabSpace – examples of teachers collaborating on aspects of creating educational resources – e.g.:

  • Preparation
  • Curriculum extension
  • Professional development
  • Share materials

Example of pushing learning content into a WordPress Blog (example of course on Hume – more information on how this was done at http://jimgroom.umwblogs.org/2008/02/17/proud-spammer-of-open-university-courses/, and thoughts from Tony Hirst at http://ouseful.open.ac.uk/blogarchive/013251.html)

Q & A:

Q: What kind of pressure is there to show a link between publishing OERs and bringing students in to the Open University? What evidence is there?

A: Yes – those questions have been asked. It was an institutional action research project with buy-in from the top and external funding. Benefits are not just in terms of how many students come in through this process, but many other aspects – use in the widening participation strategy as a way of dealing with hard-to-reach groups and bringing them in; being used by the marketing department; being used as part of the student registration process; being used to work with regional funding bodies (in Scotland and Wales). Andy stresses all aspects need to be considered when looking at benefits

What’s so hard about Linked Data?

My post on Linked Data from a couple of weeks ago got some good comments and definitely helped me in exploring my own understanding of the area. The 4 Principles of Linked Data as laid out by Tim Berners-Lee seem relatively straightforward, and although there are some things that you need to get your head around (some terminology, some concepts) the basic principles don’t seem that difficult.

So what is difficult about Linked Data (and what is not)?

Data Modelling

Data Modelling is “a method used to define and analyze data requirements needed to support the business processes of an organization“. The problem is that the real world is messy, and describing it in a way that can be manipulated by computers is always problematic.

Basically, data modelling is difficult. It is probably true of any sector, but anyone working in libraries who has looked at how we represent bibliographic and related data, and library processes, in our systems will know it gets complicated extremely quickly. With library data you can easily get bogged down in philosophical questions (what is a book? how do you represent an ‘idea’?).

This is not a problem unique to Linked Data – modelling is hard however you approach it, but my suspicion is that using a Linked Data approach brings these questions to the fore. I’m not entirely sure about this, but my guess is that if you store your data in a relational database, the model is much more in the software that you build on top of this than in the database structure. With Linked Data I think there is a tendency to try to build better models in the inherent data structure (because you can?), leaving less of the modelling decisions to the software implementation.

If I’m right about this, it means Linked Data forces you to think more carefully about the data model at a much earlier point in the process of designing and developing systems. It also means that anyone interacting with your Linked Data (consumers) needs to understand not just your data, but also your model – which can be challenging.

I’d recommend having a look at various presentations/articles/posts by those involved in implementing Linked Data for parts of the BBC website – e.g. this presentation on How the BBC make Websites from IWMW2009.

Also to see (or contribute to) the thought processes behind building a Linked Data model, have a look at this work in progress on modelling Science Museum data/collections by Mia Ridge.

Reuse

One of the concepts with Linked Data is that you don’t invent new identifiers, models and vocabularies if someone else has already done it. This sounds great, and is one of the promises that open Linked Data brings – if the BBC have already established an ‘identifier’ for the common Kingfisher species, then I shouldn’t need to do this again. Similarly, if someone else has already established a Linked Data vocabulary for describing people, and I want to describe a person, I can simply use this existing vocabulary. More than this, I can mix and match existing elements in new models – so if I want to describe books about wildlife, and their authors, I can use the BBC wildlife identifiers when I want to show a book is about a particular species, and I can use the FOAF vocabulary (linked above) to describe the authors.
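To make the mix-and-match idea concrete, here is a hedged sketch using the Ruby ‘rdf’ gem (my choice of tooling, not something mandated by Linked Data). The book and person URIs are made up for illustration; the Dublin Core and FOAF predicate URIs are real, reused vocabulary terms:

require 'rdf'   # gem install rdf

graph  = RDF::Graph.new
book   = RDF::URI.new('http://example.org/books/a-wildlife-book')   # made-up identifier for my book
person = RDF::URI.new('http://example.org/people/an-author')        # made-up identifier for its author

# Reused, pre-existing vocabulary terms - nothing here is newly invented:
dc_creator = RDF::URI.new('http://purl.org/dc/elements/1.1/creator')
foaf_name  = RDF::URI.new('http://xmlns.com/foaf/0.1/name')

graph << [book, dc_creator, person]       # my book was created by this person
graph << [person, foaf_name, 'An Author'] # the person's name, using FOAF

puts graph.dump(:ntriples)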

This all sounds great – given that I’ve said modelling data is difficult, the idea that someone else may have done the hard work for you, and that you can just pick up their model, is very appealing. Unfortunately I think that reuse is actually much more difficult than it sounds.

First you’ve got to find the existing identifier/vocabulary, then you’ve got to decide if it does what you need it to do, and you may have to make some judgements about the provenance and long-term prospects of the things you are going to reuse. If you use the BBC URI for Kingfishers, are you sure they are talking about the same thing you are? If so, how much do you trust that URI to be there in a year? In 5 years? In 10 years? (My books are highly likely to be around for 10 years.)

I would guess reuse will get easier as Linked Data becomes more established (assuming it does). The recently established Schemapedia looks like a good starting point for discovering existing vocabularies you may be able to reuse, while Sameas.org is a good place to find existing Linked Data identifiers. This is also an area where communities of practice are going to be very important. For libraries it isn’t too hard to imagine a collaborative effort to establish common Linked Data identifiers for authors (VIAF as Linked Data looks like it could well be a viable starting point for this).

RDF and SPARQL

In my previous post I questioned the mention of RDF and SPARQL as part of the Linked Data principles. I don’t particularly have an issue with RDF and SPARQL as such – but my perception is that others do. Recently Mike Ellis laid down a challenge to the Linked Data community in which he says “How I should do this [publish linked data], and easily. If you need to use the word “ontology” or “triple” or make me understand the deepest horrors of RDF, consider your approach a failed approach”, which suggests that RDF is difficult, or at the least, complicated.

I’m not going to defend RDF as uncomplicated, but I do think it is an area of Linked Data that attracts some bad press, which is probably unwarranted. My argument is that RDF isn’t the difficult bit – it’s the data modelling that gets represented in RDF that is difficult. This is echoed by the comment in the Nodalities article from Tom Scott and Michael Smethurst of the BBC:

The trick here isn’t the RDF mapping – it’s having a well thought through and well expressed domain model. And if you’re serious about building web sites that’s something you need anyway. Using this ontology we began to add RDF views to /programmes (e.g. www.bbc.co.uk/programmes/b00f91wz.rdf). Again the work needed was minimal.

So for those considering the Linked Data approach we’d say that 95% of the work is work you should be doing just to build for the (non-semantic) web. Get the fundamentals right and the leap to the Semantic Web is really more of a hop.

I do think that we are still lacking anything close to decent consumer-facing tools for interacting with RDF (although I’d be really happy to be shown some good examples). When I played around with an RDF representation of characters from Middlemarch I authored the RDF by hand, having failed to find an authoring tool I could use. I found a few more tools that were OK for visualising and exploring the data I created – but to be honest most of these seemed buggy or flaky in some way.

I have to admit that I haven’t got my head around SPARQL in any meaningful way yet, and I’m not convinced it deserves the prominence it seems to be getting in the Linked Data world. SPARQL is a language for querying RDF, and as such is clearly going to be an essential tool for those using and manipulating RDF. However, you could say the same about SQL (a language for querying data stored as tables of rows and columns) in relation to traditional databases – and most people neither know, nor care, about SQL.

Tony Hirst often mentions how widespread the use of spreadsheets to store tabular data is, and that this enables basic data manipulation to happen on the desktop. Many people are comfortable with representing sets of data as tables – and I suspect this is strongly embedded in our culture. It may be that we will see tools that start to bridge this divide – I was very, very impressed by the demonstration videos of the Gridworks tool posted on the Freebase blog recently, and I’m really looking forward to playing with it when it is made publicly available.

Conclusion

I’m not sure I have a strong conclusion – sorry! What I am aware of is a shift in my thinking. I used to think the technical aspects of Linked Data were the hard bits – RDF, SPARQL, and a whole load of stuff I haven’t mentioned. While there is no doubt that these things are complicated, and complex, I now believe the really difficult bits are the modelling and reuse aspects. I also think that there is an overlap here with the areas where domain experts need to have an understanding of ‘computing’ concepts, and computing experts need to understand the domain – and this kind of crossover is always difficult.