Cloud computing seems to be the buzzword of the moment, and there is currently a lot of media coverage – especially in the light of the recent Microsoft announcement of their own take on the cloud with Azure. I’ve also been following some lower-profile, but nonetheless thought-provoking, discussions on blogs and Twitter about other aspects of the cloud, such as the ‘data cloud’.
Why cloud? At some point it became usual to represent the network as a ‘cloud’ in network and computing architecture diagrams.
In those diagrams the cloud stands for the complexity of the Internet, but also says that, from the network point of view, it is quite simple – stuff goes in one end, and comes out the other.
The concept of ‘cloud computing’ is that you can have services that sit on the Internet (i.e. ‘in the cloud’) and use them in the same way – push stuff in, get stuff out, don’t really care how it happens.
Examples of ‘cloud computing’ that are often cited are:
- Google Docs
- Amazon S3
- Amazon EC2
The first is a suite of office tools that exist only online – you don’t download anything to your PC, and you interact with them via your browser.
The other two are services from Amazon – the first is a storage service, where you can use hard disk space on Amazon servers to store stuff, and the second is a service which allows you to run virtual servers, on demand, on Amazon hardware.
This week Microsoft announced its own take on the cloud in the form of Azure – which will provide a way of synchronising between your desktop, your mobile and ‘the cloud’.
In a recent post to the ZDNet Semantic Web blog, Paul Miller goes on to talk about the possibility of a ‘data cloud’ – linking it to the idea behind the ‘semantic web’ – that is creating a web of data, all inter-linked.
Paul Walk argued (and I’m inclined to agree), that you couldn’t talk about data in the same way as computing power – as you cared about data in a different way to your computing power – you would never accept just ‘any old data’. In a comment on this post Chris Rusbridge suggests that what Paul is referring to is the provenance of the data – again I’m inclined to agree.
However, I’d say that this is actually as true of obtaining computing power as it is of data. Although, as Paul Walk notes, I may not care about the particular hardware that I’m utilising, or where it lives, I do care that I’m being offered a robust service – and so I don’t want just any old computing power – I want the good stuff! I personally use the Amazon S3 service to back up data – because I trust that Amazon is going to be pretty reliable – I wouldn’t trust the same data to some bloke running a ‘cloud computing’ service from his garage.
The difference for me between the Internet as a ‘cloud’ and the idea of ‘cloud computing’ is that when I transmit data over the Internet as a network I’m trusting not in a single provider, but essentially in a technical protocol and infrastructure to get stuff from one place to another – and although one part of that journey is governed by someone I’ve chosen (my ISP), most of it isn’t. When I choose a ‘cloud computing’ service I trust a ‘brand’ that provides the service – admittedly I don’t ask questions about how they provide the service (do they subcontract? how would I know?), but I’m not just throwing a task at a generalised technical solution and saying ‘store this’ or ‘process that’.
I would argue that peer-to-peer networks are much closer to the idea of ‘cloud’ computing than Amazon’s or Google’s services. If I upload something to a peer-to-peer network, then it is potentially going to be stored in lots of places, and I won’t know where it is. For some data this might work (stuff that I really want to share), but for other data (stuff I want to keep but perhaps not share) it doesn’t.
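To make that distinction a bit more concrete, here is a toy Python sketch. Everything in it is hypothetical: `SingleProviderStore` and `PeerToPeerStore` are in-memory stand-ins I have made up for illustration, not the API of Amazon S3 or of any real peer-to-peer network. Both offer the same ‘push stuff in, get stuff out’ interface; the difference is whether I know who is holding my data.

```python
import random
from typing import Protocol


class BlobStore(Protocol):
    """Anything you can push bytes into and later get them back from."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class SingleProviderStore:
    """Hypothetical stand-in for a branded service: one named operator holds the data."""

    def __init__(self, provider: str) -> None:
        self.provider = provider  # the 'brand' the caller has chosen to trust
        self._blobs = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class PeerToPeerStore:
    """Hypothetical stand-in for a peer-to-peer network: copies land on peers the caller never picks."""

    def __init__(self, peer_count: int, replicas: int, seed: int = 0) -> None:
        self._peers = [{} for _ in range(peer_count)]
        self._replicas = replicas
        self._rng = random.Random(seed)

    def put(self, key: str, data: bytes) -> None:
        # The network, not the caller, decides which peers hold a copy.
        for peer in self._rng.sample(self._peers, self._replicas):
            peer[key] = data

    def get(self, key: str) -> bytes:
        # Any peer holding a copy will do; the caller never learns which one answered.
        for peer in self._peers:
            if key in peer:
                return peer[key]
        raise KeyError(key)

    def copies(self, key: str) -> int:
        """Count how many peers currently hold a copy (for illustration only)."""
        return sum(key in peer for peer in self._peers)


def back_up(store: BlobStore, key: str, data: bytes) -> bytes:
    """Push data in and read it back, without caring how the store does it."""
    store.put(key, data)
    return store.get(key)
```

The calling code (`back_up`) is identical in both cases, which is the ‘don’t really care how it happens’ part. What differs is the trust: with the single provider I have picked a name to rely on, while in the peer-to-peer case the data ends up in several places I did not choose.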
Skype also uses peer-to-peer technology to route Skype calls – and again, I would argue that this is much closer to a situation where you really “don’t care” where the processing takes place – as long as your call holds up.
So, I think that what is being called ‘cloud computing’ is actually SaaS – Software (or Storage I guess) as a Service. SaaS is a model where you obtain access to software that is hosted elsewhere – so typically via the Internet. When I use Google Docs or Amazon S3 this is really what I’m doing.
Several library system vendors offer SaaS – although without incredibly enthusiastic uptake in the academic library sector (see the post from Dave Pattern and various comments at http://www.daveyp.com/blog/archives/303). I have in the past been a bit sceptical about the idea of SaaS, but as I note in my comment on the post above, I’m now much more convinced.
I think that Paul Miller’s arguments make more sense in this context – DaaS (Data as a Service) – that is, getting at your data when it is hosted somewhere else makes sense. However, Paul is arguing for something a bit more than this – data that is hosted in a way that makes it accessible and linkable – and this is something that I think libraries need to get to grips with. There is a lot of talk of ‘data silos’ and how libraries are guilty of perpetuating them – we need to break out of this paradigm. I was very depressed to see a comment on an email list this week that said "There is something to be said for the library's catalogue being self contained and inhouse" – I think this is an attitude we have to change. Although I understand the arguments about reliability (e.g. in the face of network failure), we can overcome these problems without having systems that are ‘self contained’, and if we are to have library data as part of the cloud, we need to.