SORT – A toe in the cloud: working at web scale with Flickr

The afternoon at Survive or Thrive starts with a presentation from Jo Pugh from the National Archives talking about their experience of using Flickr.

National Archives – archive of the British Government – contains everything from 11th, 12th century manuscripts, to the web archive of the last UK government.

Jo made 8 arguments for putting pictures on Flickr:

  1. Be a ‘shop window’ bringing content to a new audience
  2. User tagging will create folksonomies – which will make things easier to find
  3. Content will proliferate across the site
  4. It supports annotation – tool for teaching/learning
  5. Users will tell us more about our collections than we know
  6. Users will embed content on their own sites and so can the National Archives
  7. Users will repurpose large bodies of our content in interesting ways in mashups
  8. Users can post their own content

Perhaps also some arguments over speed (faster to upload to Flickr than locally?), and familiarity (people already using Flickr).

Cost – $25 per annum is a pretty good deal for putting pictures online. Low resource means low risk.

Working at webscale doesn’t mean people are going to start using your content automatically. It’s Flickr that is working at webscale – not your content. Flickr ‘Commons’ helps – exclusive membership and more likely to get an interested audience. The National Archives collection is almost exclusively ‘Crown Copyright’ – whereas the Flickr Commons default is ‘no known copyright restrictions’ – although as the National Archives have power over Crown Copyright they can allow reuse – which also fits the Flickr definition of ‘no known copyright restrictions’.

Jo wants to be the Library of Congress (i.e. emulate their success on Flickr) – currently only achieving a tenth of the views that LoC has achieved.

Jo finds the user community on Flickr (specifically Commons) to be ‘good’ – they tend to attribute when they use content etc., compared to Wikimedia Commons for example. However you still have to expect the unexpected – e.g. this re-editing of a photo http://www.flickr.com/photos/big_lion_head/4463202111/

However, need to be careful to judge use and reuse – it is all engagement with content and difficult to judge value of this.

There is a developer community working around Flickr – need to think about engaging them – they may build stuff you’d like to have done.

Some things don’t work so well on Flickr – navigating multipage or multipart documents for example. Need to recognise these limitations. [quite surprised there isn’t a ‘book reading’ app already for Flickr to allow you to flick through pictures in this way?]

The NA haven’t been as successful in getting user contributed pictures as they’d like. Some work happening here but not going to happen on Flickr, but rather a local system.

National Archives doing lots of interesting things – experimenting. Jo mentioned a new site for experimentation – TNA Labs – going to be launched very soon – probably next week.

Don’t know how long Flickr will last – or how long the audience on Flickr will last. Can get content out though.

Q & A

Q: (from Wikimedia) What were the problems with doing Wikimedia Commons?

A: Some issues with attitude – just scraped pictures and did nothing interesting with them. (Wikimedia saying they would be willing to talk)

Q: (William Kilbride) (oops, got distracted and missed the thrust of this question) – something about using volunteers to build community on the site maybe?

A: We aren’t good at digital engagement. Need to improve – haven’t been as imaginative as we could have been in the use of volunteers

SORT – 1st Morning Q and A

Q: (Paul Miller to Dan) Are you suggesting that the budget/resource issues are an opportunity to do the right things – that we’ve always known we should do?

A: Yes – it’s about a change in tactics, not a change in strategy

Q: (David Prosser to Mike) Is DRM really dead in the water? Especially in the academic sector?

A: From a ‘world’ perspective – DRM is very unpopular and ineffective. History shows that ‘the web’ leads the way – we should be looking at what has happened with iTunes etc.

Q: (Joy Palmer to Mike) Do you see any tensions between ‘open and free’ to achieve webscale and the ‘shared services’ agenda?

A: Yes – already can see tensions with institutional IT and external provision. Free isn’t just about financially free – means ‘accessible’

Q: (Paul Miller to Mike) You said ‘open and free’ leads to eyeballs – but are eyeballs what we want? E.g. Times online strategy – they accept a drop in users because they believe they are focussing on key users

A: Yes – this isn’t about flogging stuff. But there is a driver to make public assets publicly available – need to find ways of opening up to wider audience

Q: (Catherine Grout I think to Mike) How do you think the publishing industry is responding to the challenges you have outlined?

A: It’s important that the Times is putting content behind paywall – because it might just work – even though we all expect it to fail. But while commercial sector is starting to grapple with these issues – look at Guardian vs Times. In the academic publishing sector – not so much – really not tackling these issues – not experimenting and taking risks which they should do.

Q: (Peter Burnhill to Mike) Academic publishers have an amazing business model – content is free (to them) and customers pay a year in advance – they’ve got more to lose than commercial publishers. … other stuff (sorry, missed this)

A: Cultural sector starting to realise possibilities offered, need to do more

Q: (Liz Lyon to all) How aware are senior managers (in HE) of the issues raised today?

A: Mike: a bit of awareness – but need to start building arguments

A: Rachel: lots of discussion at JISC conference this year – so suggests awareness growing. Need to look at the example of the Open University – not enough being done yet to learn lessons from this

A: Catherine Grout – real challenge is keeping these issues on the political agenda. While there is still pressure to open up Government data, will this continue out to other areas?

Q: (Paul Ayris) Where are universities (in UK and Europe) in terms of being aware and able to do this stuff? Who leads the changes at an institutional level? A great challenge is to identify the vision, and then the leaders. Sense is that in the UK we have a vision, but how we share it, and who leads – challenging if exciting.

A: From Dan: always a couple of people at the top who ‘get it’ – but often seen as cranks. Consultants often brought in to advise – but usually from management consultancy perspective. However, when consultants team up with sector expertise – can have a huge effect (good news for me I think!)

Q: (Jo Pugh? from National Archives) Bit surprised people so gloomy – Government leading the way – no one wants to be behind the Department of Transport in being open!

Q: (?) Institutions aren’t geared up to think about their USPs and competitive advantage – need more effort in these areas. The SCA started on this, but more needs to happen.

A: Mike: When he worked at Waterstones it was easy to measure ‘success’ – but when he moved to the Science Museum it was suddenly harder – what to measure when your aim is awareness/communication?

Q: (?) Can’t regard HE as single sector – think about Russell Group, Guild HE, Million plus group – benefits are not the same to each of these segments

SORT Session 3 – If you love your content, set it free

This session by Mike Ellis from Eduserv. Knows nothing about libraries and archives, but worked in museums. Slides will be on http://www.slideshare.net/dmje

Mike going to try to get through lots of slides… (this is going to be a challenge then!)

Mike starts with 3 thoughts:

  • What value and free mean in a web world
  • how the network has changed us
  • what to do about it

What does ‘value’ mean?

In this particular context value equates to scarcity – the more scarce something is, the more valuable it is – although we need to consider context (diamonds not valuable when you are dying from thirst)

‘Theory of Marginal Utility’ – any particular unit becomes worth less as the availability increases
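
[A rough illustration of the idea, not something Mike showed: a minimal Python sketch, assuming total utility follows a logarithmic curve. That curve and the function name are my own assumptions, picked only because they make the diminishing effect easy to see.]

```python
import math


def marginal_utility(n: int) -> float:
    """Extra utility gained from the n-th unit.

    Assumes total utility of n units is ln(1 + n). This curve is an
    illustrative assumption, not something taken from the talk.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return math.log(1 + n) - math.log(n)


# Marginal utility of the 1st to 5th unit: each one adds less than the last.
values = [marginal_utility(n) for n in range(1, 6)]
for n, value in enumerate(values, start=1):
    print(f"unit {n}: {value:.3f}")
```

Under this assumed curve the first unit is worth the most and every further unit is worth a little less, which is the scarcity argument in miniature.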

Scarcity is OK until content arrives on the web when you get:

  1. radically less cost for distribution – e.g. newspapers, music
  2. nearly ubiquitous piracy opportunities – so easy it becomes invisible (even to those doing it) – The scarcity model starts to fail – issue becomes one of usability rather than scarcity.
  3. Users with hugely different expectations. We are moving away from using ‘ownership’ to define ourselves – more about emotional attachment (see piece by Martin Weller). ‘Users’ are changing – lazy, fickle, mobile, search-focused and expecting free

These 3 things are relatively well understood but very radical.

What should we do?

1. recognise that this isn’t just a ‘blip’ – this is not going away. Many old business models and practices simply don’t work any more – and will never work again. Others may work, but will need to change to fit into the new world.

2. notice that ‘value’ hasn’t disappeared, but just shifted somewhere else. Examples of Paulo Coelho and Lady Gaga – both saying they are happy to see people using their content for free – Paulo Coelho was essentially about marketing leading to increased sales. For Lady Gaga touring is where the money is [my comment – not sure that this reflects the experience of most musicians? – perhaps there just isn’t money in being an artist?]

For example – Chegg.com – textbook rentals via the web

3. things that can’t be copied are things that get value

Things like trust, authenticity, immediacy are where value lies. e.g. you can give away software but sell support/expertise

4. your content is like a teenager

If your content is on the web, it is out of your control. The only thing you can do is trust that it will return in the morning 🙂

5. If you can’t re-use it, you’ve wasted it

Possible to increase value by enabling re-use – it costs you to produce this data, so why not get best value. And once it is on the web, people will scrape it anyway – by not enabling re-use you just make this more difficult

6. this is about content and user experience – not about technology

HE institutions are simply not close to being user-centric enough

7. the future is uncertain – open stuff helps with uncertainty. We can focus on content not on how it is held or building restrictive distribution systems

8. it doesn’t really matter how you do it (make data open)

Do what you can – don’t worry about how – as long as it is done in a loosely coupled way

9. recognise it is about eyeballs

The web is only one mechanism for access – 75% of traffic to Twitter is via API – this means people are accessing through a huge variety of ways – widgets, iPhone apps, clients etc.

“Losers wish for scarcity. Winners leverage scale” – Ian Rogers

SORT session 2 – JISC strategic aims and activities and the SCA

Rachel Bruce now to talk about JISC aims and activities…

Referencing the Follett Report – 1993 “The emphasis is shifting towards information and information access. This has profound and far reaching implications, and all institutions act to ensure that they are in a position to deal with these to best advantage.” – how far have we moved on from here?

DNER was a response to this – and the JISC Information Environment – easy to forget that this was visionary – and now we can dissect the IE in hindsight, but too easy to forget how difficult this stuff is – and also to lose sight of the fact that the JISC IE was built on things that continue to be relevant even if specific standards or approaches have moved on.

Start to see major shift in 2003 – we start to see the dominance of Google and the move towards the web as an interactive platform where we both create and consume content.

Need infrastructure to manage, share, discover, access and preserve digital content.

Rachel showing Lorcan Dempsey illustration of the library sitting on the boundary between managed content and consumer content.

Now Catherine Grout talking about the Strategic Content Alliance. The SCA looked at the importance of partnership, policy and practice – how does the work of JISC relate to the work of the BBC and other public bodies. Interestingly Becta was a key partner in the SCA – what happens now? Possibly we’ll see more of a move to ‘local’ away from centralised investment – what will this mean? No-one knows at the moment.

The SCA worked to:

  • identify the best of current practice around creating, delivering and sustaining online content
  • To put this into a framework that can be shared with others

Catherine getting through a lot of stuff – sorry struggling to keep up.

SCA worked on

  • Audience analysis
  • Business models and sustainability
  • IPR/Licensing
  • Internet Marketing, SEO (search engine optimisation) and Writing for the Web

Producing a series of reports, case studies, events and/or tools in each case.

SCA has also developed a Presentation Layer – Digipedia

In the next few years it is going to be important to keep these issues in the minds of policy makers. Need practitioners to be able to contribute to this and feed back how they are using it.

Survive or Thrive

I’m at Survive or Thrive in Manchester for the next 2 days (http://www.surviveorthrive.org.uk)

Kicking off with Dan Greenstein, Vice Provost from the University of California. I’ve included quite a few quotes below, but as I’m live blogging these should be taken as paraphrasing not verbatim – and any mistakes are my own, not Dan’s!

A couple of quotes from Dan as he introduces his talk
“None of us put on our CVs that we wanted to oversee the downsizing of the University as we know it”
“I’m assuming there is going to be a whole lot of hurt”

Need to ask about the investments institutions will (or will not) make in their libraries. Dan is going to take the university and college as the unit of analysis – because they are the vehicles through which investment flows into the academic library and potentially into shared library services.

“The challenge for library funding is that it comes from the same revenue stream as funds the core teaching units” (think I got that right). A decision to fund the library is a decision not to fund an academic/teaching post.

Dan says you can always find 10-15% cuts in HE – but 20-25% means fundamental transformation. Dan not going to focus this morning on what can be done, but what should be done.

From 1980-2000 there was a phenomenal growth in student numbers in the UK – and thus a drop in the cost per student – make savings through efficiencies of scale. This in actuality means an improved participation rate but a deteriorating student to faculty ratio.

Libraries have benefited from budget increases, but spend per student drops – as has the library budget as a proportion of the overall institution budget – that is we are getting a smaller slice of a larger cake. At the same time there have been huge increases in the cost of material – 51% increases in journal subs (2001-2006) and similar (if slightly lower) increases in monograph costs.
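
[To put the 51% figure in annual terms, here is a quick back-of-envelope sketch in Python. It assumes 2001-2006 is a five-year span with steady compound growth. That is my assumption, not something Dan stated, and the function name is made up for illustration.]

```python
def annualised_rate(total_increase: float, years: int) -> float:
    """Constant yearly growth rate that compounds to total_increase.

    total_increase is a fraction (0.51 means a 51% rise over the period).
    Steady compound growth is an assumption, not a fact from the talk.
    """
    if years < 1:
        raise ValueError("years must be at least 1")
    return (1 + total_increase) ** (1 / years) - 1


# 51% rise in journal subscription costs, assumed to span five years.
rate = annualised_rate(0.51, 5)
print(f"{rate:.1%}")  # prints 8.6%
```

So on those assumptions a 51% rise works out at somewhere around 8 to 9% a year, every year, which gives a sense of how quickly it eats into a budget that is a shrinking share of the institutional total.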

Alongside this cost of materials, libraries have been investing in new services and materials – digitisation, supporting innovative research, dealing with Open Educational materials etc. etc.

Dan says we’ve seen library staff more embedded into academic departments in some areas – not sure how far UK and US practice differs here? Except in medicine in the UK I’m not so aware of this type of embedding, and would say my experience is that subject librarians have struggled to keep up the level of engagement they previously had either due to decreased resource, or changing idea of role (focus on information literacy etc. as opposed to direct engagement with research?)

So – real strategies being considered at UC (University of California):
Collection Management
There is a lot of multiple redundancy in library collections (e.g. based on OCLC analysis). At the same time since the 1940s there has been an explosion in book production. The market is doing mass digitization of the legacy of print materials. Current and in-print material increasingly being made available as e-books.

Dan says: “Redundant management of print materials is insane”

Why do we keep doing this – we invest huge amounts (once you factor in the cost of acquisition, and the cost of storing the material) in maintaining our print monograph collections, most of which suffer from multiple redundancy.

Dan suggests: “Let’s aim to collect the unique generally and the general uniquely” (is there any evidence that reduction in investment in the general collections would result in a move or investment to special collections?)

What would it take to stop this?

  • Secure management of digital copies
  • National repositories for the print ‘copy of record’
  • Localized print on demand and download to the handheld

Dan says – we know how to do this. So why aren’t we?
(I’d question whether we can know if this is going to save money – has going electronic with our journal collections saved us money?)

Could we optimize scarce library funding by:

  • supporting next-generation collections with the same institutionally based fund source that is currently devoted to traditional library acquisitions (‘traditional’ can encompass print and digital)
  • Would this force a more realistic approach to prioritization and budget trade-off?

“No money should be spent on Open Access that doesn’t come out of the library materials budget”

Could this type of strategy result in the development of an institutionally-responsive suite of national digital library services?

  • Consortial licensing that creates ‘collections’ that are profiled to suit institutions with different academic profiles and information needs
  • Discovery to delivery services that orient towards the individual (including the coordinated cataloging and technical services, and electronic records management strategies that that entails)
  • A national institutional repository strategy implemented at the departmental or individual not institutional level

Dan argues that the ‘institutional’ layer for repositories doesn’t make sense – as a consumer he isn’t interested in the output of a specific institution – I agree, but there is a question of how funding works – it may be the institution has an interest in creating institutional collections?

Dan recognises that all this would require huge effort – but argues that it would “leverage exceptional (world-class) national infrastructure and distributed library resources in order to:

  • eliminate redundant effort
  • save cost without encroaching on service
  • and if done properly, return real value to universities and colleges whose investment would be at once essential to sustain and focus the effort

Dan mentions Deepdyve as an example of micropayments for academic material. If we (libraries) don’t change, the market will do it to us. For me this is the crux of it – if this is the case, what is the justification for libraries? I’m not sure Dan has said why we need libraries in this scenario. What is our value proposition in a scenario where collection management is done by the individual choosing items for their own collections? Compare the move in the music industry away from albums to individual tracks: we can regret the passing of the ‘album’, with the tendency to converge on the most commercial individual tracks, and expect that the range of music we listen to becomes more limited as a result – but do we know how to stop this?

Dan goes on to argue that if we make the moves he is suggesting it potentially frees up local resource to support students and faculty where they need it.

The key challenges Dan sees are:

  • Communicating the benefits
  • Leadership problem – kick starting an economy for shared service will require intervention at the VC level, and it is very difficult to get their attention. Need to be careful to get the right message – you get attention by talking savings, but this is not the point – we have opportunity to transform and this is what we should be doing
  • 1st mover problem – who makes the first move? This can’t work unless we do it at an appropriate scale across several (many?) institutions
  • Scope creep – driven by the possibilities in the online information and the needs of the few
  • Threat to local autonomy
  • Threat to the local academic library and academic librarian

“The library will become a broker” – Dan is convinced this is what will happen no matter what.

If you are looking at permanent budget reductions of 25%+, then an ‘orderly retreat’ beats a disorderly one.

In a networked digital age and a transformed global economy the academic library will be fundamentally changed…

Q & A:

Q: (David?) Just a comment – changing economic model – i.e. lift of student fee cap – will have huge impact. Believes (as an ex-VC) there is a chance to take action at a ‘segment’ level (probably not national level)

Q: (David Prosser, RLUK) UK Research Reserve focuses on deduplication in journals – need to look at this for monographs. There are problems in moving investment from core collection budget to Open Access costs – Open Access needs larger initial investment than perhaps can be funded.

A: We can’t support two models of scholarly communication at once – institutions need to face up to this and take the issues raised by OA more seriously. We (libraries/universities) are

Q: (me) question about what the value proposition of libraries is in the disaggregated, disintermediated world Dan describes

A: Basically – there is going to be a huge need for people with information skills – whether they end up being organised as a ‘library’ is another question – and may depend on local politics and policies

Dan says – some of this stuff is things that a small group of people have been talking about and saying we should do for a long time. Now a much larger set of people are interested.