Overcoming information overload

The keynote this morning from Kevin Anderson (@kevglobal) and Suw Charman-Anderson (@suw) – journalists and technologists (http://charman-anderson.com/).

Kevin kicks off: Journalists and librarians dealing with many of the same issues – helping people navigate, interpret and understand information. Going to talk about some of the challenges in this area. First playing Xerox video on ‘information overload’ – http://www.youtube.com/watch?v=CXFEBbPIEOI

Eric Schmidt noted that we are now creating huge amounts of information (5 exabytes every 2 days is the quote, but see disagreement with this figure at http://www.readwriteweb.com/cloud/2011/02/are-we-really-creating-as-much.php)

Amount of time people spend on Facebook is massively more than they spend on newspaper websites. Evidence that people are having problems reaching conclusions on complex stories – people move to simple narratives instead – Kevin says this equals “car crashes and celebrities”.

Social media offers opportunity to re-engage people and help them navigate information.

We are moving from “mass” to “relevance” – e.g. it’s not about how many followers you have on Twitter, but about the relevance of what you post. Try to move from information overload (a ‘mass’ problem) to filtered, relevant information (a ‘relevance’ solution).

Social media provides a way of filtering information. But social media has to be ‘social’ – you need people at the heart of this.

Examples of crowdsourcing – Guardian analysis of MP expenses (http://mps-expenses.guardian.co.uk/), Ushahidi crowdsourcing crisis information (http://www.ushahidi.com/).

Kevin also mentions ‘entity extraction’ – uses Calais as an example.
Dewey D. – iPhone app to manage a ‘reading list’ (not in the academic sense) which pulls in stories from the New York Times.

Poligraft – analyses funding of political campaigns – you can post URLs (of political stories) to Poligraft – it goes through and identifies politicians and organisations and shows you how politicians get campaign funding etc. Tells you about the major industries funding politicians – gives context to a political story and helps make sense of it.
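The entity extraction idea behind tools like Calais and Poligraft can be sketched (very crudely) as a gazetteer lookup – the real services use statistical models, and the names and types below are invented for illustration:

```python
# Toy sketch of entity extraction: spot known names in text and type them.
# Real services (Calais, Poligraft) use trained models, not a fixed lookup;
# all entries below are invented.
GAZETTEER = {
    "Acme Corp": "Organisation",
    "Jane Smith": "Politician",
    "Washington": "Place",
}

def extract_entities(text):
    """Return (entity, type) pairs for gazetteer names found in the text."""
    return [(name, etype) for name, etype in GAZETTEER.items() if name in text]

story = "Jane Smith of Acme Corp spoke in Washington yesterday."
print(extract_entities(story))
```

A service like Poligraft would then join the extracted entities against funding databases to add context to the story.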

We (journalists & librarians) have hundreds of years of doing things in a certain way – changing culture is incredibly difficult. If you have more than 5 people in the room, inertia hits …

Now Suw taking the floor… to talk crowdsourcing – breaking large tasks into smaller chunks that individuals can do. Suitable tasks – computational tasks and ‘human’ tasks.

Computational tasks = large datasets or computations that can be split into smaller ones – e.g. SETI@Home – this is about the ‘spare cycles’ individuals’ computers can contribute to overall computing power.

Human tasks = tasks that humans find easy but computers find difficult; brain driven; uses participants’ spare time; individual errors are averaged away by having the same task completed by many people.

Type of human tasks:

  • Recognising and describing things in images
  • Reading and transcribing writing
  • Applying expertise to identify, sort and catalogue
  • Collecting data
  • Manipulating models

Examples …

PCF oil paintings tagger – http://tagger.thepcf.org.uk/

  • Public Catalogue Foundation, BBC
  • Digitising pictures
  • Getting people to tag content with metadata – describe what is in the painting

“You don’t have to be an expert to take part”

Old Weather – http://www.oldweather.org/
Transcribing ships’ logs – contributes to historic data on climate, as well as other historical background

Ancient Lives – http://ancientlives.org/
Papyrus fragments – transcribe, measure, etc.

Multiple people doing each task gives you confidence when there is agreement across results
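The “individual errors are averaged away” idea can be sketched as a simple majority vote – a toy illustration, not any particular project’s aggregation algorithm:

```python
from collections import Counter

def consensus(transcriptions, threshold=0.6):
    """Return the majority transcription if enough volunteers agree,
    otherwise None (i.e. flag the task for expert review).

    Minimal sketch: real projects use more sophisticated aggregation,
    but the principle is the same -- agreement across many people
    averages away individual slips.
    """
    counts = Counter(transcriptions)
    best, n = counts.most_common(1)[0]
    return best if n / len(transcriptions) >= threshold else None

# Five volunteers transcribe the same log entry; one makes a slip.
votes = ["Fine, light breeze", "Fine, light breeze", "Fine, light breeze",
         "Fine, light breese", "Fine, light breeze"]
print(consensus(votes))
```

With four out of five volunteers agreeing, the agreed reading wins; if no reading clears the threshold, nothing is accepted automatically.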

Herbaria@Home – http://herbariaunited.org/atHome/

What’s the Score – http://www.bodleian.ox.ac.uk/bodley/library/specialcollections/projects/whats-the-score
Digitised musical score collection from the Bodleian – will be starting crowdsourcing part of project soon

Why crowdsource?
Provide opportunities for education and knowledge maintenance
Most projects don’t require prior knowledge but people often enjoy learning more about a subject
Improve accessibility through addition of new metadata or improvement of existing metadata – create data for research
Even when digitised, collections are hard to search/comprehend

Galaxy Zoo shows public were as good, or better, than professionals at classifying galaxies
FoldIt found gamers could solve the structure of a protein that causes AIDS in rhesus monkeys in three weeks

Are your projects suitable?

  • Can the original material be digitised?
  • Can task be broken down into small chunks?
  • Can those chunks be done by humans or their computers?

It also helps if…

  • There is a benefit for the public – example of Google buying out an image-tagging game, which then died
  • People feel part of a community
  • There are measurable goals and targets

Zooniverse are crowdsourcing gurus..
Citizen Science Alliance – “Science” doesn’t just mean science – looking for projects at the moment…
Events – e.g. Citizen Cyberscience Summit

Q & A:
Failure of crowdsourcing – NASA mapping craters on Mars in the mid-80s, which failed to collect the data in a useful way – there were issues around how the data was gathered.
Wikitorial – not enough community – hurdles to participation are not a bad thing

Samarcande

Belgian ‘meta-union’ catalogue. There were real problems with sharing metadata across regions – political interference meant that not all regions/libraries were included.

Wanted a ‘next gen’ OPAC – various reasons:

  • Users like mouse, not keyboard
  • Surveys show satisfaction is higher than with a traditional OPAC … and
  • We ‘love’ New York public library’s OPAC

W3Line developed Samarcande. From technical point of view…

Challenges:

  • High volumes of bibliographic data coming from several origins
  • Create an intelligent database with FRBR scheme
  • Search functionality: advanced search, facets, tags
  • Social network services (web 2.0)
  • Give internal and external services

Samarcande is a catalogue of catalogues – 7 partners

  • 6 union catalogues
  • Plus a database of journal article references
  • Variety of bibliographic description
  • Lack of shared rules or authorities (except for subject headings)
Totally impossible to do a virtual search – have to aggregate records in a single database:
  • Detect identical references – keeping local information
  • Keep the best of each reference (summaries, subject headings)
  • Keep all identifiers in order to propose return links to the original catalogue
  • Develop connectors to import and index the data
  • Get data from web services
  • Answer to SRU/SRW requests
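Answering (or issuing) an SRU searchRetrieve request is just a URL with standard parameters. A sketch – the base URL below is a placeholder, not Samarcande’s real endpoint:

```python
from urllib.parse import urlencode

# Hypothetical SRU endpoint -- the parameter names are standard SRU 1.1,
# the base URL is a placeholder.
BASE = "http://sru.example.org/samarcande"

def sru_url(cql_query, max_records=10, schema="dc"):
    """Build a standard SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,           # CQL, e.g. dc.title = "Candide"
        "maximumRecords": max_records,
        "recordSchema": schema,
    }
    return BASE + "?" + urlencode(params)

print(sru_url('dc.title = "Candide"'))
```

The response is an XML document containing the matching records in the requested schema.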
Includes search functionalities:
  • Advanced search / autocompletion
  • Did you mean
  • Results by relevance
  • Facets
  • Tag cloud
  • Search history, reference basket, results by email
  • Search profiles, bibliographies
FRBR
  • Gather editions of the same publication
  • Based on an author-title key, in the absence of any other identifier
  • Social network content attaches at the ‘work’ level
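The author-title key approach to gathering editions can be sketched as follows – an illustration of the idea, not Samarcande’s actual matching rules, and the records are invented:

```python
import re
from collections import defaultdict

def work_key(author, title):
    """Normalised author-title key used to gather editions of the same
    work when no shared identifier exists (sketch only)."""
    def norm(s):
        return re.sub(r"[^a-z0-9 ]", "", s.lower()).strip()
    return norm(author) + "|" + norm(title)

# Invented records from three imaginary source catalogues.
records = [
    {"author": "Voltaire", "title": "Candide", "id": "lib1:42"},
    {"author": "VOLTAIRE", "title": "Candide!", "id": "lib2:99"},
    {"author": "Rousseau", "title": "Emile", "id": "lib3:7"},
]

# Group record identifiers under the shared 'work' key.
works = defaultdict(list)
for rec in records:
    works[work_key(rec["author"], rec["title"])].append(rec["id"])

print(dict(works))
```

Attaching tags, reviews and other social content to the key rather than to individual records is what puts them at the ‘work’ level.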
Samarcande built on
  • MySQL, jQuery, PHP, Solr
  • MoCCAM for ILL

Meaning-based computing

This session by Terence Huwe

What is meaning-based computing? (MBC)

Importance of forecasting probability – ‘how should we modify our beliefs in the light of new information?’ – see “The Theory That Would Not Die” by Sharon Bertsch McGrayne (http://www.librarything.com/work/11186931)

Based on Bayesian analysis.
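Bayesian updating – modifying a belief in the light of new information – in its simplest form. A toy illustration of the mathematics at the heart of MBC, not Autonomy’s actual algorithm:

```python
def bayes_update(prior, likelihood, likelihood_if_not):
    """Posterior P(H|E) via Bayes' theorem:
    P(H|E) = P(E|H)P(H) / [P(E|H)P(H) + P(E|not H)P(not H)]
    """
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence

# Belief that a document is 'relevant' starts at 10%. Seeing a term that
# appears in 80% of relevant documents but only 10% of irrelevant ones
# raises that belief substantially.
p = bayes_update(prior=0.10, likelihood=0.80, likelihood_if_not=0.10)
print(round(p, 3))
```

Repeating the update over many pieces of evidence is what lets such systems infer meaning from unstructured text.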

What are the applications and potential uses of meaning based computing? Used for code breaking, handwriting and speech analysis etc. Approach commercialised by Michael Lynch – in the form of Autonomy (now acquired by Hewlett-Packard) – applied to ‘enterprise search’. 80% of a firm’s info assets are unstructured and thus hard to retrieve conventionally.

Two events furthered the growth of MBC – in 2007 the US federal rules of civil procedure made all data forms admissible for litigation – seen in the Enron case. The explosion in social media has created new challenges for firms – meaning they need to track huge amounts of unstructured information.

So – enterprise search is booming – MBC thrives in commercial and pure research settings. Autonomy’s MBC-based tools:

  • Implicit Query – hotkey to related information without leaving a primary task
  • Hyperlinking – live links, diverse sources
  • Smart or Active Folders
  • Automatic Taxonomy Generation
  • Sentiment Analysis
  • Automatic clustering of all data types
What is the impact on the information professions?
Starting to see some ‘seeping’ from enterprise search world:
  • “meaning based healthcare”
  • Universities use it at the enterprise level
  • Consulting
  • Telecommunications
One clear example of use in the library domain is the ACM, which uses it for search of their digital publications.
Potential applications:
  • Turbo-charged meta-search
  • Effective search of unstructured data
  • Establishing relationships between structured data (libraries etc.) and unstructured data
Taxonomy and MBC solutions might co-exist – why? Because MBC can manage social media categorisation as an automated process. For this to happen, (library) developers need to get involved.
Pattern recognition is practiced at the reference desk – MBC proves that it is a high-level skill. More machine assistance going to be a good thing – we (information professionals) need to find a place at the table.
Forecasts:
  • Academic-based digital library developers may take an interest
  • Vendors might explore MBC as a meta-search tool
  • Repositories may get a boost
  • The practice of reference librarianship would benefit from this kind of tool
Conclusions
  • Need to be aware of MBC
  • Should analyse its, as yet unknown, potential for search and discovery within our digital libraries

Cheapskate’s guide to Linked Open Data

Rurik Greenall (@brinxmat) with a ‘Cheapskate’s guide to linked open data’. Using the ‘Gunnerus Library’ special collections as an example. Wanted to remove the ‘fear experienced when faced by expert interfaces’ – want an interface that contextualises data.

Rurik says “if you’ve created a PDF, you’ve created RDF” – it’s baked in there by default. Rurik shows example from http://folk.ntnu.no/greenall/gunnerus/search/ – some is local data, but some dragged from other places across the web. Nice looking interface.

Rurik says Linked open data “Clawing back what remains of our professional dignity”

  • You have to learn about RDF – but it really isn’t that difficult
  • Tools of the trade – Google Spreadsheets; Google refine (esp. with DERI Galway RDF plugin)
  • Talis offer free hosting if your data is openly licensed
  • Tell all the people
  • Develop your application – you will need a programmer 🙂 but you’ve already modelled your data…
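As Rurik says, RDF really isn’t that difficult – a few lines of plain Python can emit valid Turtle. The URIs and the Dublin Core terms below are illustrative, not the Gunnerus Library’s actual data model:

```python
# Minimal sketch of emitting RDF (Turtle syntax) for a catalogue record.
# The item URI and the Dublin Core terms chosen here are illustrative only.
PREFIXES = "@prefix dct: <http://purl.org/dc/terms/> .\n"

def turtle(subject_uri, properties):
    """Serialise (predicate, literal) pairs about one subject as Turtle."""
    body = " ;\n".join(f'    dct:{p} "{v}"' for p, v in properties)
    return f"{PREFIXES}<{subject_uri}>\n{body} .\n"

doc = turtle("http://example.org/item/1", [
    ("title", "Letter from Carl von Linné"),
    ("created", "1760"),
])
print(doc)
```

For real projects, tools like Google Refine with the RDF plugin (as mentioned above) do this kind of modelling and export without hand-writing the serialisation.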
Q: What are the mature libraries for manipulating RDF?
A: Look at Sesame for Java; ARC2 for PHP

Surfacing the Academic Long Tail

Lisa Charnock and Andy Land from Mimas and John Rylands University Library (JRUL) respectively. JISC-funded project ‘SALT’ (Surfacing the Academic Long Tail). JRUL had a lot of usage data. Hypothesis:

“Library circulation activity data can be used to support humanities research by surfacing the long tail …”

So essentially about developing ‘recommendation services’

Also wanted to look at the possibility of developing an API-based national shared service.

Looked at work by Dave Pattern at Huddersfield which built recommendations into their OPAC. Wanted to build on the JISC MOSAIC project.

Market research by Mimas shows:

  • Serendipity still very important in terms of discovery
  • Increased anxiety among researchers – worried that they are ‘missing out’ on material that is out there but they aren’t finding
  • Trust concerns – who is making this recommendation, where does the data come from, why is this being recommended?
  • Students tended to be sceptical of tagging and reviews, but saw potential benefit in recommendations in the style of Amazon (although again trust issues came up)

JRUL interested as different ways of surfacing content. The process for data was:

  • Loan transaction data extracted
  • Data anonymised and given to Mimas
  • Mimas processes data
  • API implemented in Capita Prism sandbox using JUICE framework
  • Additional processing performed on demand by API

The API has also been implemented in a COPAC prototype interface.

Wanted to look at how real researchers found the process. Did two rounds of testing – first round found that they generally wouldn’t borrow the recommendations. However, when tweaked thresholds for recommendation, and ran the research again, found a complete swing to the other extreme, that most would borrow the things recommended – shows getting these thresholds right is key and subtle.
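The kind of circulation-based recommendation with a tunable threshold can be sketched as pair counting over anonymised loan data. The data and the rule below are invented for illustration, not the SALT algorithm itself:

```python
from collections import defaultdict
from itertools import combinations

# Invented anonymised loan data: user id -> items borrowed.
loans = {
    "user1": ["A", "B", "C"],
    "user2": ["A", "B"],
    "user3": ["A", "B", "D"],
    "user4": ["C", "D"],
}

def recommendations(loans, threshold=2):
    """'People who borrowed X also borrowed Y', keeping only pairs
    co-borrowed at least `threshold` times. The threshold is the knob
    the SALT testing showed matters so much."""
    pair_counts = defaultdict(int)
    for items in loans.values():
        for a, b in combinations(sorted(set(items)), 2):
            pair_counts[(a, b)] += 1
    recs = defaultdict(list)
    for (a, b), n in pair_counts.items():
        if n >= threshold:
            recs[a].append(b)
            recs[b].append(a)
    return recs

print(dict(recommendations(loans, threshold=2)))
```

Lowering the threshold surfaces more (but weaker) recommendations; raising it keeps only strongly co-borrowed pairs – which is exactly why tuning it swung the researchers’ reactions so dramatically.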

100% of those consulted would welcome a recommender function based on circulation records – even though they thought some of the recommendations were irrelevant…

What about a shared service? Some interest, but a question of ‘why should we prioritise this?’ from potential (library) partners – needs more work on the business case (I find this baffling – it speaks for itself to me, but there you go…)

JRUL now going to test the recommender with subject librarians, and planning to go live with either a local service, or a national service if that gets off the ground. Will be making SALT recommendations alongside bX recommendations in the new discovery interface at JRUL (Primo).

Thinking about allowing users to adjust thresholds to their own satisfaction, rather than dictating them.

Mimas want to:
  • Aggregate more data
  • Evaluate the longer-term impact on borrowing patterns at JRUL
  • Gather requirements/costs for a shared service
  • Investigate how activity data aggregations could be used to support collection development

See blog for more http://salt11.wordpress.com and also more on SALT and other activity data projects at http://www.activitydata.org

Q & A:
Q: Is software/data made available?
A: Yes – Juice extension on the juice site (couldn’t find it); Data has been released for others to use; other s/w and API will be released

Q: What about privacy issues?
A: Generally these projects have collected data at a high level – so can’t identify individuals;
A: Growing expectation that data will be made open – so need to consider this

Library Impact Data Project

Dave Pattern (@daveyp) and Bryony Ramsden (@libraryknitgirl) talking about the JISC funded Library Impact data project.

Wanted to look at how usage and non-usage of library resources affects degree outcome. Initially looked at University of Huddersfield data only. Examined visits to the library, and found them pretty equal no matter what the outcome (in terms of degree classification). However, when looking at book borrowing and e-resource usage, saw higher levels of use linked to higher levels of achievement. Note: it is clear that there is correlation, but just looking at these stats doesn’t say anything about causation.
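The basic comparison the project made – mean borrowing per degree classification – can be sketched with invented data. As the speakers stress, this kind of analysis can only ever show correlation, not causation:

```python
from statistics import mean

# Invented illustrative data: (degree classification, books borrowed).
students = [
    ("First", 62), ("First", 55), ("2:1", 40), ("2:1", 35),
    ("2:2", 22), ("2:2", 30), ("Third", 12), ("Third", 18),
]

def mean_borrowing_by_outcome(students):
    """Mean items borrowed, grouped by degree classification."""
    by_class = {}
    for outcome, borrowed in students:
        by_class.setdefault(outcome, []).append(borrowed)
    return {outcome: mean(vals) for outcome, vals in by_class.items()}

print(mean_borrowing_by_outcome(students))
```

In this made-up sample the means fall steadily from First to Third – the shape of relationship the Huddersfield data showed for borrowing and e-resource use (but, notably, not for library visits).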

JISC funding gave opportunity to expand study across 7 more universities, and to look at the Huddersfield data in more depth.

When doing the study, had to make sure data protection issues were considered, and made sure data was anonymised. Much of the data released online at http://tinyurl.com/lidp-opendata – encourage others to play with it.

Analysis of data showed that there is a relationship between use of library resources and academic achievement. To back up the statistics, did more qualitative investigation via focus groups. Found discipline differences – e.g. Arts students tend to do a lot of ‘in library’ use, and also do a lot of web browsing for images etc, but not logging into ‘library resources’.

What next? Want to:

  • do more analysis of the relationship at course level
  • look at how to target staff resources more effectively
  • look at the impact of reading lists and recommendations
  • look at the feasibility of a national-level service to which you could load data and get analysis back

Would expect to find similar findings across other universities – and independent research in New Zealand (and elsewhere?) backs up the findings.

Q & A
Q: Did you look at drop out from University in light of library use?
A: No, but could do that in the data

Q: Have you considered other causation possibilities?
A: Yes – did explore some aspects on this in focus groups. Clearly overall outcome is affected by many things, so library usage can only ever be one part of it

Princeton e-reader pilot

Jennifer Baxmeyer and Trevor A. Dawes now talking about e-reader circulation at Princeton University Library. (some more detail online at http://www.princeton.edu/ereaderpilot/)

Trevor kicks off. Princeton was offered the chance of participating in piloting the use of Kindles in libraries. The pilot showed that the Kindle DX was good for leisure reading, but not so good for study – esp. the inability to use multiple texts simultaneously (note the Kindle has changed since 2009).

Started to receive requests to download content to the devices. Seeing a huge increase in ebook sales and usage (possibly driven by Xmas presents, as spikes are seen in January).

Amazon sells 105 e-books for every 100 printed books.

Jennifer now coming in to talk about proposal the library made to start engaging with increase in ebook usage.

Received an ILL request for an item that turned out to only be available electronically and in fact only on the Amazon Kindle. Realised this was the tip of the iceberg. So started a working group to determine best way of acquiring e-content when requested.

Already many libraries lending both e-books and e-book readers (not just Kindles)

Princeton realised they were going to have to purchase several types of e-book device to offer content available in different proprietary formats. However, some platforms – specifically the iPad – can support multiple different formats via different ebook reader apps – Kindle App, Borders app, iBooks etc.

Decided to pilot Kindle and iPad as this covered the main formats. Proposed purchase of 3 iPads and 4 Kindles. Already had a laptop circulation programme, so could use same approach for iPad. Kindles were circulated in specialist engineering/science libraries.

For the iPads, the same content would be available across devices, and patrons could request new items, which would be reviewed by purchasers like any stock request. The Kindles would have specialist (and non-duplicated) material, with different content on each device.

Next step was to figure out how to make items discoverable – started to advertise via newsletter and email. Also decided to catalogue the devices and the content on the devices (other libraries such as MIT and Stanford do this as well). The catalogue record was for the device, with a ‘contents note’ detailing the items. Each item was also catalogued separately, but linked and bound together with the device record so that if the device was checked out, all items would show as unavailable.

Cataloguing model still not completely agreed – still working on it.

Trevor again now, talking about accessibility issues. This had come up in the Kindle DX pilot, and accessibility had been challenged by the US National Federation of the Blind (they wrote to all libraries participating in the pilot). This resulted in an agreement with the Justice Department including the term:

“The University will not require, purchase, or incorporate in its curriculum the Kindle DX or any other dedicated electronic book reader for use by students in its classes or other coursework unless or until such electronic book reader is fully accessible to individuals with visual impairments.”

This agreement is binding until 30th June 2012. Letter available online at http://www.ada.gov/princeton.htm

This means that currently there is a delay in launching the program. At the moment staff can check out an iPad or Kindle for three days – gives an opportunity for feedback, and gets staff familiar with the devices. Allowed some purchase of apps/content, but it had to be ‘work related’, and the number that could be purchased was limited. Part of the requirement of checking out a device was to fill out a survey.

Now have green light on going ahead with iPad lending program – so that will be starting off soon – aiming for June 2012. However, issues with Kindle still unresolved…

Reference Management on the Move

Alison McNab opening this session on ‘mobile library services’ talking about Reference Management and mobile devices/services.

Quick look back over the history of reference management – and noting move from simply ‘reference management’ to ‘pdf’ (or I guess more generally ‘document’?) management.

Also changing environment – increasingly scholars are ‘mobile’; plagiarism and information literacy higher on the agenda; library budgets continue to shrink.

Questions when looking for reference management packages:
  • Subscription or free/open-source?
  • Web-based or client-based?
  • Central support or peer support?

Move towards ‘portable’ solutions. References being stored ‘in the cloud’; access to references on handheld devices; easy to share references.

Some software packages support multiple modes of use – Mendeley mentioned particularly.
Tools for sharing go beyond traditional reference management – collaborative research; social bookmarking; continued access after graduating or moving away from institution.

On the latter point, this could be an advantage of free/open-source solutions (I’d say ‘personal’ really – if you buy the EndNote client, it’s yours), but subscription packages increasingly allow some access after you’ve left the institution, either for a limited time or with other restrictions.

Challenges:
Range of s/w options
Storage in the cloud – need good backup
Working with “expert users” of non-standard/non-approved s/w

Anecdotal evidence suggests that so far users will use both apps and mobile web interfaces for reference management packages but only for browsing references, not reading papers. However, larger mobile devices – i.e. tablets – are changing this.

Workshop Reports

This morning kicks off with reports from four breakout sessions yesterday afternoon

Workflow

What is a ‘workflow tool’ – everything from Excel to Goobi. Decided to be inclusive
Things people were using:
  • Czech national database system ‘Digital Registry’ – locally developed s/w with the possibility it might go open source
  • Goobi
  • Zend framework
  • In-house systems

Those not using workflow tools often saw them as overly complex for small projects
Existing tools are related to the scale of the project
But projects don’t have to be that large to generate workflow – 500 manuscripts, or fewer if lots of people are involved

What would the ideal workflow tool look like?
  • Reusable in multiple projects at all sizes
  • Monitors performance
  • Gives statistical evidence about every step
  • Tracks books & activities over multiple sites

  • Needs to manage notes, tags and details for every item (fragility, lack of metadata, broken parts etc.)
  • Tool should interoperate with ILS/Repository s/w
  • Workflow systems should work with each other – e.g. being able to push information to centralised workflow tools that could aggregate a view of what has been done
  • Should index new digital items in the ILS
  • Automatically create records in the ILS when a new digital item is available
  • Scalable … and free of charge!
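The “statistical evidence about every step” requirement amounts to tracking which step each item is at and counting. A minimal sketch, with invented step names and item identifiers:

```python
from collections import Counter

# Invented digitisation steps for illustration.
STEPS = ["scan", "qa", "ocr", "ingest"]

class Workflow:
    """Track each item's current step and report per-step counts --
    a toy version of the monitoring an ideal workflow tool would offer."""

    def __init__(self):
        self.status = {}  # item id -> index into STEPS

    def advance(self, item):
        """Move an item to its next step (new items start at the first)."""
        self.status[item] = min(self.status.get(item, -1) + 1, len(STEPS) - 1)

    def report(self):
        """Counts of items at each step."""
        return Counter(STEPS[i] for i in self.status.values())

wf = Workflow()
for item in ["ms001", "ms002", "ms003"]:
    wf.advance(item)      # all three manuscripts scanned
wf.advance("ms001")       # ms001 has also been through QA
print(wf.report())
```

Real tools like Goobi layer user roles, batches and per-step timing on top of exactly this kind of state tracking.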

Business Models
Looked at 6 different business models

  1. Publisher model
    • ProQuest’s Early European Books. Publisher funds digitisation and offers subscription for x years to subscribers outside the country of origin; free in the country of origin; in year x+1 the resource is full open access
    • brightsolid and the BL – making 40 million pages of newspapers available – brightsolid makes material available via a paid-for website
    • lots of potential for more activity in this model
  2. National Government funding
    • e.g. France gave 750 million euros to libraries to digitise materials. However, the government then decided it wanted a financial return, so now the National Library has launched an appeal for private partners
    • Seems unlikely to be viable model in near future
    • Government Research Councils/Research funders have mandates for data management plans and data curation – but perhaps not always observed by those doing the work. Perhaps if compliance was better enforced would give better outcome
  3. International funding – specifically EU funding
    • LIBER in discussion with EU bodies to have libraries considered as part of European research infrastructure- which would open new funding streams through a new framework programme
  4. Philanthropic funding
    • National Endowment for the Humanities/Library of Congress fund a National Digital Newspaper programme
    • Santander, which funds digitisation – e.g. of Cervantes. The motivation for the company is good PR

Two further models that are possibilities going forward:

  1. Public funding/crowdsource model
    • e.g. Wikipedia raised ‘crowdsourced’ funding
    • Can the concept of citizen science be applied to digitisation? – e.g. FamilySearch has 400,000 volunteers doing scanning and transcription of genealogical records
  2. Social Economy Enterprise Models
    • Funders might fund digitisation for ‘public good’ reasons – people employed will have more digital skills as a result; progresses an employment agenda – for the funder the digitisation is not the point, it is the other outcomes
    • Such a model might draw investors from a range of sectors – e.g. the KB in The Netherlands, which uses such an approach for the preparation of material for digitisation

User Experience

First discussed ‘what do we know about user experience’ – have to consider what users want from digitisation.
Crowdsourcing – experience and expectations – experience so far seems to suggest there is lots of potential. However, noted the need to engage with communities via social media etc. Question of how sustainable these approaches are – need to have a plan as to how you preserve the materials being added by volunteers. Have to have clear goals – volunteers need to feel they have reached an ‘endpoint’ or achieved something concrete.

Challenge to get input from ‘the crowd’ outside scholarly community

Metadata and Reuse
Group got a good view of the state of ‘open (meta)data’ across Europe and slightly beyond. Lots of open data activity across the board – although better developed in some areas. In some countries clear governmental/political agenda for ‘open’, even if library data not always top of the list

Some big plans to publish open data – e.g. 22 million records from a library network in Bavaria planned for later this year.

A specific point of interest was a ruling in France that publicly funded archives could not restrict use of the data they made available – that is they could not disallow commercial exploitation of content e.g. by genealogical companies

Also another area of legal interest: in Finland a new data management law emphasises interoperability, open data, open metadata etc. The National Library is building a ‘metadata reserve’ (which would previously have been called a ‘union catalogue’) – bibliographic data, identifiers, authorities.

There was some interesting discussion around the future of metadata – especially the future of MARC in light of the current Library of Congress initiative to look at a new bibliographic framework – but not very clear what is going to happen here. However discussion suggested that whatever comes there will be an increased use of identifiers throughout the data – e.g. for people, places, subjects etc.

It was noted that libraries, archives and museums have very different traditions and attitudes in terms of their metadata – which leads to different views on offering ‘open’ (meta)data. The understanding of ‘metadata’ is very different across libraries, museums and archives. The point was made that ‘not all metadata is equal’ – for example, an abstract may need to be treated differently when ‘opening up’ data than the author/title. A further example here was where Table of Contents information had been purchased separately from catalogue records, and so had different rights/responsibilities attached in terms of sharing with others.

There was some discussion of the impact of projects which require a specific licence. For example, some concern that the Europeana exchange agreements which will require data to be licensed as CC0 will lead to some data being withdrawn from the aggregation.

Final part of the discussion turned to search engines – they are looking at different metadata formats – e.g. http://schema.org. Also our attitudes to data sharing change when there is clear benefit – while some organisations enter into specific agreements with search engines – e.g. OCLC – in the main most libraries seemed happy for Google to index their data without the need for agreements or licensing. Those with experience noted the immediate increase in web traffic once their collections were indexed in Google.

Upscaling digitisation at the Wellcome Library

Wellcome Library – part of the Wellcome Trust, a charitable foundation which funds research, including research into and contextualisation of medical history.

Wellcome library has a lot of unique content – which is the focus of their digitisation efforts. Story so far:

  • Image library created from transparencies/prints – and on-demand photography – 300,000 images
  • Journal backfile digitisations
  • Wellcome Film – 500+ titles
  • AIDS poster projects
  • Arabic manuscripts – 500 manuscripts (probably the biggest single project)
  • 17th-century recipe books

Contribute to Europeana

Digitisation is part of the long-term strategy for the library – but while the aim is to eventually digitise everything, they need to target content.

Digitising archival material, around 2,000 books from 1850–1990 (a pilot project – which will of course test the waters in copyright areas). Also contributing to the Early European Books project – a commercial partnership with ProQuest.

Approach to digitisation projects has changed. Previously did smaller (<10,000 pages) projects, relatively ad hoc, entirely open access, library centric, no major IT investment – but now doing large project (>100,000 pages) with involvement from wider range of stakeholders – within and outside organisation, needs major IT development. Also increasing commercial partnerships mean not all outputs will be ‘open access’ – although feel that this is about additional material that would not have been done otherwise…

Need to move

  • Manual processes -> Automated processes (where possible)
  • Centralised conservation -> distributed conservation
  • Low QA -> increased QA, error minimization
  • Using TIFF -> JPEG 2000 (now 100% JPEG 2000 after digital copy created)
  • From detailed and painstaking to streamlined and pragmatic

Streamlining:

  • Staff dedicated to specific projects or streams of work
  • Carry out sample workflow tests for new types of material
  • Right equipment for right job – eliminate the ‘fiddly bits’ – led to:
  • Live-view monitors
  • Easy-clean surfaces
  • Foot-pedals
  • Photographers do the photography
  • Prepare materials separately
  • Leave loose pages and bindings as they are – easier to digitise that way
  • Use existing staff as support
  • Minimise movement
  • Plenty of shelving and working space
  • Find preferred supplier for ad hoc support

Upscaling and streamlining digitisation requires a higher level of project management

Goobi http://www.goobi.org/:
  • Web-based workflow system
  • Open source (core system)
  • Used by many libraries in Germany
  • Wellcome use the Intranda version (Intranda is a company that develops Goobi)

Goobi is task-focused, with customisable workflows – developed specifically by Intranda
  • User-specific dashboard
  • Import/export and store metadata
  • Encode data as METS
  • Display progress of tasks, stats on activities
  • Tracks projects, batches and units
  • Can call other systems – e.g. ingest or OCR

Q: Is Goobi scalable? Can it be used for very big projects?
A: Goobi works well for small institutions – you don’t need programmers to implement it and it’s relatively cheap. Scalability is probably going to be limited by hardware rather than anything else.

Q: How does the Intranda version differ from other versions of Goobi?
A: At least at Wellcome… e.g. Goobi doesn’t handle ‘batches’ of material – Intranda added this feature. Goobi uses Z39.50 to get metadata; Wellcome wanted to get metadata from elsewhere, so Intranda adjusted it to do that.