For the next two days I’m at the 3rd LIBER-EBLIDA Workshop on Digitization of Library Material in Europe. I’m here because I’m speaking later today about the JISC Guide to Open Bibliographic Data which I co-authored, but around all that there is a very interesting programme.
First up this morning is Hildelies Balk on the IMPACT project – http://www.impact-project.eu/news/coc/. This project is trying to tackle the issues related to OCR of digitised historical texts. The main achievements of IMPACT so far:
Improved commercial OCR (ABBYY ‘IMPACT’ FineReader 10 on the market)
Effective tool for OCR correction with volunteer involvement (IBM CONCERT) ready for implementation
Novel approaches to preprocessing, OCR and post-correction available
Computer lexica for 9 languages close to delivery
Digitisation Framework with evaluation tools available
Facility to plug in other tools (if you have tools you can integrate)
Large dataset with sophisticated ‘ground truth’ close to final delivery
Unique network of expertise
Challenges in digitisation of historic material are still there – there is no lack of novel approaches to improve access – both within IMPACT and many other projects
The challenge is translating from these novel approaches to real life implementation – many of the developments do not integrate into library workflows well
Where next? Direction needed for work – e.g. should we really be investing in mass re-keying of content?
To sustain IMPACT, they need to have a Business Model which would keep the centre running after the end of the current EU funding. IMPACT have run workshops throughout the project, covering all levels of staff. The approach used is described at http://www.businessmodelgeneration.com.
First questions they tackled – what is the value proposition and what are the customer segments?
Major customer segment – the ‘service providers’ (presumably companies like Proquest? – not clear). IMPACT has all major content holders in the consortium – so clear value proposition – access to the content holders through single route
Another major customer segment – the content holders. Ideas proposed included mediating consultancy between content holders and others with expertise.
These ideas were discussed, and of course the workshops then moved on to other parts of the business model. They often find that people move to the ‘rational’ side of the model quickly – e.g. people often focus on costs before other issues are sorted out.
Centre of Competence – benefits for content holders:
Exchange of best practice in a community of content holders
KnowledgeBank with comprehensive and up to date information and tech watch reports
Training on demand and online tutorial
Online support through a helpdesk
Support in the implementation of the innovative IMPACT solution for improving access to text
Access to the IMPACT dataset with ‘ground truth’ and tools for evaluation
Digitisation framework – guidelines for using the open source workflow management system Taverna
Three levels of membership:
Open – access to forum – part of content
Basic membership (fee) – access to all facilities, reduced fee for conferences
Premium membership (fee) – member of the board, privileges such as free entry to conferences
Follow IMPACT on Twitter: http://twitter.com/impactocr