This talk is by Dr Andreas Rauber from Vienna University of Technology (in the Dept of Software Technology and Interactive Systems). As an aside, it is great to see some academics here, as opposed to librarians – although there are quite a few of them, and publishers, here as well.
Andreas is starting with ‘what is digital preservation?’, then going on to cover preservation planning and a tool called ‘Plato’ – a preservation planning tool.
So – why do we need digital preservation?
Basic issue of ‘keeping the bits alive’ – but this is not really digital preservation. We know a lot about this kind of work, and it can be a lot of work, but the bottom line is that it can be done.
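As a rough illustration of what ‘keeping the bits alive’ involves in practice (this is my sketch, not something shown in the talk): a common approach is to record a checksum when a file is ingested and re-check it later. The function names `sha256_of` and `is_intact` are made up for this example; only Python's standard `hashlib` and `pathlib` modules are assumed.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, recorded_digest: str) -> bool:
    """True if the file still matches the digest recorded at ingest time."""
    return sha256_of(path) == recorded_digest
```

A check like this only says the bits are unchanged. It says nothing about whether anything can still open the file, which is the harder problem the rest of the talk is about.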
However, maintaining the bits is just a small part of the problem. Digital Objects require a specific environment to be accessible – files need specific programs, programs need specific operating systems, and operating systems need specific hardware components.
Software and Hardware environment is not stable – you encounter issues where:
- Files cannot be opened anymore
- Embedded objects are no longer accessible/linked
- Programs won’t run
- Information in digital form is lost – usually complete failure rather than gradual degradation
Strategies for Digital Preservation (using http://unesdoc.unesco.org/images/0013/001300.130071e.pdf) for categories:
- Short term
- Medium term
Andreas going to look at two approaches:
- Transformation into different format
Usually get some changes in transformation – if you do this several times, will have ‘damage’ to the digital object
- Emulation of h/w or s/w
It is both an advantage and a disadvantage that the object is rendered identically – you can access the object, but you may not know how to use the interface.
Looking specifically at Scientific Publishing – what are you trying to preserve?
- The publication
- Context of the publication
- Adjunct material (slides, notes, videos)
- Demos, exercises, interactive elements
- Data sets and simulations
- Community aspects – discussion etc.
So – Digital Preservation is complex
You need to understand both the object, and its use and context.
So – ‘Preservation Planning’…
There are many different strategies – how do you know which one is most suitable – and how do you know if you’ve been successful 10/20/50 etc. years later?
As part of the DELOS DP Cluster there was a workflow developed, which has now been refined and integrated within PLANETS. It is based on the ‘utility analysis’ approach developed in Vienna.
Plato is a tool which helps with preservation planning – you need to:
- Define requirements (requires detailed analysis of what you want and what is important – for e.g. for a web page is the appearance of the hyperlinks important, or just the target information; if there is a web counter is it preserved at a specific date, does it count hits on the archived copy, does it continue to count hits on the ‘live’ copy? etc.)
- Evaluate alternatives (including not to draw up preservation plan if you want)
- Consider results
- Build preservation plan
All this looks interesting but suggests that this is going to be an incredibly expensive process (even to do the preservation planning, never mind the actual preservation). This drives it home – we need to be good at deciding what is worth preserving in the medium/long term – and only embark on this kind of exercise where we know we want to do the preservation.
Plato is a ‘concretization’ (is that a word?) of the OAIS model, which follows recommendations of TRAC and nestor – it is a pretty generic workflow, so should be easy to integrate it into different settings.
In a case study of electronic theses, they found that plain text doesn’t satisfy several minimum requirements, RTF is weak in Appearance and Structure, and the deactivation of scripting and security are knock-out criteria (for PDF).
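To make the ‘utility analysis’ idea a bit more concrete, here is a minimal sketch of how weighted requirements and knock-out criteria might combine. This is my own illustration and not Plato's actual implementation: the criteria names, weights, 0-5 scale, the rule that a score of 0 knocks an alternative out, and the names `Alternative`, `utility` and `rank` are all assumptions, and no numbers here come from the case study.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical requirement weights (summing to 1); not from the talk.
WEIGHTS: Dict[str, float] = {"appearance": 0.4, "structure": 0.4, "security": 0.2}


@dataclass
class Alternative:
    """One candidate preservation action, scored per requirement."""
    name: str
    scores: Dict[str, float]  # assumed scale: 0 (unacceptable) to 5 (best)


def utility(alt: Alternative, weights: Dict[str, float]) -> Optional[float]:
    """Weighted sum of scores, or None if any requirement scores 0 (knock-out)."""
    if any(alt.scores[criterion] == 0 for criterion in weights):
        return None
    return sum(weights[c] * alt.scores[c] for c in weights)


def rank(alternatives: List[Alternative],
         weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return surviving alternatives, highest utility first."""
    viable = []
    for alt in alternatives:
        value = utility(alt, weights)
        if value is not None:
            viable.append((alt.name, value))
    return sorted(viable, key=lambda pair: pair[1], reverse=True)
```

The point of the knock-out rule is that an alternative failing a must-have requirement drops out entirely, however well it scores elsewhere, rather than being averaged back into contention.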
Andreas stressing the key role of the ‘defining requirements’ stage – this is the point at which people start identifying what is important, and you can start to see cost vs. benefit
Some conferences coming up on Digital Preservation including one at the British Library on 29th July.
Q: Who should take responsibility?
A: Need people from the ‘user’ side who at least know what they want, also need skills in IT, and input from Management on cost etc.
Once there are a number of examples of needs analysis for a ‘type’ of material – e.g. e-theses – they can be consolidated into a shareable template. However, a number of studies are needed first to capture a wide range of requirements; otherwise the requirements from the first study could lead others to narrow their view down to whatever the first institution identified.