Jul 08

This talk is by Dr Andreas Rauber from Vienna University of Technology (Dept of Software Technology and Interactive Systems). As an aside, it is great to see some academics here, as opposed to librarians (although there are quite a few librarians, and publishers, here as well).

Andreas is starting with ‘what is digital preservation?’, then going on to cover preservation planning and a tool called ‘Plato’ – a preservation planning tool.

So – why do we need digital preservation?

The basic issue is ‘keeping the bits alive’, but this is not really digital preservation. We know a lot about this kind of work; it can be a lot of work, but, bottom line, it can be done.

However, maintaining the bits is just a small part of the problem. Digital objects require a specific environment to be accessible: files need specific programs, programs need specific operating systems, and operating systems need specific hardware components.

The software and hardware environment is not stable, so you encounter issues where:

  • Files cannot be opened anymore
  • Embedded objects are no longer accessible/linked
  • Programs won’t run
  • Information in digital form is lost, usually through complete failure rather than gradual degradation
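To make the dependency chain concrete, here is a minimal sketch (my own illustration, not something from the talk): each component names the one component it needs, and an object is accessible only if every layer beneath it is still available. The component names and the `REQUIRES`, `render_chain` and `accessible` helpers are hypothetical, and the sketch assumes a single linear chain rather than a real dependency graph.

```python
# Toy model of the rendering dependency chain described above.
# Component names are hypothetical; a real environment is a graph, not a chain.
REQUIRES = {
    "thesis.doc": "WordProcessor 6.0",   # the file needs a program
    "WordProcessor 6.0": "OS-1995",      # the program needs an operating system
    "OS-1995": "x86-PC-1995",            # the OS needs hardware
    "x86-PC-1995": None,                 # bottom of the stack
}


def render_chain(obj):
    """Return the object followed by every component it depends on, top to bottom."""
    chain = []
    current = obj
    while current is not None:
        chain.append(current)
        current = REQUIRES[current]
    return chain


def accessible(obj, available):
    """The object is accessible only if every component beneath it is available.

    Returns (is_accessible, missing_components).
    """
    missing = [c for c in render_chain(obj)[1:] if c not in available]
    return not missing, missing


if __name__ == "__main__":
    print(render_chain("thesis.doc"))
    print(accessible("thesis.doc", {"WordProcessor 6.0", "OS-1995"}))
```

With the program and the OS still around but the hardware gone, `accessible` reports the missing bottom layer: one missing layer and nothing renders at all, which is the all-or-nothing failure mode from the list above.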

Strategies for digital preservation (using http://unesdoc.unesco.org/images/0013/001300.130071e.pdf) are grouped into categories:

  • Short term
  • Medium term
  • etc.

Andreas is going to look at two approaches:

Migration

  • Transformation into different format

You usually get some changes in each transformation; if you do this several times, the changes accumulate as ‘damage’ to the digital object.
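A toy model of why repeated migration causes damage (an assumption-laden sketch, not Andreas’s example): treat an object as the set of properties it carries and each format as the set of properties it can represent. Once a conversion drops a property, later formats cannot restore it, even if they support it. All format and property names here are hypothetical.

```python
# Toy model of lossy migration: an object is the set of properties it carries,
# and each (hypothetical) format can only represent some of them.
FORMAT_SUPPORTS = {
    "FormatA": {"text", "fonts", "tables", "footnotes", "macros"},
    "FormatB": {"text", "fonts", "tables", "footnotes"},
    "FormatC": {"text", "tables", "footnotes", "macros"},
    "FormatD": {"text", "fonts", "tables"},
}


def migrate(properties, target):
    """Convert to the target format, silently dropping what it cannot represent."""
    return properties & FORMAT_SUPPORTS[target]


def migrate_path(properties, path):
    """Apply a sequence of migrations; return the final properties and per-step losses."""
    losses = []
    for target in path:
        migrated = migrate(properties, target)
        losses.append((target, sorted(properties - migrated)))
        properties = migrated
    return properties, losses


if __name__ == "__main__":
    final, losses = migrate_path(FORMAT_SUPPORTS["FormatA"], ["FormatB", "FormatC", "FormatD"])
    for target, lost in losses:
        print(f"-> {target}: lost {lost}")
    print("left with:", sorted(final))
```

Migrating a FormatA object through FormatB, FormatC and FormatD loses macros, then fonts, then footnotes, leaving only text and tables, even though FormatD itself could have carried fonts.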

Emulation

  • Emulation of hardware or software

It is both an advantage and a disadvantage that the object is rendered identically: you can access the object, but you may not know how to use its interface.

Looking specifically at Scientific Publishing – what are you trying to preserve?

  • The publication
  • Context of the publication
  • Adjunct material (slides, notes, videos)
  • Demos, exercises, interactive elements
  • Data sets and simulations
  • Community aspects – discussion etc.

So – digital preservation is complex.

You need to understand both the object and its use and context.

So – ‘Preservation Planning’…

There are many different strategies – how do you know which one is most suitable – and how do you know if you’ve been successful 10/20/50 etc. years later?

As part of the DELOS DP Cluster, a workflow was developed which has since been refined and integrated within PLANETS. It is based on the ‘utility analysis’ approach developed in Vienna.

Plato is a tool which helps with preservation planning – you need to:

  • Define requirements (this requires a detailed analysis of what you want and what is important; e.g. for a web page, is the appearance of the hyperlinks important, or just the target information? If there is a web counter, is it preserved as of a specific date, does it count hits on the archived copy, or does it continue to count hits on the ‘live’ copy? etc.)
  • Evaluate alternatives (including the option of not drawing up a preservation plan, if you want)
  • Consider results
  • Build preservation plan
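Since the workflow is based on utility analysis, here is roughly what that general technique looks like. This is a generic weighted-sum utility analysis under my own assumptions, not Plato’s actual code or data model: requirements form a tree, sibling weights sum to 1, a leaf’s effective weight is the product of the weights along its path, and each alternative is scored per leaf on an assumed 0 to 5 scale. The tree, the `Node`, `leaf_weights` and `utility` names, and the scores are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A requirement in a hypothetical objective tree."""
    name: str
    weight: float                                # relative to siblings; siblings sum to 1
    children: list = field(default_factory=list)


def leaf_weights(node, parent_weight=1.0):
    """Flatten the tree: a leaf's effective weight is the product of weights on its path."""
    w = parent_weight * node.weight
    if not node.children:
        return {node.name: w}
    result = {}
    for child in node.children:
        result.update(leaf_weights(child, w))
    return result


def utility(scores, weights):
    """Weighted sum of per-leaf scores (assumed 0..5 scale)."""
    return sum(weights[leaf] * scores[leaf] for leaf in weights)


# Hypothetical requirements tree for some class of documents.
TREE = Node("Document", 1.0, [
    Node("Appearance", 0.4, [Node("Fonts", 0.5), Node("Layout", 0.5)]),
    Node("Structure", 0.4),
    Node("Behaviour", 0.2),
])

if __name__ == "__main__":
    weights = leaf_weights(TREE)
    # Invented scores for two unnamed alternatives.
    alternatives = {
        "alternative_a": {"Fonts": 5, "Layout": 5, "Structure": 3, "Behaviour": 4},
        "alternative_b": {"Fonts": 0, "Layout": 1, "Structure": 1, "Behaviour": 5},
    }
    for name, scores in alternatives.items():
        print(name, round(utility(scores, weights), 2))
```

For this tree, Fonts and Layout each get an effective weight of 0.2, Structure 0.4 and Behaviour 0.2, so alternative_a comes out at 4.0 and alternative_b at 1.6. Most of the real effort sits in choosing the tree and the weights, which is the ‘define requirements’ step.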

All this looks interesting, but it suggests that this is going to be an incredibly expensive process (even the preservation planning, never mind the actual preservation). This drives the point home: we need to be good at deciding what is worth preserving in the medium/long term, and only embark on this kind of exercise where we know we want to do the preservation.

Plato is a ‘concretization’ (is that a word?) of the OAIS model, which follows the recommendations of TRAC and nestor. It is a pretty generic workflow, so it should be easy to integrate into different settings.

In a case study of electronic theses, they found that plain text doesn’t satisfy several minimum requirements, RTF is weak in appearance and structure, and the deactivation of scripting and security are knock-out criteria (for PDF).
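Knock-out criteria change how scores are combined: an alternative that fails one is rejected outright, however well it does elsewhere. A minimal sketch, assuming (hypothetically) that a score of 0, or a missing score, on a knock-out criterion means failure; the `evaluate` function, criterion names and scores are invented for illustration and are not the case study’s results.

```python
def evaluate(alternatives, weights, knock_outs):
    """Rank alternatives by weighted score, after rejecting knock-outs.

    alternatives: name -> {criterion: score}
    weights:      criterion -> weight for the weighted sum
    knock_outs:   criteria where a score of 0 or a missing score (an assumed
                  convention) rejects the alternative, whatever its other scores
    Returns (ranked list of (score, name), dict of rejected name -> failed criteria).
    """
    ranked, rejected = [], {}
    for name, scores in alternatives.items():
        failed = [c for c in knock_outs if scores.get(c, 0) == 0]
        if failed:
            rejected[name] = failed
            continue
        total = sum(w * scores.get(c, 0) for c, w in weights.items())
        ranked.append((total, name))
    ranked.sort(reverse=True)  # highest weighted score first
    return ranked, rejected


if __name__ == "__main__":
    # Invented scores; "scripting_disabled" echoes the kind of criterion mentioned above.
    alternatives = {
        "A": {"appearance": 5, "structure": 5, "scripting_disabled": 0},
        "B": {"appearance": 5, "structure": 4, "scripting_disabled": 5},
        "C": {"appearance": 1, "structure": 2, "scripting_disabled": 5},
    }
    weights = {"appearance": 0.5, "structure": 0.5}
    print(evaluate(alternatives, weights, ["scripting_disabled"]))
```

Alternative A would have the best weighted score (5.0) but is rejected because `scripting_disabled` is 0, so B (4.5) ranks first, ahead of C (1.5).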

Andreas is stressing the key role of the ‘defining requirements’ stage: this is the point at which people start identifying what is important, and you can start to see cost vs. benefit.

http://www.ifs.tuwien.ac.at/dp

http://www.ifs.tuwien.ac.at/dp/plato

Some conferences coming up on Digital Preservation including one at the British Library on 29th July.

Q: Who should take responsibility?

A: You need people from the ‘user’ side who at least know what they want; you also need IT skills, and input from management on cost etc.

Once there are a number of needs analyses for a ‘type’ of material (e.g. e-theses), they can be consolidated into a shareable template. However, a number of studies are needed first to capture a wide range of requirements; otherwise, taking the requirements from the first study would lead others to narrow their view down to whatever the first institution identified.
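One way to picture this consolidation (a hypothetical sketch, not a PLANETS or Plato feature): merge the requirement lists from several case studies and record how many studies identified each requirement, so the template reflects the range of needs rather than the first institution’s list. The `consolidate` function, institution names and requirement names are invented.

```python
from collections import Counter


def consolidate(studies, min_studies=1):
    """Merge requirement lists from several case studies into one template.

    studies: institution -> list of requirements it identified
    Returns requirement -> number of studies that identified it, keeping only
    requirements seen in at least `min_studies` studies.
    """
    counts = Counter()
    for requirements in studies.values():
        counts.update(set(requirements))  # count each requirement once per study
    return {req: n for req, n in counts.items() if n >= min_studies}


if __name__ == "__main__":
    studies = {  # hypothetical institutions and requirements
        "institution_1": ["appearance", "structure"],
        "institution_2": ["structure", "embedded_data"],
        "institution_3": ["structure", "appearance", "citations"],
    }
    print(consolidate(studies))
    print(consolidate(studies, min_studies=2))
```

With the default threshold all four requirements survive; raising `min_studies` to 2 keeps only the ones shared by at least two institutions (`structure` and `appearance`), which shows the trade-off between a broad template and a consensus one.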

written by ostephens


One Response to “Digital Preservation Challenges: planning and implementing solutions for scientific publishing”

  1. Henry Gladney Says:

    Vis-a-vis Rauber 2008 chart #8 “Emulation” consider the following question and proposition w.r.t. some digital information of interest “INFO”. If that interests you, consider two readings cited at the bottom.
    What is it that one wants to preserve, the computing environment of INFO or the essential content of INFO?
    Looking at the final 5 points of Rauber’s chart, to emulate the computing environment would be VERY difficult and expensive.
    And it would also be mostly effort applied to something other than INFO, because most of the computing environment will not have been used to make INFO useful for human beings.
    What one wants, and what is readily feasible, is some computing environment that perpetuates useful rendering of the essential content of INFO. A Turing machine can be used to do that, and a Turing machine is much, much simpler than any widely offered HW and SW environment.
    See H.M. Gladney and R.A. Lorie, Trustworthy 100-Year Digital Objects: Durable Encoding for When It’s Too Late to Ask, ACM Trans. Office Information Systems 23(3), 299-324, July 2005.
    Also Preserving Digital Information, Springer Verlag, 2007 ISBN 978-3-540-37886-0
