ArchivesSpace at Edinburgh

Presentation by Scott Renton:

Current Management:

  • Standard ISAD(G), Schema EAD (XML)
  • Laid down by GASHE and NAHSTE projects
  • Managed by “CMSyst”, an in-house MySQL/PHP application
  • Data feeds to ArchivesHub, ARCHON, CW Site (EDINA)

Limitations of CMSyst, wanted a new system to extend what they could do. Looked at a range of options including:

  • DSpace
  • Vernon (Museums system)
  • Calm/Adlib (commercial system)
  • Archivists’ Toolkit (lacking a UI)
  • ICA ATOM

However, all had shortcomings. Decided to go with ArchivesSpace.

  • ArchivesSpace is actually a successor to Archivists’ Toolkit
  • Support for appropriate standards
  • All archivist functionality in one place
  • Web delivered
  • Open Source
  • Lyrasis network behind development
  • MySQL database
  • Runs under a container system such as Jetty or Tomcat
  • Code in JRuby, available on github
  • Four web apps:
    • Public front end
    • Admin area
    • Backend (Communicator)
    • Solr for indexing/search

Migration issues:

  • All data available from CMSyst
  • Exported in EAD
  • Some obstacles getting EAD to map correctly to loaded authorities
  • Some obstacles getting authorities loaded
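
One way to catch mapping problems like these before import is to list the access points an EAD export refers to and compare them with the authorities already loaded. A minimal sketch, assuming EAD 2002 element names (`controlaccess`, `persname`, `corpname`, `subject` and so on); the sample fragment, the `loaded` set and both helper functions are invented for illustration and do not come from CMSyst or ArchivesSpace.

```python
import xml.etree.ElementTree as ET

# Hypothetical EAD fragment; a real CMSyst export will look different.
SAMPLE_EAD = """<ead>
  <archdesc level="collection">
    <did><unittitle>Papers of an example person</unittitle></did>
    <controlaccess>
      <persname>Example, Alice</persname>
      <corpname>Example Society</corpname>
      <subject>Geology</subject>
    </controlaccess>
  </archdesc>
</ead>"""

# EAD 2002 elements that carry controlled access terms.
ACCESS_POINT_TAGS = {"persname", "corpname", "famname", "subject", "geogname"}


def local_name(tag):
    """Drop any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def access_points(ead_xml):
    """Return (element name, normalised text) pairs found under controlaccess."""
    root = ET.fromstring(ead_xml)
    found = []
    for element in root.iter():
        if local_name(element.tag) != "controlaccess":
            continue
        # Direct children only; nested controlaccess elements are
        # visited in their own right by root.iter().
        for child in element:
            name = local_name(child.tag)
            if name in ACCESS_POINT_TAGS:
                text = " ".join("".join(child.itertext()).split())
                found.append((name, text))
    return found


def missing_authorities(ead_xml, loaded):
    """List access points in the EAD that have no matching loaded authority."""
    return [point for point in access_points(ead_xml) if point not in loaded]


# Assumed set of authorities already present in the target system.
loaded = {("persname", "Example, Alice"), ("subject", "Geology")}
unmatched = missing_authorities(SAMPLE_EAD, loaded)
print(unmatched)
```

Exact string matching is the simplest possible rule; in practice name forms often differ slightly between export and authority file, so a report like this is a starting point for manual checking rather than a gate.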

ArchivesSpace has functionality to link out to digital objects – using LUNA system – with CSV import of data [not clear which direction data flows here]

ArchivesSpace is popular in the US, but Edinburgh is the first European institution to adopt it.

University of Edinburgh Collections has a new web interface at http://collections.ed.ac.uk. ArchivesSpace will be the expression of archives within this collections portal. Archives are surfaced here as Collection Level Descriptions – 1000s of collections covered.

Implementation has been a good collaborative project. Got Curators, archivists, projects and innovations staff and digital developers all involved. Also good support on mailing list and good collaboration with other institutions.

Now going to look at “CollectionSpace” – which is a sister application for museums.

Archives collections will be available soon (within a couple of weeks) at http://collections.ed.ac.uk/archives

Publications Router

http://broker.edina.ac.uk

Currently takes data from UKPMC, looks for an affiliation statement in text of publication, pushes to appropriate institutional repository based on that affiliation (if the IR has given permission for this to happen).

Uses SWORD to push paper to the IR
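
The routing step described above can be sketched as a lookup from affiliation text to a repository that has opted in. This is an illustration only: the matching rule (case-insensitive substring match on a list of name variants), the repository entries and the `route` function are assumptions, not the Router's actual logic, and the SWORD deposit itself is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repository:
    name: str
    name_variants: tuple  # strings to look for in an affiliation statement
    accepts_push: bool    # has the IR given permission for deposits?


# Hypothetical registrations, for illustration only.
REPOSITORIES = [
    Repository("Example University IR",
               ("example university", "univ. of example"), True),
    Repository("Sample Institute IR", ("sample institute",), False),
]


def route(affiliation, repositories=REPOSITORIES):
    """Return repositories whose name variants appear in the affiliation
    text and which have opted in to receiving pushed content."""
    text = " ".join(affiliation.lower().split())
    targets = []
    for repo in repositories:
        if not repo.accepts_push:
            continue  # no permission given, so never push here
        if any(variant in text for variant in repo.name_variants):
            targets.append(repo)
    return targets


affiliation = "School of Chemistry, Example University, and Sample Institute, UK"
targets = [repo.name for repo in route(affiliation)]
print(targets)
```

Here the second institution is named in the affiliation but receives nothing, because it has not given permission.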

Can also browse all the content, search by target repository and author

Also API to the data in the Router.

Currently trying to understand how the HEFCE REF OA mandate impacts on what the Router should do and how it can help institutions deliver to the mandate.

Major change is the requirement for AAM – which wasn’t the previous requirement

One of the most important questions was how institutions would like to get AAMs delivered to them – 3 options:

  • i. 3rd party service to push content to your system – 37% of respondents didn’t know if they would want this approach
  • ii. Pull content into your system using an API – 31% don’t know if this would work for them
  • iii. Receive content via email as a file attachment – 43% said ‘OK at a push’ and 30% saw as satisfactory

If there was another solution what would it be:

  • Deposit at publication
  • Academics upload
  • SWORD would be good if institution/author matching is ‘really good’
  • Anything involving minimal reliance on academics updating information
  • Publishers to provide metadata to institution on acceptance
  • Being notified of accepted manuscripts by publishers so then can coordinate with authors

Broke into discussion groups for the following questions:

  • How would you ideally like to receive AAMs (or metadata describing them)
  • If Router starts to provide AAMs (or metadata describing them) at acceptance and then later provides metadata for the published version of record (VoR) – what are the de-duplication issues?
  • If you receive multiple copies of the same version what are the de-duplication issues
  • What are the main issues holding you back from participating? What help would you require?
  • What is the one most important feature that is essential to you?

Feedback from groups:

  • Pure doesn’t currently have a field for ‘date of acceptance’
  • Better to get the manuscript but the metadata is better than nothing
    • In some cases (especially with AAMs) you may have the manuscript and minimal metadata
  • De-duplication a huge issue – to the extent that there are examples of institutions having to turn off automated feeds from sources such as Scopus (see also Bournemouth yesterday)
  • Getting any kind of notification of acceptance would be a huge step for those working in institutions
    • Getting notification of publication as well would be big step
  • A pull method preferred – may need local processing before you publish
  • ‘extra’ metadata – e.g. corresponding author – would be highly useful – not available from WoS
    • If local system doesn’t have ability to store that metadata then this is a problem
  • Boundary between push/pull is not very clear. E.g. notification is ‘push’
    • Got to be clear about what is being pushed and where to
    • Reluctant to have repository populated without human intervention
    • EPrints and DSpace have a ‘review queue’ function
  • Having more publishers on board with the Router is key – if you don’t have good coverage it’s just one more source
  • Identifiers! If you have identifiers for AAMs (DOIs) it might help with de-duplication
  • If you are confident as an institution that Authors will deposit AAMs then the real issue becomes very much being notified at publication [found this point very interesting – points to a real divide in institutional attitudes about what authors will do]
  • People still trying to work out workflows
  • Maybe a mixed message to researchers/authors if some is automated and some is not
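
The point about identifiers can be illustrated with a small de-duplication pass: records that share a DOI are merged, and records without one are set aside for manual review. This is a sketch under assumptions: the record fields (`doi`, `title`, `source`), the merge rule and the sample data are hypothetical, and real feeds from Scopus, WoS or the Router would need more careful matching.

```python
DOI_PREFIXES = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/", "doi:",
)


def normalise_doi(value):
    """Reduce common DOI spellings to a bare lower-case DOI, or None."""
    if not value:
        return None
    doi = value.strip().lower()  # DOIs are case-insensitive
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    return doi if doi.startswith("10.") else None


def deduplicate(records):
    """Group records by DOI. Returns (merged, needs_review)."""
    merged = {}
    needs_review = []
    for record in records:
        doi = normalise_doi(record.get("doi"))
        if doi is None:
            # No usable identifier: leave the decision to a person.
            needs_review.append(record)
            continue
        entry = merged.setdefault(
            doi, {"doi": doi, "title": record["title"], "sources": []}
        )
        entry["sources"].append(record["source"])
    return list(merged.values()), needs_review


# Invented sample records for illustration.
records = [
    {"doi": "10.1234/abc.1", "title": "An example paper", "source": "Scopus"},
    {"doi": "http://dx.doi.org/10.1234/ABC.1", "title": "An Example Paper",
     "source": "WoS"},
    {"doi": None, "title": "An example paper (accepted manuscript)",
     "source": "Router"},
]
merged, needs_review = deduplicate(records)
print(len(merged), merged[0]["sources"], len(needs_review))
```

The third record shows why identifiers for AAMs matter: without one, the accepted manuscript cannot be matched automatically to the two published-version records and ends up in the review pile.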

Lessons in Open Access Compliance in HE (LOCH)

LOCH Pathfinder project. Presented by Dominic Tate

Partners – University of Edinburgh, Heriot-Watt, St Andrews

In (very) brief:

The approach is:

  • Managing Open Access payments – including a review of current reporting methods and creation of shareable spreadsheet templates for reporting to funders
  • Using PURE as a tool to manage Open Access compliance, verification and reporting
  • Adapting institutional workflows to pre-empt Open Access requirements and make compliance as seamless as possible for academics

 

E2E – end to end open access

Valerie McCutcheon talking about the ‘end to end open access’ (or E2E) project.

Project is working with a wide range of institution types – big/small, geographically dispersed, from ‘vanilla’ systems to more customised.

Developing a standard new open access metadata profile, with some implementations, although system agnostic overall:

  • EPrints case study
  • Hydra case study
  • EPrints OA reporting functionality

Generic workshops

  • Early stage – issue identification and solution sharing
  • Embedding future REF requirements
  • Advocacy
  • Late stage – report on findings and identify unsolved issues

Several existing standards bodies/activity

E2E is collecting information on different metadata that is being used or is needed – Metadata Community Collaboration – spreadsheet for people to add to

http://e2eoa.org/2014/07/01/working-on-metadata-requirements/

Workshop on 4th September – covering:

  • Current initiatives in Open Access
  • Metadata requirements

Working with other Pathfinder projects

Open Access at UCL and Bournemouth University

Quite different institutions but similarities in publications management at systems level:

  • Both use Symplectic Elements to manage publications and EPrints for IR
  • BU Research and Knowledge Exchange Office manages OA funding and ‘Bournemouth Research Information and Networking’ while the IR (BURO) is managed by the library
  • UCL Library manages both OA funding and publications through the Research Publications Service (RPS – think this is the Symplectic Elements) and Discovery (EPrints)

Publications Management

  • Researchers manage their data via Symplectic, which can also get data from Scopus and Web of Science; the data is then pushed out to profile pages and/or Repository

Institutional Repositories

  • UCL IR (Discovery) is both metadata only and full-text outputs – 317794 outputs in total – includes 5111 theses
  • Bournemouth only has full-text – much smaller numbers – 2831 outputs in total – not all public access

Staff support

  • UCL – a Virtual Open Access Team
  • Bournemouth
    • OA Funding – 1 manager
    • No fulltime repository staff
    • Rota of 3 editorial staff, working one week in three on outputs received
    • 0.2 repository administrator
    • 0.2 Repository manager

OA Funding

  • UCL OA funding managed by OA Team in the library
    • Combination of RCUK, UCL and Wellcome funding
    • at least 9000 research pubs per annum
    • RCUK 2013-14 target: 693 papers – successfully processed 796
    • Current level of APC payments >2000 per annum

UCL has many pre-payment agreements in place for APCs

  • BioMed Central
  • Elsevier
  • BMJ Journals
  • RSC
  • IEEE
  • PeerJ
  • Sage
  • PLOS
  • Springer
  • T&F
  • Wiley
  • ubiquity press
  • and more – and hoping to extend further

Pre-payment agreements have been very successful and saved money

Both Bournemouth and UCL have found it challenging to spend all the money available for APCs

Challenges for engagement

  • UCL Discovery
    • Metadata only outputs – poor quality, not checked, can be entered multiple times
    • Feeds into Symplectic Elements from Scopus and WoS can lead to duplicates: Scopus sometimes has records for pre and post publication and WoS can have a record also – and academics select all three rather than just choosing one of them
    • Academic engagement
    • Difficulty sending large files from RPS (Symplectic) to IR
    • Academics furious about how the h-index is calculated in RPS (manual entries aren’t counted, only items from Scopus / WoS)
    • Incorrect search settings in RPS
    • Don’t understand the data harvesting process – user managed to crash the system by entering single word search with common author name
  • Bournemouth BURO
    • 2013 – converted with full-text only
    • Mapping data issues
    • Incorrect publications display on original staff pages
    • Academic staff left thinking BURO no longer existed [think implication is that it looked like it had been replaced by RPS?]

UCL have very clear requirement for outputs to be deposited in IR – http://www.ucl.ac.uk/library/open-access/ref/

Sheer volume of outputs at UCL is overwhelming

At Bournemouth – advocacy a big issue still (especially since many thought BURO had been discontinued) – but now outputs in BURO and BRIAN must be considered in pay and progression.

Shared challenges

  • Deposit on acceptance
  • Open Access options – making sure academics know what routes of publication are open to them
  • Establishing new workflows
  • Publishers move goalposts, change conditions etc.
  • Flexible support
  • Encouraging champions in Faculties
  • Use the REF2020 as a stick and a carrot for their research

UCL as a whole supports Green OA, but assists academics to meet their requirements through Gold OA route. UCL feels Gold will still be important to science disciplines

BU – funding will be available and has institutional support – but issues may arise depending on volume in the future

EPrints developments

Sketchy notes on this session: Les Carr updating us on developments in EPrints software / development

What is EPrints for?

  • Supporting researchers
  • Supporting research data
  • Supporting research outputs
  • Supporting research outcomes
  • Supporting research managers

However, want repositories to take on a bigger agenda – publication, data, information, …

To achieve this stripping EPrints back to its core data management engine – tuning it for speed, efficiency, scale and flexibility.

EPrints4 is not a rewrite of the software, but making the core as generic as possible – so it can handle all kinds of content

Improved integration with Xapian (for search)

Improving efficiency of information architecture – db transactions, memcached, fine-grained ACLs – can support much bigger repositories.

MVC approach

Can be run as headless service, but comes with a UI

Towards Repository ’16 – OA Compliancy, capture projects/funders data (working with Soton and Reading)

Integrating with other services/systems

  • IRUS-UK (on Bazaar)
  • Publications router
  • ORCID
  • RIOXX
  • WoK/Scopus imports
  • OA end-to-end project (EPrints Services are partners)

Les says “EPrints moving towards being a de facto CRIS-light system”

Repositories are/need to be collaboration between librarians and developers

No release date for EPrints4 yet – but probably around a year away.

RIOXX v2.0

Original concerns for RIOXX:

  • Primary:
    • How to represent the funder
    • How to represent the project/grant
  • Secondary:
    • How to represent the persistent identifier of the item described
    • Provisions of identifiers pointing to related data sets
    • How to represent the terms of use for an item

Original principles:

  • Purpose driven – Focussed on satisfying RCUK reporting requirements
  • Simple (re-use DC, not CERIF)
  • Generic in scope (don’t tie down to specific types of output)
  • Interoperable – specifically with OpenAIRE
  • Developed openly – public consultation

Has anything changed with RIOXX 2 (mid-2014)?

  • Still purpose driven, but now encompassing HEFCE requirements as well
  • Slightly more sophisticated / complex but still quite simple
  • No longer ‘generic’ – explicit focus on publications
  • No longer seen as a temporary measure – positioned to support REF2020
  • Interoperability still key – and currently working on an OpenAIRE crosswalk

Other changes:

Current status:

  • version 2.0 beta was released for public consultation in June 2014
  • version 2.0 RC 1 has been compiled
  • accompanying guidelines are being written
  • XSD schema has been developed
  • expect full release in late August/early September

Some specific elements:

  • dc:identifier
    • identifies the open access item being described by the RIOXX metadata record, regardless of where it is
    • recommended to identify the resource itself, not a splash page
    • dc:identifier MUST be an HTTP URI
  • dc:relation and rioxxterms:version_of_record
    • rioxxterms:version_of_record is an HTTP URI which is a persistent identifier for the published version of a resource
    • will often be the HTTP URI form of a DOI
  • dc:relation
    • optional property pointing to other material (e.g. dataset)
  • dcterms:dateAccepted
    • MUST be provided
    • more precise than other dated events (‘published’ date very grey area)
  • rioxxterms:author & rioxxterms:contributor
    • MUST be HTTP URIs – ORCID strongly recommended
    • one rioxxterms:author per author
    • rioxxterms:contributor is for parties that are not authors but credited with some contribution to publication
  • rioxxterms:project
    • joins funder and project in one, slightly more complex, property
    • The use of funder IDs (DOIs in their HTTP URI form) from FundRef is recommended, but other ID schemes can be used and name can be used
  • license_ref
    • adopted from NISO’s Open Access Metadata and Indicators
    • takes an HTTP URI and a start date
    • URI should identify a license
      • there is work under way to create a ‘white list’ of acceptable licenses
    • embargoes can be expressed by using the ‘start date’ to show the date on which the license takes effect
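
The rules listed above can be turned into a small check. A minimal sketch, assuming a record is held as a plain Python dictionary keyed by the element names used in these notes; the dictionary layout, the `check_record` and `license_in_effect` helpers and the sample values are hypothetical and are not part of the RIOXX specification, which defines an XML profile. The rules encoded are those recorded in the session, not a reading of the final schema.

```python
from datetime import date
from urllib.parse import urlparse


def is_http_uri(value):
    """True if value parses as an http(s) URI with a host."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_record(record):
    """Return a list of problems against the rules noted in the session."""
    problems = []
    if not is_http_uri(record.get("dc:identifier")):
        problems.append("dc:identifier must be an HTTP URI")
    if not record.get("dcterms:dateAccepted"):
        problems.append("dcterms:dateAccepted must be provided")
    for author in record.get("rioxxterms:author", []):
        if not is_http_uri(author):
            problems.append(f"author is not an HTTP URI: {author!r}")
    vor = record.get("rioxxterms:version_of_record")
    if vor is not None and not is_http_uri(vor):
        problems.append("version_of_record must be an HTTP URI")
    lic = record.get("license_ref")
    if lic is not None and not is_http_uri(lic.get("uri")):
        problems.append("license_ref must carry an HTTP URI")
    return problems


def license_in_effect(record, on):
    """An embargo shows up as a license_ref start date later than `on`."""
    lic = record.get("license_ref")
    return lic is not None and lic["start_date"] <= on


# Invented sample record; the ORCID is the public example iD
# from ORCID's own documentation.
record = {
    "dc:identifier": "https://repository.example.org/id/eprint/1/1/paper.pdf",
    "dcterms:dateAccepted": "2014-07-01",
    "rioxxterms:author": ["https://orcid.org/0000-0002-1825-0097"],
    "rioxxterms:version_of_record": "https://doi.org/10.1234/abc.1",
    "license_ref": {
        "uri": "https://creativecommons.org/licenses/by/4.0/",
        "start_date": date(2015, 1, 1),
    },
}
problems = check_record(record)
print(problems)
print(license_in_effect(record, date(2014, 8, 1)))
print(license_in_effect(record, date(2015, 1, 1)))
```

With this sample the record passes the checks, and the license start date of 1 January 2015 means the item is treated as under embargo in August 2014 and open from the start date onwards.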

Funding for development of RIOXX as an application profile is now at an end, but there is funding for further developments (e.g. s/w development for repositories etc.)

RIOXX is endorsed by both RCUK and HEFCE

Q: What about implementing RIOXX in CRIS systems?

A: Some work on mapping between CERIF and RIOXX, although not ongoing work. Technical description available for anyone to implement.

A (Balviar): In terms of developing plugins for commercial products – not talked to commercial suppliers yet, but planning to look at what can be developed and conversations now starting.

A (James): Already got RIOXX terms in Pure feed at Edinburgh

Q: Can you clarify ‘first named author’ vs ‘corresponding author’ – do you intend the first named author to be the corresponding author?

A: Understand that the ‘common case’ is that the first named author is the corresponding author [at this point lots of disagreement from the floor on this point].

‘first named author’ seen as synonym for ‘lead author’

Q: Why is vocabulary for (?) not in line with REF vocabulary

A: HEFCE accepts a wider range of outputs than ‘publications’, but RIOXX specifically focusses on publications – where most OA issues lie

Q: ‘Date accepted’ what about historic publications? Won’t have date of acceptance

A: RIOXX not designed for retrospective publication – going forward only. Not a general purpose bibliographic record

Comment from Peter Burnhill: RIOXX is not a cataloguing schema – it is a set of labels

Paul W emphasises – RIOXX is not a ‘record’ format – the systems outputting RIOXX will have much richer metadata already. There is no point in ‘subverting’ RIOXX for historical purposes – this isn’t its intended purpose.

Q: Can an author have multiple IDs in RIOXX

A: Not at the moment. Mapping between IDs for authors is a different problem space, not one that RIOXX tries to address

Comment from RCUK: Biggest problem is monitoring compliance with our policies – which RIOXX will help with a lot

Comments from floor: starting to see institutions issuing ORCIDs for their researchers – could see multiple ORCIDs for a single person. Similarly with DOIs – upload a publisher’s PDF to Figshare and you get a new DOI

Q: If you are producing RIOXX what about OpenAIRE

A: There is nothing in RIOXX that would stop it being OpenAIRE compliant – so RIOXX records can be transformed into OpenAIRE records (but not vice versa)