Working with Data using OpenRefine

Over the last couple of years, the British Library have been runningĀ a set of internal courses on digital skills for librarians. As part of this programme I’ve delivered a course called “Working with Data”, and thought it would be good to share the course structure and materials in case they were helpful to others. The course was designed to run in a 6 hour day, including two 15 minute coffee breaks and a one hour lunch break. The focus of the day is very much using OpenRefine to work with data, with a very brief consideration of other tools and their strengths and weaknesses towards the end of the day.

Participants are asked to bring ‘messy data’ from their own work to the day, and one session focusses on looking at this data with the instructor, working out how OpenRefine, or other tools, might be used to work with the data.

The materials for this course are contained in three documents:

  1. A slide deck: “Working with Data using OpenRefine” (pdf)
  2. A handout: “Introduction to OpenRefine handout CC-BY” (pdf)
  3. A sample date file generated from https://github.com/BL-Labs/imagedirectory/blob/master/book_metadata.json (csv)

The slides contain introductory material, introducing OpenRefine, describing what type of tasks it is good for, and introducing various pieces of functionality. At specific points in the slide deck there is an indication that it is time to do ‘hands-on’ which references exercises in the handout.

The schedule for the day is as follows:

Introduction and Aims for the day (15 mins)

Session 1 (45 mins)

  • Install Open Refine
  • Create your first Open Refine project (using provided data)
  • Basic Refine functions part I
    • Columns and Rows
      • Sorting by columns
      • Re-order columns
    • Using Facets Part I
      • Text facets
      • Custom facets

Break (15 mins)

Session 2 (45 mins)

  • Basic Refine functions part II
    • Refine data types
    • Using Facets Part II
      • Clustering facets
    • Transformations
      • Common transformations
      • Using Undo/Redo
      • Write transformations with ‘GREL’ (Refine Expression Language)

Lunch (60 mins)

Session 3 (60 mins)

  • – What data have you brought along?
    • Size
    • Data
    • How can Refine help?
    • What other tools might you use (e.g. Excel,…)
  • Your data and Refine
    • Can you load your data into Refine
    • Handson plus roaming support

Break (15 mins)

Hour 4 (60 mins)

  • Advanced Refine
    • Look up data online
    • Look up data across Refine projects
    • Refine Extensions and Reconciliation services
  • Handson plus roaming support

Round-up (15 mins)

The materials linked above were developed by me (Owen Stephens, owen@ostephens.com) on behalf of the British Library. Unless otherwise stated, all images, audio or video content included in these materials are separate works with their own licence, and should not be assumed to be CC-BY in their own right.

The materials are licensed under a Creative Commons Attribution 4.0 International LicenseĀ http://creativecommons.org/licenses/by/4.0/. It is suggested when crediting this work, you include the phrase Developed by Owen Stephens on behalf of the British Library.

2 thoughts on “Working with Data using OpenRefine

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.