This was a really tricky one. It took me ages to work out how to approach it, and even then, despite a dry run to get the answer, I made several mistakes, as you can see if you take a look at the mammoth video (maybe watch this one at 2x speed?!).

This is definitely not the right kind of problem for OpenRefine to solve, but it turned out to be possible, just about. I wonder if I missed a trick somewhere along the line and there could have been a simpler approach. After all, the simpler question of “how big was each directory” was trivial in OpenRefine. It was having to calculate the path to each directory from the list of ‘change directory’ commands that made it difficult.
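
For comparison, here is roughly what that path tracking looks like in a general-purpose language, where you can keep state from one line to the next. This is only a minimal Python sketch, not the OpenRefine approach from the video. The `$ cd <name>` line format (with `/` for the root and `..` for the parent) and the function name `directory_paths` are assumptions made for illustration.

```python
def directory_paths(commands):
    """Return the full path in effect after each 'change directory' command."""
    stack = []   # directory names below the root, outermost first
    paths = []
    for line in commands:
        parts = line.split()
        # Assumed format: "$ cd <target>"; every other line is ignored.
        if len(parts) != 3 or parts[:2] != ["$", "cd"]:
            continue
        target = parts[2]
        if target == "/":
            stack.clear()            # jump back to the root
        elif target == "..":
            if stack:
                stack.pop()          # go up one level
        else:
            stack.append(target)     # go down into a child directory
        paths.append("/" + "/".join(stack))
    return paths


example = ["$ cd /", "$ ls", "$ cd a", "$ cd e", "$ cd ..", "$ cd ..", "$ cd d"]
print(directory_paths(example))
# ['/', '/a', '/a/e', '/a', '/', '/d']
```

The whole thing hinges on the running `stack`, which is exactly the kind of row-to-row state that OpenRefine does not make easy to carry around.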