"Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun.
Orbiting this at a distance of roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea."
Douglas Adams, The Hitchhiker's Guide to the Galaxy
Digital Libraries, Digital Repositories, Born Digital, Digital Objects – the idea of digital information has become an intrinsic part of the library landscape in the 21st century. However, I believe that as we manage more information in digital formats, we need to think about managing it in analogue, rather than digital, ways.
What do I mean by 'digital' and 'analogue' in this context? Well – to be clear, I'm in favour of using computers to help manage our data – in fact, I think this is key to our ability to take an 'analogue' approach!
Digital values are absolute – something is either on or off, 1 or 0, black or white. Analogue values live along a continuous scale – from black to white and all the shades of grey in between. Computers store information as a series of bits – which can either be on or off – there is no grey here, a bit is either on (1) or off (0) – they are literally digital.
When dealing with physical items on a shelf, and entries in a printed or card catalogue, it is difficult to do anything but take a digital approach to managing your library – something is either on this shelf, or that shelf; on this card or that card; about this subject or about that subject.
Even now that we no longer rely on printed or card catalogues, and many items are available in electronic, rather than physical, format, we are still managing our collections in this 'digital' way. We treat all information in our catalogues as 'absolute' – from titles to subject headings.
I've heard Tim Spalding of LibraryThing talk about this in terms of subject headings – he said 'somebody wins' when you assign subject headings in a traditional library catalogue.
Even questions of fact, which you'd generally expect to have a single answer, may not be entirely 'digital' (right or wrong). The classic example used in library school for reference questions is 'how high is Mount Everest?' – if you check several reference works you may come up with several answers – Wikipedia covers some of the various answers and why they are different.
At this point you may be wondering what the alternative is – you've still got to allocate a subject heading at some point (assign a title, author etc.) – right? Well, I think the answer is in one of the most effective mechanisms for storing and retrieving information we've got – the web.
What makes the web 'analogue' rather than 'digital' in the way I'm using the terms is the link. We can see this clearly in the way Google was originally designed to work. In "The Anatomy of a Large-Scale Hypertextual Web Search Engine" Sergey Brin and Larry Page describe how Google was designed to make use "of both link structure and anchor text".
As is well known, Google uses the concept of the 'Page Rank', which is calculated based on the links between pages. As illustrated by this diagram, it isn't a straightforward count of the number of links to a specific page; instead, each link carries a different weight, depending on the rank of the page it comes from.
You can see that E has many more links than C, but does not get such a high page rank as it is, in turn, not linked to by any high ranking pages.
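To make this concrete, here is a minimal sketch of the idea in Python. It is a simplified power-iteration version of the calculation, not Google's actual implementation: the graph, the page labels A to K, the function name `pagerank` and the damping value of 0.85 are all my own assumptions for illustration. The graph is made up so that E has six inbound links while C has only one, but C's single link comes from the highest ranked page.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by repeated redistribution of rank along links.

    `links` maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is shared among all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}

        # Each page passes its rank on in equal shares to the pages it links to,
        # so a link from a highly ranked page is worth more than one from a minor page.
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it links to.
links = {
    "A": [],
    "B": ["C"],
    "C": ["B"],
    "D": ["A", "B"],
    "E": ["B", "D", "F"],
    "F": ["B", "E"],
    "G": ["B", "E"],
    "H": ["B", "E"],
    "I": ["B", "E"],
    "J": ["E"],
    "K": ["E"],
}

ranks = pagerank(links)

# Plain count of inbound links, for comparison with the computed rank.
inbound = {page: 0 for page in ranks}
for targets in links.values():
    for target in targets:
        inbound[target] += 1

for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page}: rank={ranks[page]:.3f} inbound links={inbound[page]}")
```

In this sketch a simple link count would put E well ahead of C, but the computed rank puts C ahead, because C inherits most of the rank of B.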
The Page Rank gives some kind of 'authority' to a page, but then there is the question of what the page is actually about. This latter question is not simple, but one factor that Brin and Page were explicit about is that "The text of links is treated in a special way in our search engine … we associate it with the page the link points to".
This means that not only is each link a 'vote' for a page in terms of page rank, but that it is also a piece of metadata about the page it is linked to. If you look at all the text of each link used, you are bound to get a wide range of text – as different people will link to a page from different perspectives – using different terminology and even different languages.
Suddenly here we are thinking about a way of classifying a document (web page) that allows many, many people to participate – in fact, as many people as want to – the architecture of the web puts no limit on the number of links that can be supported, of course.
Alongside this, each assertion of a description also has a weight associated with it – so some pieces of metadata can be seen as having 'more weight' than others.
This allows for a much more analogue measurement of what a document is 'about'. A document can be 'about' many things, but to different extents. This brings us back to the way tags work in LibraryThing – many people can allocate different tags to the same book, and this allows a much more complex representation of 'aboutness'.
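As a rough illustration of what 'about many things, but to different extents' could look like as data, here is a small Python sketch. It is my own simplification and not how Google, or LibraryThing, actually weights anything: the function name `aboutness`, the link texts and the weights are all invented for the example. Each inbound link is treated as a pair of link text and the weight of the page it comes from, and the weights are added up per description and expressed as a share of the total.

```python
from collections import defaultdict


def aboutness(inbound_links):
    """Combine weighted link texts into a share of 'aboutness' per description.

    `inbound_links` is a list of (link_text, source_weight) pairs, where
    source_weight stands in for the authority of the linking page.
    """
    totals = defaultdict(float)
    for link_text, source_weight in inbound_links:
        # Treat differently capitalised versions of a description as the same one.
        totals[link_text.strip().lower()] += source_weight

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {text: weight / grand_total for text, weight in totals.items()}


# Hypothetical links pointing at the page for a single book.
inbound_links = [
    ("Hitchhiker's Guide", 0.30),
    ("science fiction", 0.20),
    ("comedy", 0.10),
    ("science fiction", 0.25),
    ("Science Fiction", 0.05),
    ("humour", 0.10),
]

scores = aboutness(inbound_links)
for text, share in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{text}: {share:.2f}")
```

Nobody 'wins' here: every description survives, and the result is a set of shades of grey rather than a single absolute heading.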
I don't think that this just applies to 'aboutness'. I believe other pieces of metadata could also benefit from an analogue approach – but I think I'm going to have to save this argument for another post.
The key thing here (for me) is that the means of exploiting these links, and the network built from them, already exists – it is the web – and it brings with it a way of breaking out of the 'digital' approach to library data that card or printed catalogues had to adopt by their very nature.
If every book in your catalogue had its own URL – essentially its own address on the web – you would have, in a single step, enabled anyone in the world to add metadata to the book – without making any changes to the record in your catalogue. I'd go further than this – but again that's going to need a post of its own – I hope I manage to get these written!
So, we have the means of enabling a much more sophisticated ('analogue') approach to metadata, and what is frustrating is that we have not yet realised this, and we still think 'digital data' is a 'pretty neat idea'.