Feb 02
"Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun.
Orbiting this at a distance of roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea."
Douglas Adams, The Hitchhiker's Guide to the Galaxy
Digital Libraries, Digital Repositories, Born Digital, Digital Objects – the idea of digital information has become an intrinsic part of the library landscape in the 21st century. However, I believe that as we manage more information in digital formats, we need to think about managing it in analogue, rather than digital, ways.
What do I mean by 'digital' and 'analogue' in this context? Well – to be clear, I'm in favour of using computers to help manage our data – in fact, I think this is key to our ability to take an 'analogue' approach!
Digital values are absolute – something is either on or off, 1 or 0, black or white. Analogue values live along a continuous scale – from black to white and all the shades of grey in between. Computers store information as a series of bits, each of which is either on (1) or off (0) – there is no grey here, so computers are literally digital.
When dealing with physical items on a shelf, and entries in a printed or card catalogue, it is difficult to do anything but take a digital approach to managing your library – something is either on this shelf, or that shelf; on this card or that card; about this subject or about that subject.
Even now that we don't rely on printed or card catalogues, and many items are available in electronic rather than physical format, we are still managing our collections in this 'digital' way. We treat all information in our catalogues as 'absolute' – from titles to subject headings.
I've heard Tim Spalding of LibraryThing talk about this in terms of subject headings – he said 'somebody wins' when you assign subject headings in a traditional library catalogue.
Even questions of fact, which you'd generally expect to have a single answer, may not be entirely 'digital' (right or wrong). The classic example used in library school for reference questions is 'how high is Mount Everest?' – if you check several reference works you may come up with several answers – Wikipedia covers some of the various answers and why they are different.
At this point you may be wondering what the alternative is – you've still got to allocate a subject heading at some point (assign a title, author etc.) – right? Well, I think the answer is in one of the most effective mechanisms for storing and retrieving information we've got – the web.
What makes the web 'analogue' rather than 'digital' in the way I'm using the terms is the link. We can see this clearly in the way Google was originally designed to work. In "The Anatomy of a Large-Scale Hypertextual Web Search Engine" Sergey Brin and Larry Page describe how Google was designed to make use "of both link structure and anchor text".
As is well known, Google uses the concept of 'Page Rank', which is calculated from the links between pages. As illustrated by this diagram, however, it isn't a straightforward count of the number of links to a specific page: it allows different weights to be assigned to the links, so that a link from a highly ranked page counts for more than one from an obscure page.
You can see that E has many more inbound links than C, but does not get such a high page rank, as it is not, in turn, linked to by any high-ranking pages.
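To make the weighting concrete, here is a minimal PageRank sketch in Python using power iteration. It uses the commonly cited normalised form, in which every page starts from (1 - d)/N and then receives d times an equal share of the rank of each page linking to it, with the damping factor d = 0.85 that Brin and Page report using. The link graph is hypothetical and is not the one in the diagram: four obscure pages (X1 to X4) link to E, while C has a single inbound link from B, which is itself linked to by many pages.

```python
def pagerank(links, d=0.85, iterations=100):
    """Toy PageRank by power iteration; links maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - d) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outbound link passes on an equal share of this page's rank.
                share = d * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page (no outbound links): spread its rank over every page.
                for t in pages:
                    new[t] += d * rank[page] / n
        rank = new
    return rank


# Hypothetical graph: four obscure pages link to E; six pages link to B;
# B's only outbound link goes to C.
links = {f"X{i}": ["E"] for i in range(1, 5)}
links.update({f"Y{i}": ["B"] for i in range(1, 7)})
links.update({"B": ["C"], "C": ["B"], "E": ["B"]})

ranks = pagerank(links)
inbound = {p: sum(p in targets for targets in links.values()) for p in ranks}
for page in ("B", "C", "E"):
    print(page, "inbound links:", inbound[page], "rank:", round(ranks[page], 3))
```

C comes out well above E even though E has four inbound links to C's one, because C's single link comes from a highly ranked page. Spreading the rank of pages with no outbound links evenly is one common convention, assumed here; this toy graph happens to have none.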
The Page Rank gives some kind of 'authority' to a page, but then there is the question of what the page is actually about. This latter question is not simple, but one factor that Brin and Page were explicit about is that "The text of links is treated in a special way in our search engine … we associate it with the page the link points to".
This means that not only is each link a 'vote' for a page in terms of page rank, but it is also a piece of metadata about the page it links to. If you look at the text of all the links to a page, you are bound to find a wide range of text – different people will link to a page from different perspectives, using different terminology and even different languages.
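As a rough illustration of treating link text as metadata about the target page, the sketch below simply groups anchor text by the page each link points to. The links and URLs are invented for the example (the .example domains are reserved for documentation), one piece of link text is in German to show different languages, and a real search engine does far more than collect the raw strings.

```python
from collections import defaultdict

# Invented links: (linking page, page linked to, anchor text).
links = [
    ("https://blog.example/review", "https://library.example/book/123", "a great history of cartography"),
    ("https://course.example/week3", "https://library.example/book/123", "set text on early maps"),
    ("https://forum.example/t/88", "https://library.example/book/123", "Geschichte der Kartographie"),
    ("https://blog.example/review", "https://library.example/book/456", "a novel about sailors"),
]


def anchor_text_index(links):
    """Associate each piece of link text with the page the link points to."""
    index = defaultdict(list)
    for _source, target, text in links:
        index[target].append(text)
    return dict(index)


index = anchor_text_index(links)
for text in index["https://library.example/book/123"]:
    print(text)
```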
Suddenly we are thinking about a way of classifying a document (a web page) that allows many, many people to participate – in fact, as many people as want to, since the architecture of the web of course puts no limit on the number of links that can be supported.
Alongside this, each assertion of a description also has a weight associated with it – so some pieces of metadata can be seen as having 'more weight' than others.
This allows for a much more analogue measurement of what a document is 'about'. A document can be 'about' many things, but to different extents. This brings us back to the way tags work in LibraryThing – many people can allocate different tags to the same book, and this allows a much more complex representation of 'aboutness'.
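One way to sketch this 'analogue' aboutness, assuming each assertion (a tag, or a term from link text) arrives with a numeric weight, is to add up the weight behind each term and express it as a share of the total. The tags and weights below are made up; a weight of 1.0 per person tagging a book is an assumption, and one could equally use something like the page rank of a linking page.

```python
from collections import Counter


def aboutness(assertions):
    """Turn weighted (term, weight) assertions into each term's share of 'aboutness'.

    Every term keeps a share proportional to the total weight behind it,
    so the result is a spectrum rather than a single winning heading.
    """
    totals = Counter()
    for term, weight in assertions:
        totals[term.lower()] += weight  # treat 'History' and 'history' as one term
    grand_total = sum(totals.values())
    return {term: w / grand_total for term, w in totals.most_common()}


# Hypothetical tags on one book: 1.0 per person tagging it, plus one
# lower-weighted assertion to show that weights need not be equal.
assertions = [("history", 1.0), ("maps", 1.0), ("History", 1.0),
              ("travel", 0.5), ("maps", 1.0), ("history", 1.0)]

shares = aboutness(assertions)
for term, share in shares.items():
    print(f"{term}: {share:.0%}")
```

Here 'history' ends up with a little over half of the book's aboutness, 'maps' about a third and 'travel' the remainder: no single heading 'wins'.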
I don't think that this just applies to 'aboutness'. I believe other pieces of metadata could also benefit from an analogue approach – but I think I'm going to have to save this argument for another post.
The key thing here (for me) is that the means of exploiting these links, and the network built from them, already exists – it is the web – and it brings with it a way of breaking out of the 'digital' approach to library data that card and printed catalogues had to adopt by their very nature.
If every book in your catalogue had its own URL – essentially its own address on the web – you would have, in a single step, enabled anyone in the world to add metadata to the book, without making any changes to the record in your catalogue. I'd go further than this – but again that's going to need a post of its own – I hope I manage to get these written!
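A minimal sketch of the idea, assuming a hypothetical base URL and record identifier: each record is given a stable address, and anyone who wants to say something about the book does so with an ordinary HTML link on their own page, so the catalogue record itself is never edited.

```python
from html import escape
from urllib.parse import quote

BASE = "https://catalogue.example.org/record/"  # hypothetical base address


def record_url(record_id):
    """Mint a stable address for one catalogue record."""
    return BASE + quote(str(record_id), safe="")


def link_to(record_id, text):
    """The HTML anyone could put on their own page to describe the book."""
    return f'<a href="{escape(record_url(record_id))}">{escape(text)}</a>'


# Our own record: nothing here is modified by other people's links.
catalogue = {"b1234567": {"title": "Cloud Atlas", "author": "Mitchell, David"}}

# Resolving an incoming URL back to the record it identifies.
url_index = {record_url(rid): rid for rid in catalogue}

print(link_to("b1234567", "brilliant nested stories"))
```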
So, we have the means of enabling a much more sophisticated ('analogue') approach to metadata, and what is frustrating is that we have not yet realised this, and we still think 'digital data' is a 'pretty neat idea'.
One Ping to “The Future is Analogue”
10 Responses to “The Future is Analogue”
-
1. Michael Says:
February 11th, 2009 at 3:53 pm
>>> If every book in your catalogue had it’s own URL…
Erm, I think this is happening. Can’t remember the website, but I know someone started doing this (crazy project) a while back now!
>>> you would have, in a single step, enabled anyone in the world to add metadata to the book
Anyone in the world? Not by a long shot. Technology doesn’t magically fall into everyone’s lap just because we have more than we need here in the West ;p
2. Owen Stephens Says:
February 11th, 2009 at 4:11 pm
The OpenLibrary (http://openlibrary.org/) has the aim of “One web page for every book ever published.” – and is doing a pretty good job of going in this direction.
This is definitely a step forward, but I don’t think it is enough for libraries to think ‘oh, someone is doing this’ – we either need to be doing it with them or independently. My concern is that in general libraries have not shown that they understand the web, or how to ‘put things on the web’ – it isn’t enough to just make a web interface to a catalogue (something that I saw recently described as “just a telnet search with some CSS and has no extra benefits” – see http://dev8d.jiscinvolve.org/2009/02/10/uber-users-tom-morris-and-mike-green/#more-24).
“Technology doesn’t magically fall into everyone’s lap” – you are absolutely right – of course I meant anyone who has a web page.
3. Rosemie Says:
April 12th, 2009 at 2:45 pm
It couldn’t be said or presented any better than what I read and see above! You could add Web3.0 as a tag to your article too.
And if I may, I’ll use your diagram (with credit, of course) in something I want to show at Elag this year.
This comment was originally posted on CommonPlace.Net
-
4. Jeroen Hoppenbrouwers Says:
April 12th, 2009 at 7:06 pm
Not too long ago I wrote another blog post specifically about MACS: http://www.hoppie.nl/pub/node/89
I intend to shortly create non-login permalinks on the LMI site that allow external web sites (or browsers) to directly fetch relevant linking information from any authority number. As soon as the actual authorities (RAMEAU, LCSH, SWD…) formally publish static URLs for all their subjects (and some already do), these will be added as well. The result should be a linking resource that can be simply integrated into nearly anything.
The format of the XML or HTML under the URL still needs to be decided. Simple RDF sounds okay, but SKOS is another possibility. Plus, of course, some human-readable stuff… plenty of options here.
Jeroen
This comment was originally posted on CommonPlace.Net
-
5. Andy Ekins Says:
April 12th, 2009 at 8:05 pm
Sorry if I waffle…red wine can do that!
Some time ago when I heard that ExLibris were to start using Oracle 10g I did wonder at the time if this would be the catalyst for some kind of ‘grid’ initiative. By this I mean; develop a system whereby institutions would share data in a grid model rather than replicate it over and over again. Unfortunately, this wasn’t the case (as yet) and we are still in a position of replicating data in every organization with all the idiosyncrasies and erroneous entries this entails.
So the concept of a single authoritative source of bib info for every publication is very interesting and seems very logical. This system seems to have all the benefits of the grid model above, but also incorporates the concepts embodied in the semantic web. So what I believe you are saying with your diagram is that you are separating the bib record part of the LMS from the circulation and holdings part. The institution would control circulation and holdings info, but get its bib info from the cloud. It does seem like a logical model, but I have a couple of questions:
Who would be the author of the single web pages? the publisher? the vendor? the author? a consortium? a private enterprise?…and who would ultimately be responsible for the integrity of the data? At one of the Q & A sessions at the JISC ‘Libraries of the Future’ conference (LOTF09) there was a discussion not too dissimilar to this. One of the presenters said that he would be extremely wary of handing over control of library data to an organisation (such as Google) as their agenda was different to the library’s agenda. My fear would be that whoever controlled the data would end up manipulating it for its own purposes.
The other issue is how the link is made between the holdings and circulation data, which must be held locally, and the bib data in the cloud. Is the idea that, when the bib information is needed, the local system would search for it in the authoritative database out in the cloud, or would this information be harvested on a regular basis like the Primo model? If the former, then what would happen when the internet connection was down or an authoritative source was unknown or unreachable? And if the latter, could you see applications like Primo being developed to incorporate a system like this?
One last question…do you think that current (or even future…URM?) LMS systems could cope with this model? Or would libraries need to purchase/develop new systems?
Great post by the way
This comment was originally posted on CommonPlace.Net
-
6. Lukas Koster Says:
April 13th, 2009 at 11:29 pm
See my post http://commonplace.net/2009/04/umr-unified-metadata-resources
-
7. Lukas Koster Says:
April 14th, 2009 at 7:31 am
Andy, yes, this idea is about separating bibliographic data from local transaction data.
It is still a very conceptual idea; your good points touch upon the practical implementations.
- Who would be the author of the web pages: well this could be anyone! Of course there would need to be some kind of authoritative control on different levels, but I can’t tell how this will turn out. I could think of international library cooperation, together with individual authors, publishers, etc.
- Link to global data: again, I guess this can be done in various ways. The whole idea is of course that global data (in the “cloud”) would prevent everyone from duplicating these data in local systems. “Internet connection down” is already a risk for lots of the systems that we currently use.
- Which systems: I have no idea. Libraries or vendors should enable their systems to link to URLs in order to use and present data from these URLs for their own staff and end users.
This comment was originally posted on CommonPlace.Net
-
8. Owen Stephens Says:
April 15th, 2009 at 1:07 pm
Nice post
One of the ideas I haven’t yet managed to get into a blog post is that I don’t really believe in a single unique webpage per book/author/subject. I’m not clear from this post whether you are arguing we should be trying for this or not. I haven’t managed to get my own thoughts organised enough to do my own blog post – but this seems like a good opportunity to try out some of my thinking…
What I mean is that having several different URIs for David Mitchell is OK – what a library would have to do is decide which one(s) it wanted to use in its local representation. If VIAF presents David Mitchell well, then point to that. However, if there are better representations of other authors elsewhere, you can use alternative sources to link to for those authors. We can’t (and wouldn’t want to) stop anyone publishing a web page representing ‘David Mitchell’ in some way – what we need to do is start embracing this. Although this sounds like I’m promoting a chaotic approach (and to some extent I am!) the truth is that we would quickly see key URIs emerging – most libraries would choose to link to the same sources of information for a specific entity (work/author/etc.) – giving them lots of inbound links, and so impacting on relevance ranking in Google etc.
Also remember that the web is a network of links – so there is nothing to stop LibraryThing linking to VIAF and VIAF linking to the wikipedia entry (incidentally I’d suggest for many well-known authors their wikipedia entry has more useful information than ‘library’ focussed pages). The type of analysis you have done here is interesting, as it starts to show the kind of thinking you might do when deciding which to link to – but my contention is that you don’t need it to be the same place each time.
Also, if you link to one URI for an author, and another library links to a different one, but you both link to the same URI for the related Works, then it would be possible to start inferring some kind of equivalence for the author URIs. You could even make an explicit link to say ‘these are the same entity’ if it was valuable (and again, the more people who did this, the more you could believe it).
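The inference described here can be sketched by collecting, for each work URI, the set of author URIs that libraries have linked it to, and then counting how many works each pair of author URIs shares. All the URIs are placeholders rather than real VIAF, LibraryThing or Wikipedia identifiers. This gives a hint, not proof: a co-authored work would also put two different people in the same set, so a pair with more shared works is simply a stronger candidate for a ‘same entity’ link.

```python
from collections import Counter
from itertools import combinations

# Placeholder URIs: which author URI each library linked each work URI to.
library_links = {
    "library-a": [("work:cloud-atlas", "viaf:mitchell"), ("work:ghostwritten", "viaf:mitchell")],
    "library-b": [("work:cloud-atlas", "lt:davidmitchell"), ("work:ghostwritten", "lt:davidmitchell")],
    "library-c": [("work:cloud-atlas", "wp:David_Mitchell_(author)")],
}


def candidate_equivalences(library_links):
    """Count, for each pair of author URIs, how many works both are linked to."""
    authors_by_work = {}
    for links in library_links.values():
        for work, author in links:
            authors_by_work.setdefault(work, set()).add(author)
    support = Counter()
    for authors in authors_by_work.values():
        for pair in combinations(sorted(authors), 2):
            support[pair] += 1
    return support


support = candidate_equivalences(library_links)
for (a, b), n in support.most_common():
    print(f"{a} same as {b}? supported by {n} shared work(s)")
```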
In terms of putting together a searchable index – if we start using the web properly we can start using crawling techniques to build our indexes. Your local information will seed the crawler – i.e. tell it where to start crawling, and you can tell it how ‘deep’ to go on the web – if you are just interested in the URIs to link to directly from your local information, then you can tell it to ignore any further links.
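A depth-limited crawl of this kind amounts to a breadth-first search from the seed URIs. In the sketch below, get_links is a caller-supplied function standing in for fetching and parsing a page (an assumption, so the example runs without network access), and the tiny in-memory ‘web’ and its URIs are hypothetical. With max_depth=0 only the URIs linked directly from local records are visited.

```python
from collections import deque


def crawl(seeds, get_links, max_depth):
    """Breadth-first crawl from seed URIs, following links up to max_depth hops.

    get_links(uri) -> list of URIs is supplied by the caller; in practice it
    would fetch and parse the page.
    """
    seen = set(seeds)
    queue = deque((uri, 0) for uri in seeds)
    visited = []
    while queue:
        uri, depth = queue.popleft()
        visited.append(uri)
        if depth == max_depth:
            continue  # don't follow links any further from here
        for target in get_links(uri):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return visited


# A hypothetical miniature web standing in for real HTTP fetches.
toy_web = {
    "viaf:mitchell": ["wp:David_Mitchell_(author)"],
    "wp:David_Mitchell_(author)": ["wp:Cloud_Atlas_(novel)"],
    "openlibrary:cloud-atlas": ["viaf:mitchell"],
}
seeds = ["viaf:mitchell", "openlibrary:cloud-atlas"]  # URIs taken from local records


def get_links(uri):
    return toy_web.get(uri, [])


shallow = crawl(seeds, get_links, max_depth=0)
deep = crawl(seeds, get_links, max_depth=2)
print(shallow)
print(deep)
```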
You could also decide how far you go in terms of caching what you crawl. If you want resilience against Internet connectivity failure (as Andy suggests you might) you could cache everything and keep local copies (I’m not convinced you’d want to, but it is a possible approach).
Something you would need to accept is that the information you crawl may change and be updated – and that you don’t have control. This is probably the most difficult thing for libraries to deal with – and as you suggest perhaps makes the decision of who you link to for various bits of metadata a key question.
There are issues as well – what if a URI you linked to disappears? How would you know? What would you do? These are issues that need some further thought, but I’m convinced they are surmountable (although I’d say we have to be careful not to invent new library specific things when tackling these issues – that feels a bit like OAI-PMH, which has not been well adopted outside the library/repository world). There are probably other issues that I haven’t mentioned/thought of – but at heart my argument is – the web works well, let’s start using it properly!
This comment was originally posted on CommonPlace.Net
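The ‘disappeared’ and ‘changed’ cases discussed in the comment above could be detected by keeping a hash of each linked page as last seen. In the sketch below, fetch is a caller-supplied function returning an HTTP status code and the response body (an assumption; a real one might wrap urllib.request), and the URIs and page contents are invented. Treating 404 (Not Found) and 410 (Gone) as ‘gone’ and any other non-200 status as temporarily unreachable is one possible policy, not the only one.

```python
import hashlib


def check_link(uri, fetch, cache):
    """Compare what a linked URI returns now with what we saw last time.

    fetch(uri) -> (status_code, body_bytes) is supplied by the caller.
    cache maps each URI to the SHA-256 hex digest of the body last seen.
    Returns "gone", "unreachable", "new", "changed" or "unchanged".
    """
    status, body = fetch(uri)
    if status in (404, 410):
        return "gone"         # the URI has disappeared: decide what to do about it
    if status != 200:
        return "unreachable"  # keep the cached copy and try again later
    digest = hashlib.sha256(body).hexdigest()
    previous = cache.get(uri)
    cache[uri] = digest
    if previous is None:
        return "new"
    return "unchanged" if digest == previous else "changed"


# Invented responses standing in for real HTTP requests.
responses = {
    "https://a.example/author/1": (200, b"<p>version 1</p>"),
    "https://b.example/work/7": (404, b""),
}


def fake_fetch(uri):
    return responses[uri]


cache = {}
results = [check_link("https://a.example/author/1", fake_fetch, cache)]
responses["https://a.example/author/1"] = (200, b"<p>version 2</p>")
results.append(check_link("https://a.example/author/1", fake_fetch, cache))
results.append(check_link("https://b.example/work/7", fake_fetch, cache))
print(results)  # ['new', 'changed', 'gone']
```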
-
9. Lukas Koster Says:
April 19th, 2009 at 4:44 pm
I think all your comments have to do with one single issue: who has control over quality? My initial “ideal” picture was: one single point of definition for each object. That’s my “normalised datamodel designer’s” hangup, maybe.
This would require some kind of authority control as I suggest. But the nature of the web is completely different, as you observe. Chaotic. But the idea of emerging key URIs is not unrealistic. This would constitute some kind of “authority of the masses”?
Would there be a role for international consortia (commercial and non-commercial organisations) in monitoring quality?
Anyway, I like your motto “the web works well, let’s start using it properly!”
This comment was originally posted on CommonPlace.Net
-
10. Owen Stephens Says:
April 24th, 2009 at 11:51 am
As you say, emerging key URIs would be about “authority of the masses” – the more libraries that link to a record, the more you would accept that this was an ‘authoritative record’.
I argue in my ‘Future is Analogue’ post http://www.meanboyfriend.com/overdue_ideas/2009/02/the-future-is-analog.html that we need to think more about spectrums of ‘aboutness’ – and I would say the same about the authority of a ‘quality’ cataloguing record: across libraries you won’t find a single answer to what is the ‘right’ catalogue record, but some will be used more than others.
If we did have linked bibliographic records in the way I describe, I think we would actually find that libraries tended to use ‘authoritative’ sources to link to anyway – just in the same way that most libraries do copy cataloguing from a few, well known, sources. So, we might expect (for example) National Library catalogues to become a focus for large numbers of incoming links. We might well see consortia taking this role as well – or at least the idea of ‘trusted partners’ within consortia (i.e. we know that x catalogues to standards we are happy with and that meet our needs).
This comment was originally posted on CommonPlace.Net
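The ‘authority of the masses’ idea from the comment above can be sketched by counting, for a single work, which record URI each library chooses to link to, and ranking the candidates by their share of libraries. The library names and URIs are hypothetical; the national library record coming out on top simply reflects the invented data, though it mirrors the expectation in the comment.

```python
from collections import Counter

# Hypothetical: the record URI each library links to for the same work.
choices = {
    "library-a": "https://natlib.example/record/42",
    "library-b": "https://natlib.example/record/42",
    "library-c": "https://consortium.example/bib/9",
    "library-d": "https://natlib.example/record/42",
    "library-e": "https://local.example/cat/77",
}


def authority_ranking(choices):
    """Rank candidate records by the share of libraries linking to them."""
    counts = Counter(choices.values())
    total = sum(counts.values())
    return [(uri, n / total) for uri, n in counts.most_common()]


ranking = authority_ranking(choices)
for uri, share in ranking:
    print(f"{share:.0%}  {uri}")
```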
May 29th, 2009 at 10:43 pm
[...] “The Future is Analogue [...]