Feb 11
Recently Chris Keene (University of Sussex) sent an email to the LIS-E-RESOURCES email list about the fact that in academic libraries we are now doing a lot more ‘import’ and ‘export’ of records in our library management systems – bringing in bibliographic records from a variety of sources such as book vendors/suppliers, e-resource systems and institutional repositories. He wanted to hear about other sites’ experience and how they coped.
One of the responses mentioned the new ‘next generation’ search systems that some libraries have invested in, and Chris said:
“Next gen catalogues are – I think – certainly part of the solution, but only when you just want to make the records available via your local web interface.”
One of the points he made was that the University of Sussex provides records from their library management system to others so that union catalogues can be built – e.g. InforM25, COPAC, Suncat.
I sympathise with Chris, but I can’t help but think this is the point at which we have to start doing things a bit differently – so I wrote a response to the list, but thought that I’d blog a version of it as well:
I agree that library systems could usefully support much better bulk processing tools (although there are some good external tools like MarcEdit of course, and scripting/programming tools such as the MARC Perl module if you have people who can program them). However, I'd suggest that we need to change the way we think about recording and distributing information about our resources, especially in the light of investment in separate 'search' products such as Aquabrowser, Primo, Encore, Endeca, &c. &c.
If we consider the whole workflow here, it seems to me that as soon as you have a separate search interface the role of the 'library system' needs to be questioned – what are you using it for, and why? I'm not sure funnelling resources into it so they can then be exported to another system is really very sensible (although I absolutely understand why you end up doing it).
I think that once you are pushing stuff into Aquabrowser (taking Sussex as an example) there is little point in also pushing them into the catalogue – what extra value does this add? For books (print or electronic) you may continue to order them via the library system – but you only need an order record in there, not anything more substantial – you can put the 'substantial' record into Aquabrowser. The library system web interface will still handle item level information and actions (reservations/holds etc.) – but again, you don't need a substantial bib record for these to work – the user has done the 'searching' in the search system.
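One rough way to picture this split is the minimal Python sketch below. It does not reflect how any real library system models its data. The local system keeps only an order record and item-level data, plus a pointer to the full description held in the search system. The class names, the fields and the `bib_uri` example address are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class OrderRecord:
    """Minimal order record kept in the library system (hypothetical fields)."""
    order_id: str
    bib_uri: str  # pointer to the full description held in the search system
    supplier: str
    status: str = "on order"


@dataclass
class Item:
    """Item-level data the library system still manages (loans, holds)."""
    barcode: str
    order_id: str
    holds: list = field(default_factory=list)


def place_hold(item: Item, patron_id: str) -> int:
    """Queue a hold on an item; returns the patron's position in the queue.

    No bibliographic data is needed here: the user found the item in the
    search system and arrives with a reference to it.
    """
    item.holds.append(patron_id)
    return len(item.holds)


order = OrderRecord("ORD-1", "http://search.example.org/record/123", "Supplier X")
item = Item("30001", order.order_id)
print(place_hold(item, "patron-42"))  # prints 1
```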
For the e-journals you could push directly from SFX into Aquabrowser – why push via the library system? Similarly for repositories – it really is just creating work to convert these into MARC (probably from DC) to get them into your library system, only to then export them for Aquabrowser (which seems to speak OAI anyway).
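Going straight from a repository's Dublin Core to something a search index could load takes very little code, as this minimal Python sketch shows. It assumes you already have a single `oai_dc:dc` element, of the kind returned inside an OAI-PMH `GetRecord` or `ListRecords` response. It flattens that element into a dictionary mapping each field to its values. That output shape is an illustrative assumption, not Aquabrowser's actual ingest format.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"


def dc_to_index_doc(xml_text: str) -> dict:
    """Flatten an oai_dc record into {dc element name: [values]}."""
    root = ET.fromstring(xml_text)
    doc: dict = {}
    for el in root:
        # ElementTree writes namespaced tags as '{namespace}localname'
        if el.tag.startswith("{" + DC_NS + "}") and el.text and el.text.strip():
            name = el.tag.split("}", 1)[1]
            doc.setdefault(name, []).append(el.text.strip())
    return doc


sample = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example thesis</dc:title>
  <dc:creator>Smith, Jane</dc:creator>
  <dc:subject>Library systems</dc:subject>
  <dc:subject>Linked data</dc:subject>
  <dc:identifier>http://repository.example.ac.uk/123/</dc:identifier>
</oai_dc:dc>"""

print(dc_to_index_doc(sample))
```

No MARC appears anywhere in that path. Repeated elements such as `dc:subject` simply become lists.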
One of your issues is that you still need to put stuff into your library system, as this feeds other places – for example at Imperial we send our records to CURL/COPAC as well as other places – but this is a poor argument going forward – how long before we see COPAC change the way it works to take advantage of different search technology (MIMAS have just licensed the Autonomy search product …). Anyway – we need to work with those consuming our records to work out more sensible solutions in the current environment.
I'd suggest what we really need to think about is a common 'publication' platform – a way of all of our systems outputting records in a way that can then be easily accessed by a variety of search products – whether our own local ones, remote union ones, or even ones run by individual users. I'd go further and argue that platform already exists – it is the web! If each of your systems published each record as a 'web page' (either containing structured data, or even serving an alternative version of the record depending on whether a human or machine is asking for the resource – as described in Cool URIs), then other systems could consume this to build search indexes – and you've always got Google of course… I note that Aquabrowser supports web crawling – could it cope with some extra structured data in the web pages (e.g. RDFa)?
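The idea of serving a different version of the record depending on who is asking can be sketched in Python. The code below picks a representation of a record from the HTTP `Accept` header, which is the content negotiation approach discussed in the Cool URIs document. The list of available formats and the function names are hypothetical. A real server should also send a `Vary: Accept` header, and might use the 303 redirect pattern from Cool URIs rather than serving both formats at one address.

```python
# Representations this hypothetical record server can produce, preferred first
AVAILABLE = ["text/html", "application/rdf+xml"]


def parse_accept(header: str) -> list:
    """Parse an HTTP Accept header into (media type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so types with equal q keep the client's order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose_representation(accept_header: str, default: str = "text/html") -> str:
    """Pick which representation of a record to send for this request."""
    for media_type, q in parse_accept(accept_header or ""):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media_type in AVAILABLE:
            return media_type
        if media_type == "*/*":
            return default
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # e.g. "text/"
            for candidate in AVAILABLE:
                if candidate.startswith(prefix):
                    return candidate
    return default  # simplification: no 406 Not Acceptable


browser = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
print(choose_representation(browser))                # text/html
print(choose_representation("application/rdf+xml"))  # application/rdf+xml
```

An unsupported type such as `image/png` falls back to HTML here. A stricter server might answer 406 Not Acceptable instead.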
I have to admit that I may be overestimating how simple this would be – but it definitely seems to me this is the way to go – we need to adapt our systems to work with the web, and we need to start now.
One Ping to “Its time to change library systems”
11 Responses to “Its time to change library systems”
1. Chris Keene Says:
February 12th, 2009 at 1:07 pm
Hi Owen
I’ve now put my original email online for reference and added some additional thoughts as well:
http://www.nostuff.org/words/2009/library_catalogues_changing_model/
Short URL: http://is.gd/jhqo
For us, we started trying to import e-journal records before we had Aquabrowser. We first tried the quick approach: take a huge file of e-journal MARC records and import them into our LMS (i.e. SFX -> Talis). However, problems appeared (duplicates, links not working, odd display, missing journals), which were due to a combination of the two systems and our inexperience with the process. So we started again, much more slowly, and most of the problems went away.
We were so focussed on this that, when we got Aquabrowser, we didn’t really stop to think about different models. But as you say, bypassing the catalogue has advantages and is in many ways preferable. It’s a cleaner solution and removes the issue of keeping the data in sync.
When describing the new issues that are arising about where the records for particular types of item should live, I used COPAC/WorldCat as an example of one issue, i.e. third-party systems that expect the LMS catalogue to contain all our holdings. These were just examples, and I want to stress that there are many places where we and others refer to ‘the catalogue’ (by which we mean the source of bib data for all our items) where in future we might have to consider what exactly we mean. Other examples include Endnote (which can search library ‘catalogues’) and link resolvers, which can use the catalogue as a ‘source’ (i.e. a final destination where you can find the item you are after). Plus, in this new open world, there may well be services using our LMS catalogue (perhaps via Z39.50) that we don’t even know about.
If we change what we do and do not put in the catalogue how will it affect these services?
To be clear, I’m not saying this is a bad thing; in many cases it will probably be a good thing (many third-party systems probably don’t want to include online content that only your users have access to). It is just something we need to consider.
Anyway, some good points here, and I’ll certainly be giving them some thought (once I’ve had lots of coffee!). Thanks for the ideas. I especially like the concept of a common ‘publication’ platform.
Cheers
Chris
2. Rachel Heery Says:
February 12th, 2009 at 4:06 pm
Owen, I think this is spot on! Library catalogues need to be part of the Web, as do repositories. Repositories seem to get more discussion space, but library catalogues have much more in them and could be exploited on the Web now.
I think one needs to consider Web-friendly models for creating catalogue records, as well as for storing them, aggregating holdings, etc.
This brings to mind recent discussion about changes to WorldCat policies on sharing records derived from WorldCat, as well as discussions on aligning repositories with the Web.
3. Ian Says:
February 16th, 2009 at 1:58 pm
Owen,
For me, I’d read XML rather than ‘the web’ as a common publication platform – a commonly agreed data structure defined in XML would avoid all those problems you have when your web pages get full of tags for different scripting languages, and from the different descriptors that libraries use for data elements. Then you combine this with web services for data interchange (avoiding the latency issues with daily data exports to Aquabrowser) and suddenly it no longer matters where the data is held.
However, all this demands real advocacy to the system suppliers, who currently benefit from having us caught up in technical islands. There’s only so much development that the few of us can do, and many libraries are now operating without anyone on the staff who can see the need for change. Along with a substantial change in business perspective, there’s a real market opportunity for the first suppliers who grasp this. Only then will our users get the single search box they are looking for…
4. Owen Stephens Says:
February 16th, 2009 at 2:55 pm
Hi Ian,
You raise so many things here that I need another few posts to respond! Here are some quick responses that need more exploration. I think this all makes sense – but I’m also prepared to admit that I sometimes have slightly odd views.
XML isn’t a platform in the sense I mean here. Libraries have more than one data structure available to them that can be expressed in XML, but by itself this is not sufficient – although I’m a fan of moving from MARC or indeed MARCXML to something that is a bit more useful and consumable. (Note that there are approaches to allowing both human and machine-readable representations of data at the same URI – the Cool URIs document I reference in the post describes this, and I also mention the idea of RDFa to embed structured data in an html page giving the user a human-readable display, while including a lot of structured data that can be exploited by software)
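Here is a slightly more concrete version of the RDFa point, as a minimal Python sketch. It renders a record as an HTML fragment that a person can read. At the same time, `property` attributes carry the same values as Dublin Core terms for software to pick up. The sketch uses RDFa 1.1 style `prefix`/`about` attributes. The function name, the field names and the record URI are hypothetical.

```python
from html import escape


def record_to_rdfa(uri: str, fields: dict) -> str:
    """Render a record as HTML a person can read, with RDFa for software.

    Each field becomes a Dublin Core term (dc:title, dc:creator, ...); the
    field names used here are assumed to be valid DC terms.
    """
    lines = [f'<div prefix="dc: http://purl.org/dc/terms/" about="{escape(uri)}">']
    for name, value in fields.items():
        # Visible label for humans, 'property' attribute for machines
        lines.append(
            f'  <p>{escape(name.capitalize())}: '
            f'<span property="dc:{escape(name)}">{escape(value)}</span></p>'
        )
    lines.append("</div>")
    return "\n".join(lines)


print(record_to_rdfa(
    "http://library.example.org/record/1",
    {"title": "Cloud Atlas", "creator": "Mitchell, David"},
))
```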
The notion of what I mean by the ‘web’ as a common publication platform could do with some expansion here, but there are some fundamental concepts – like the ability to link between documents, using http, expressing information as human readable html (as well as structured data) that I think are a minimum requirement for libraries to exploit the web properly.
I’m very nervous about a ‘commonly agreed data structure’ – I guess this is controversial for libraries, but I simply don’t believe that this is achievable, because I don’t believe the generation of useful (meta)data is restricted to libraries. We can look at how successful (for example) the uptake of even simple DC has been – how many web pages have decent structured metadata embedded in them? I’m not against the idea of some level of commonality – and in some scenarios we may even achieve a useful degree of consistency (for example in libraries – although if you look at data consistency across libraries it is relatively poor even with a lot of commonly agreed rules).
I’m not in any way against communities having common standards – I just think we need to assume a lack of consistency as a starting point. This is where I come back to the strength of the web as a platform – if we look at the web, it is an incredibly inconsistent hodge-podge of information, and yet because of the links between documents it is possible to make some kind of sense of it – which is why Google and others can return relevant hits (I’m not saying this approach is perfect, but it is very successful and definitely serves a need).
I have a feeling I need to argue this out in another post – but see my previous post ‘The Future is Analogue’ to see how I think links are key to thinking about data, metadata and information discovery. I also believe that networks of information naturally minimise (without eradicating) duplication. At the level of data structures, I believe we need to embrace the complexity in the system, rather than try to design it out – and in the end a successful approach to this would allow people (users, librarians, search engines) to bring together information that serves them or their community best.
I think (hope?) that I’m arguing for something more radical than you suggest – and yes, we need people to develop this with us, although they may not be the traditional library system suppliers. That said, of the suppliers I have worked with, both Talis and Ex Libris show some understanding of the need for change here.
5. Rosemie Says:
April 12th, 2009 at 2:45 pm
It couldn’t have been said or presented better than what I read and see above! You might as well add Web3.0 as a tag to your article too.
And if I may, I’d like to use your diagram (with attribution, of course) in something I want to present at Elag this year.
This comment was originally posted on CommonPlace.Net
6. Jeroen Hoppenbrouwers Says:
April 12th, 2009 at 7:06 pm
Not too long ago I wrote another blog post specifically about MACS: http://www.hoppie.nl/pub/node/89
I intend to shortly create non-login permalinks on the LMI site that allow external web sites (or browsers) to directly fetch relevant linking information from any authority number. As soon as the actual authorities (RAMEAU, LCSH, SWD…) formally publish static URLs for all their subjects (and some already do), these will be added as well. The result should be a linking resource that can be simply integrated into nearly anything.
The format of the XML or HTML served at the URL still needs to be decided. Simple RDF sounds okay, but SKOS is another possibility. Plus, of course, some human-readable content… plenty of options here.
Jeroen
This comment was originally posted on CommonPlace.Net
7. Andy Ekins Says:
April 12th, 2009 at 8:05 pm
Sorry if I waffle…red wine can do that!
Some time ago, when I heard that Ex Libris were to start using Oracle 10g, I wondered whether this would be the catalyst for some kind of ‘grid’ initiative. By this I mean developing a system whereby institutions would share data in a grid model rather than replicate it over and over again. Unfortunately, this hasn’t happened (as yet), and we are still replicating data in every organization, with all the idiosyncrasies and erroneous entries this entails.
So the concept of a single authoritative source of bib info for every publication is very interesting and seems very logical. This system seems to have all the benefits of the grid model above, but also incorporates the concepts that embody the semantic web. So what I believe you are saying with your diagram is that you are separating the bib record part of the LMS from the circulation and holdings part. The institution would control circulation and holdings info, but get its bib info from the cloud. It does seem like a logical model, but I have a couple of questions:
Who would be the author of the single web pages? the publisher? the vendor? the author? a consortium? a private enterprise?…and who would ultimately be responsible for the integrity of the data? At one of the Q & A sessions at the JISC ‘Libraries of the Future’ conference (LOTF09) there was a discussion not too dissimilar to this. One of the presenters said that he would be extremely wary of handing over control of library data to an organisation (such as Google) as their agenda was different to the library’s agenda. My fear would be that whoever controlled the data would end up manipulating it for its own purposes.
The other issue is how the link is made between the holdings and circulation data, which must be held locally, and the bib data in the cloud. Is the idea that, when the bib information is needed, the local system would look it up in the authoritative database out in the cloud, or would this information be harvested on a regular basis, like the Primo model? If the former, what would happen when the internet connection was down or an authoritative source was unknown or unreachable? And if the latter, could you see applications like Primo being developed to incorporate a system like this?
One last question…do you think that current (or even future…URM?) LMS systems could cope with this model? Or would libraries need to purchase/develop new systems?
Great post by the way
This comment was originally posted on CommonPlace.Net
8. Lukas Koster Says:
April 14th, 2009 at 7:31 am
Andy, yes this idea is about separating bibliographic data from local transaction data.
It is still a very conceptual idea, your good points touch upon the practical implementations.
- Who would be the author of the web pages: well this could be anyone! Of course there would need to be some kind of authoritative control on different levels, but I can’t tell how this will turn out. I could think of international library cooperation, together with individual authors, publishers, etc.
- Link to global data: again, I guess this can be done in various ways. The whole idea is of course that global data (in the “cloud”) would prevent everyone from duplicating this data in local systems. “Internet connection down” is already a risk for lots of the systems we currently use.
- Which systems: I have no idea. Libraries or vendors should enable their systems to link to URLs in order to use and present data from these URLs for their own staff and end users.
This comment was originally posted on CommonPlace.Net
9. Owen Stephens Says:
April 15th, 2009 at 1:07 pm
Nice post
One of the ideas I haven’t yet managed to get into a blog post is that I don’t really believe in a single unique webpage per book/author/subject. I’m not clear from this post whether you are arguing we should be trying for this or not. I haven’t managed to get my own thoughts organised enough to do my own blog post – but this seems like a good opportunity to try out some of my thinking…
What I mean is that having several different URIs for David Mitchell is OK – what a library would have to do is decide which one(s) it wanted to use in its local representation. If VIAF presents David Mitchell well, then point to that. However, if there are better representations of other authors elsewhere, you can use alternative sources to link to for those authors. We can’t (and wouldn’t want to) stop anyone publishing a web page representing ‘David Mitchell’ in some way – what we need to do is start embracing this. Although this sounds like I’m promoting a chaotic approach (and to some extent I am!), the truth is that we would quickly see key URIs emerging – most libraries would choose to link to the same sources of information for a specific entity (work/author/etc.) – giving them lots of inbound links, and so impacting on relevance ranking in Google etc.
Also remember that the web is a network of links – so there is nothing to stop LibraryThing linking to VIAF and VIAF linking to the wikipedia entry (incidentally I’d suggest for many well-known authors their wikipedia entry has more useful information than ‘library’ focussed pages). The type of analysis you have done here is interesting, as it starts to show the kind of thinking you might do when deciding which to link to – but my contention is that you don’t need it to be the same place each time.
Also, if you link to one URI for an author, and another library links to a different one, but you both link to the same URI for the related Works then it would be possible to start inferring some kind of equivalence for the author URIs. You could even make an explicit link to say ‘these are the same entity’ if it was valuable (and again, the more people who did this, the more you could believe it)
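The equivalence inference can be sketched in a few lines of Python. Assume you had harvested statements of the form (library, author URI, work URI). Pairs of author URIs attached to the same work URI then become candidate equivalences. The number of shared works acts as a crude measure of how much to believe each one. The data shape and the function name are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def candidate_equivalences(links):
    """links: iterable of (library, author_uri, work_uri) statements.

    Returns {(author_a, author_b): number of works both are linked to},
    with each pair sorted so (a, b) and (b, a) are counted together.
    """
    authors_by_work = defaultdict(set)
    for _library, author_uri, work_uri in links:
        authors_by_work[work_uri].add(author_uri)

    support = defaultdict(int)
    for authors in authors_by_work.values():
        for pair in combinations(sorted(authors), 2):
            support[pair] += 1
    return dict(support)


links = [
    ("libA", "http://authors-a.example.org/david-mitchell", "http://works.example.org/cloud-atlas"),
    ("libB", "http://authors-b.example.org/mitchell-david", "http://works.example.org/cloud-atlas"),
    ("libA", "http://authors-a.example.org/david-mitchell", "http://works.example.org/ghostwritten"),
    ("libB", "http://authors-b.example.org/mitchell-david", "http://works.example.org/ghostwritten"),
]
print(candidate_equivalences(links))
```

This approach has one obvious weakness. A co-authored work links two genuinely different authors to the same work URI. A real system would therefore want several shared works, or other evidence, before asserting that two URIs identify the same person.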
In terms of putting together a searchable index – if we start using the web properly we can start using crawling techniques to build our indexes. Your local information will seed the crawler – i.e. tell it where to start crawling, and you can tell it how ‘deep’ to go on the web – if you are just interested in the URIs to link to directly from your local information, then you can tell it to ignore any further links.
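A seeded, depth-limited crawl is essentially a breadth-first search. In the minimal Python sketch below, the crawler starts from your local record URIs and stops following links beyond `max_depth`. Setting `max_depth=1` therefore means just the URIs you link to directly. The `get_links` callable stands in for fetching a page and extracting its links. It is a hypothetical hook so that the sketch runs without network access.

```python
from collections import deque


def crawl(seeds, get_links, max_depth):
    """Breadth-first crawl from seed URIs, following links up to max_depth hops.

    Returns {uri: depth at which it was first reached}. max_depth=0 keeps only
    the seeds; max_depth=1 adds the URIs they link to directly, and so on.
    """
    depth_of = {}
    queue = deque()
    for uri in seeds:
        if uri not in depth_of:
            depth_of[uri] = 0
            queue.append(uri)
    while queue:
        uri = queue.popleft()
        if depth_of[uri] >= max_depth:
            continue  # deep enough: record this URI but don't follow its links
        for link in get_links(uri):
            if link not in depth_of:
                depth_of[link] = depth_of[uri] + 1
                queue.append(link)
    return depth_of


# A toy link graph standing in for real web pages (all URIs hypothetical)
graph = {
    "http://library.example.org/record/1": [
        "http://a.example.org/author/1",
        "http://b.example.org/work/9",
    ],
    "http://a.example.org/author/1": ["http://c.example.org/wiki/author-1"],
}
seed = "http://library.example.org/record/1"
print(crawl([seed], lambda uri: graph.get(uri, []), max_depth=1))
```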
You could also decide how far you go in terms of caching what you crawl. If you want resilience against Internet connectivity failure (as Andy suggests you might) you could cache everything and keep local copies (I’m not convinced you’d want to, but it is a possible approach).
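Caching for resilience could be as simple as keeping the last good copy of each page and serving it when a refetch fails. The minimal Python sketch below assumes a hypothetical `fetch` function that raises `OSError` on network failure; `urllib.error.URLError` is a subclass of `OSError`. Refetching whenever the network is up also picks up changes made at the source.

```python
import time


class FallbackCache:
    """Keep the last good copy of each crawled page; serve it if a refetch fails."""

    def __init__(self, fetch):
        self.fetch = fetch  # callable: uri -> page text; raises OSError on failure
        self.store = {}     # uri -> (page text, time fetched)

    def get(self, uri):
        try:
            body = self.fetch(uri)
        except OSError:
            # Network trouble: fall back to whatever we last saw, if anything
            cached = self.store.get(uri)
            return cached[0] if cached else None
        self.store[uri] = (body, time.time())  # refresh: the source may have changed
        return body

    def fetched_at(self, uri):
        """When the cached copy was taken, so stale data can be spotted."""
        cached = self.store.get(uri)
        return cached[1] if cached else None


pages = {"http://a.example.org/author/1": "version 1"}
state = {"network_up": True}


def fake_fetch(uri):
    """Stand-in for an HTTP fetch so the sketch runs offline."""
    if not state["network_up"]:
        raise OSError("network down")
    return pages[uri]


cache = FallbackCache(fake_fetch)
print(cache.get("http://a.example.org/author/1"))  # version 1 (fetched)
state["network_up"] = False
print(cache.get("http://a.example.org/author/1"))  # version 1 (from cache)
```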
Something you would need to accept is that the information you crawl may change and be updated – and that you don’t have control. This is probably the most difficult thing for libraries to deal with – and as you suggest perhaps makes the decision of who you link to for various bits of metadata a key question.
There are issues as well – what if a URI you linked to disappears? How would you know? What would you do? These are issues that need some further thought, but I’m convinced they are surmountable (although I’d say we have to be careful not to invent new library-specific things when tackling these issues – that feels a bit like OAI-PMH, which has not been well adopted outside the library/repository world). There are probably other issues that I haven’t mentioned/thought of – but at heart my argument is: the web works well, let’s start using it properly!
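Spotting links that have disappeared could start with periodic `HEAD` requests. The Python sketch below uses only `urllib.request`. It sorts URIs into three groups: those that respond, those that return 404 or 410, and those that need a human to look at them. The function names and the three groups are illustrative assumptions. The status lookup is passed in as a parameter, so it can be replaced (for example in testing) without touching the network.

```python
import urllib.error
import urllib.request


def http_status(uri, timeout=10.0):
    """Status code from a HEAD request, or 0 if the server couldn't be reached."""
    request = urllib.request.Request(uri, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # server answered with an error code
        return err.code
    except OSError:  # DNS failure, refused connection, timeout, ...
        return 0


def check_links(uris, status_of=http_status):
    """Sort linked URIs into 'ok', 'gone' (404/410) and 'check' (anything else)."""
    report = {"ok": [], "gone": [], "check": []}
    for uri in uris:
        status = status_of(uri)
        if 200 <= status < 400:
            report["ok"].append(uri)
        elif status in (404, 410):
            report["gone"].append(uri)
        else:
            report["check"].append(uri)  # e.g. 0 (unreachable), 405, 500
    return report
```

Some servers reject `HEAD` with 405. Those URIs land in the "check" group rather than being reported as gone.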
This comment was originally posted on CommonPlace.Net
10. Lukas Koster Says:
April 19th, 2009 at 4:44 pm
I think all your comments have to do with one single issue: who has control over quality? My initial “ideal” picture was: one single point of definition for each object. That’s my “normalised datamodel designer’s” hangup, maybe.
This would require some kind of authority control, as I suggested. But the nature of the web is completely different, as you observe. Chaotic. But the idea of emerging key URIs is not unrealistic. This would constitute some kind of “authority of the masses”?
Would there be a role for international consortia (commercial and non-commercial organisations) in monitoring quality?
Anyway, I like your motto “the web works well, let’s start using it properly!”
This comment was originally posted on CommonPlace.Net
11. Owen Stephens Says:
April 24th, 2009 at 11:51 am
As you say, emerging key URIs would be about “authority of the masses” – the more libraries that link to a record, the more you would accept that this was an ‘authoritative record’.
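The simplest version of this ‘authority of the masses’ is a count of inbound links per record. The Python sketch below assumes a hypothetical mapping from each library to the set of record URIs it links to. It ranks URIs by how many distinct libraries point at them. This is a toy measure, not a proposal for how weighting should really work.

```python
from collections import Counter


def rank_by_inbound_links(links_by_library):
    """links_by_library: {library: URIs of records it links to}.

    Returns [(uri, number of distinct libraries linking to it)], most linked first.
    """
    counts = Counter()
    for uris in links_by_library.values():
        counts.update(set(uris))  # a library counts at most once per URI
    return counts.most_common()


links_by_library = {
    "libA": {"http://nl-a.example.org/rec/1", "http://other.example.org/rec/7"},
    "libB": {"http://nl-a.example.org/rec/1"},
    "libC": ["http://nl-a.example.org/rec/1", "http://nl-b.example.org/rec/3"],
}
print(rank_by_inbound_links(links_by_library)[0])
```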
I argue in my ‘Future is Analogue’ post http://www.meanboyfriend.com/overdue_ideas/2009/02/the-future-is-analog.html that we need to think more about spectrums of ‘aboutness’ – and I would say the same with authority about a ‘quality’ cataloguing record – across libraries you won’t find a single answer to what is the ‘right’ catalogue record – but some will be used more than others.
If we did have linked bibliographic records in the way I describe, I think we would actually find that libraries tended to use ‘authoritative’ sources to link to anyway – in just the same way that most libraries do copy cataloguing from a few well-known sources. So we might expect (for example) national library catalogues to become a focus for large numbers of incoming links. We might well see consortia taking this role as well – or at least the idea of ‘trusted partners’ within consortia (i.e. we know that x catalogues to standards we are happy with and that meet our needs).
This comment was originally posted on CommonPlace.Net
May 29th, 2009 at 10:40 pm
[...] and “every book its own url“, as described by Owen Stephens in two blog posts: “Its time to change library systems [...]