Teaching the Pig to Sing

This is the last session of the day by Dave Pattern (http://www.daveyp.com/blog/), the Library Systems Manager at the University of Huddersfield. Dave has been very active in doing cool stuff with their Horizon OPAC.

The first question – Does your OPAC suck? – this was something that surfaced in quite a few library weblogs a little while ago. This was summed up by a posting that said ‘my opac needs more cowbell’ – you have to watch http://www.youtube.com/watch?v=EVbAuMr5eac to understand, apparently.

The title of the talk refers to a quote from Roy Tennant, and one from Robert Heinlein:

"you can put lipstick on a pig, but it’s still very much a pig." (http://www.libraryjournal.com/article/CA516027.html)

"Never try to teach a pig to sing; it wastes your time and it annoys the pig." (http://thinkexist.com/quotation/never_try_to_teach_a_pig_to_sing-it_wastes_your/218581.html)

 

So, in preparation for a talk about the OPAC, Dave decided to do an online survey – he was expecting a handful of responses, but got over 700. This asked questions like ‘how happy are you with your OPAC’ – the questions are here http://www.daveyp.com/blog/stuff/opac.html and there is some initial analysis of the results here http://www.daveyp.com/blog/index.php/archives/239/

As well as this, Dave started looking at how people were using the OPAC at Huddersfield, and looked at things that he could add. One of the issues they noticed was that a large percentage of searches ended with no results, and students would give up. They had already added a spell check, which definitely helped, but this only dealt with zero results where the user had misspelt something, not where they had used (for example) a phrase that was too specific. So they added something that looks up the search term the user has entered against ‘answers.com’ and pulls out related links from the Answers.com webpage – they call this ‘serendipity’, as they acknowledge they have no control over the terms returned.
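To make the ‘did you mean’ idea concrete, here is a minimal sketch of a zero-results spell check. It is not Dave’s implementation: the function name, the sample vocabulary and the 0.75 similarity cutoff are all my own assumptions, and it simply compares each search word against the words already in the catalogue index using Python’s standard difflib module.

```python
import difflib


def suggest_for_zero_results(query, vocabulary, max_suggestions=3):
    """Return 'did you mean' candidates for query words missing from the index.

    `vocabulary` stands in for the words found in the catalogue index
    (an assumption for this sketch, not a real OPAC API).
    """
    vocab = sorted({word.lower() for word in vocabulary})
    suggestions = {}
    for word in query.lower().split():
        if word in vocab:
            continue  # the word is known, so nothing to suggest
        matches = difflib.get_close_matches(
            word, vocab, n=max_suggestions, cutoff=0.75
        )
        if matches:
            suggestions[word] = matches
    return suggestions


# Hypothetical index vocabulary and a misspelt search
catalogue_words = ["psychology", "sociology", "marketing", "management", "nursing"]
print(suggest_for_zero_results("pyschology of managment", catalogue_words))
```

A real OPAC would draw the vocabulary from its own index, and would still need something like the Answers.com lookup for searches that are spelt correctly but too specific.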

Dave also found that the library system had been collecting data about library usage for several years, but it hadn’t been used. They started to mine the data for ‘people who borrowed this also borrowed’.
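As an illustration of how ‘people who borrowed this also borrowed’ could be mined from loan history, here is a small co-occurrence sketch. The (borrower, item) record layout, the function name and the `min_count` threshold are assumptions of mine; I don’t know how the data is actually structured at Huddersfield.

```python
from collections import Counter, defaultdict
from itertools import combinations


def also_borrowed(loans, min_count=2):
    """Map each item to the (other_item, count) pairs borrowed by the same people.

    `loans` is an iterable of (borrower_id, item_id) pairs (assumed layout).
    Pairs seen fewer than `min_count` times are dropped as noise.
    """
    items_by_borrower = defaultdict(set)
    for borrower, item in loans:
        items_by_borrower[borrower].add(item)

    pair_counts = defaultdict(Counter)
    for items in items_by_borrower.values():
        # Count every pair of items that share a borrower, in both directions
        for a, b in combinations(sorted(items), 2):
            pair_counts[a][b] += 1
            pair_counts[b][a] += 1

    return {
        item: [(other, n) for other, n in counts.most_common() if n >= min_count]
        for item, counts in pair_counts.items()
    }


# Made-up loan history: borrower ids are only used for grouping
loans = [
    ("u1", "A"), ("u1", "B"), ("u1", "C"),
    ("u2", "A"), ("u2", "B"),
    ("u3", "A"), ("u3", "C"),
    ("u4", "B"),
]
print(also_borrowed(loans)["A"])
```

The borrower id is only needed to group loans together, so the suggestions themselves never have to reveal who borrowed what.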

They introduced an ability to ‘rate’ a book (star rating) to see if anyone would use it. And one day, someone did. They then added the ability to comment – it hasn’t been used very much, but more by the academics than the students – they can do so anonymously (story about an academic leaving an unflattering anonymous review of a colleague’s book!)

Dave started to use the xISBN service from OCLC and thingISBN from LibraryThing to link together all editions of a book owned by the library.
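A rough sketch of the edition-linking step, assuming the related ISBNs have already been fetched. The `related_isbns` dictionary below is a hypothetical stand-in for whatever xISBN or thingISBN returns (I am not reproducing either service’s API here), and the ISBNs are made-up placeholders.

```python
def normalise_isbn(isbn):
    """Strip hyphens and spaces so that ISBNs compare reliably."""
    return isbn.replace("-", "").replace(" ", "").upper()


def other_editions_held(isbn, related_isbns, holdings):
    """Return the other editions of `isbn` that the library actually owns.

    `related_isbns` maps a normalised ISBN to the ISBNs of its other
    editions (hypothetical data, standing in for a lookup service).
    `holdings` is the collection of ISBNs in the library catalogue.
    """
    target = normalise_isbn(isbn)
    held = {normalise_isbn(h) for h in holdings}
    related = {normalise_isbn(r) for r in related_isbns.get(target, [])}
    return sorted((related & held) - {target})


# Placeholder ISBNs, not real books
related = {"1111111111": ["1111111111", "2222222222", "3333333333"]}
library_holdings = ["1111111111", "3333-333-333", "4444444444"]
print(other_editions_held("1-111-11111-1", related, library_holdings))
```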

They used the MetaLib ‘saved search’ feature, which re-runs a search on a regular basis. They also added RSS feeds for results – not clear if this was using MetaLib functionality or directly from their OPAC – must ask him, as we have MetaLib but not Horizon.

One of the points that Dave makes is that they did these things in a completely speculative way – they were just trying things out and seeing if anyone used them – I really think this kind of approach (a bit like Google Labs) is a great idea. They’ve found the most popular service is the spellcheck. The ‘people who borrowed this’ service wasn’t popular initially, but has increased in popularity (a 300-400% increase since they first introduced it).

However, despite all these improvements, there is a worry that all we are doing, in Roy Tennant’s words, is ‘putting lipstick on a pig’.

One of the issues that Dave encountered was resistance from other librarians – so he had to introduce staff to the ideas, and sell the ideas. He had to overcome the fear that ‘sudden changes’ might confuse the users – but the changes were small, and users are used to websites making these subtle changes over time.

Dave suggests we need to do the following:

  • Encourage suggestions from library staff
  • Include users in decision making process
  • Encourage play and experimentation
  • Don’t be afraid to make mistakes
  • Build crappy prototypes – a prototype is worth 1000 words

Dave is showing some of the other ideas that haven’t yet reached the light of day – a search that presents books by colour (based on the covers – possibly from Amazon?); search visualisations that show what people are searching for at the moment on the catalogue as a tag cloud; cover shots of the last 50 books borrowed from the library – some of these may not be useful ideas, but let’s experiment.
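The tag cloud of current searches is the sort of ‘crappy prototype’ that takes only a few lines. This is a sketch under my own assumptions (a plain list of recent search strings, and font sizes scaled linearly between 10 and 32), not what Dave demonstrated.

```python
from collections import Counter


def tag_cloud(recent_searches, top_n=20, min_size=10, max_size=32):
    """Return (term, font_size) pairs for the most frequent recent searches."""
    counts = Counter(q.strip().lower() for q in recent_searches if q.strip())
    top = counts.most_common(top_n)
    if not top:
        return []

    highest = top[0][1]
    lowest = top[-1][1]
    spread = highest - lowest

    cloud = []
    for term, n in top:
        if spread == 0:
            size = max_size  # every term is equally popular
        else:
            # Linear scaling between the smallest and largest font size
            size = min_size + (n - lowest) * (max_size - min_size) // spread
        cloud.append((term, size))
    return sorted(cloud)  # alphabetical order, as tag clouds usually are


# Made-up search log
searches = ["marketing"] * 5 + ["nursing"] * 3 + ["law"]
print(tag_cloud(searches))
```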

Dave is mentioning the work at Ann Arbor Library; the Endeca powered catalogue at NCSU; LibraryThing for Libraries; Scriblio; Talis Platform; Ex Libris Primo; Innovative Interfaces Encore

Dave’s view of OPAC 2.0:

  • spell checking (did you mean)
  • relevancy
  • improve serendipity
  • expose hidden links between items
  • APIs and Web Services to expose data

…lots more stuff – you can find quite a few slideshows at http://www.slideshare.net/daveyp/ which has these slides

In the responses to Dave’s survey, the majority of responses came from the US. When looking at the UK responses there is a gap, although the general trends are in the same direction. The biggest gap was on ‘faceted browsing’, which seems to be big in the US, but there is not so much interest in the UK.

Guided learning, resource discovery

This session is about a ‘21st Century’ approach to academic resource lists (a.k.a. Reading lists). Talis have had a product ‘Talis List’ for several years, but they are now working on ‘Project Zephyr’ for a Next Generation approach to this – I’m hoping that this session is going to cover what they are doing in this area.

It looks like this is another case study with Fiona Greig from the University of Plymouth – who I just talked to over lunch.

Starting with Chris Clarke from Talis – outlining the problems of reading/resource list management – Students who want resources, Academics who are short on time, and librarians who need to get lists, order resources, setup loan statuses, etc.

Talis List has been around for several years – can be used with any LMS, and integrates with ‘the VLE’ (that latter statement isn’t very specific). But – could be improved – better integration with LMS, ability to suggest loan strategies for items on resource lists based on usage, better workflows.

Now the case study – they have the current Talis List product. University of Plymouth have traditionally had problems getting the lists from academics. So, they focussed on carefully selected academics, and got 135 active modules with 504 active lists – went live in September 2007. Already hearing that students are now demanding the service from their academics.

Student expectations:

  • Full-text available straightaway
  • They want control – how they use resources
  • They want them remotely and on the move
  • They share resources
  • They use Multimedia – academics creating multimedia objects – move away from ‘essential reading’ to ‘essential resources’
  • They want to know when they should read something (e.g. which week of the course) – alerts to tell them

Emphasis from Fiona that it has to be driven by the students – they are the main users. She suggests that you have to bring the academics along with you, but you can’t simply listen to what the academics want, as you will alienate the students.

Main feedback from Academics – the system isn’t easy enough, and isn’t flexible enough. They want it to be even easier to build their list, and they find that it doesn’t always support the way they want to enter the citation.

Interestingly when Fiona suggested not calling this a ‘reading list’ system, but rather a ‘resource list’ system, she wasn’t able to get this through. I guess that this could be a pragmatic decision – I agree with Fiona, that we have moved from ‘reading lists’ to ‘resource lists’, but generally I think the former term would be better understood.

Now a recorded video interview with an academic in Human/Computer interaction (Alan from University of Lancaster – didn’t get his surname). He is relating how academics are often asked several times for lists by different groups (e.g. book shop, library), which is annoying, but the main reason seems to be that he doesn’t get time. Interestingly he relates how he puts the list on his own webpage, rather than using the formal systems (VLE etc.), which he says isn’t him being perverse, but that he simply doesn’t have time – essentially it seems that he finds the route of least resistance is to use his own website.

He seems pretty typical in accepting that he might be part of the problem, but he isn’t motivated enough to be part of the solution. He tells a story about how the bookshop didn’t stock his own textbook, which was a key text for the course – but then sheepishly admits that there could have been a request come round in the summer for him to say what reading he was recommending.

However, he suggests that he would be reluctant to do something different to having it on his webpage, but he would be willing to have it harvested from his webpage. I think this quote says it all, when he says he would be really reluctant to enter the list somewhere else "to me the control thing is quite important, I think a lot of academics are control freaks"

He claims that if he could see the benefits then he would do it – which is fair enough – but what he doesn’t seem to realise is that the benefit would be that the students could actually get hold of the texts he is recommending – so hopefully better pass rates, and certainly fewer students bothering him about how they could get hold of his recommended reading.

Chris is now saying that the next generation of Talis List is going to be available well before Sept 2008 so can be used for Academic year 08/09 – sounds interesting. Now getting a (canned) demo of the system as it is at the moment:

  • Quite a cool visual browse interface (slightly worried that this looks a bit gimmicky)
  • Ability to divide a list into ‘sections’
  • Adding items to the list using a ‘bookmarklet’ – for example, browse Amazon, find book, click bookmarklet, it imports the reference into the list, and enhances with information from the library catalogue
  • If you are on a website without any bibliographic details on it, the bookmarklet assumes you want to add that page to the reading list instead
  • You can also search the library catalogue from within Zephyr itself
  • Can reorder list by dragging and dropping individual items, or sections
  • Can expose list in multiple interfaces – the example used here is Facebook – cool

Overall, this looks very interesting – I’m due to have a chat with one of the Talis staff about this at some point (might try to find them before the last session).

There is a blog for Talis List at http://blogs.talis.com/list

Achieving total Finance Management

This may not sound like the most thrilling session (especially straight after lunch), but I’m hoping they are going to talk about integration with corporate finance systems. Talis Keystone seems to be the main ‘integration’ product – we are going to get a case study from Liverpool Hope University.

What is Talis Keystone? Uses standards (IT and Library standards) to allow integration.

At Liverpool Hope – small budget, with purchasing from consortium approved suppliers, as well as credit card purchases from Amazon. They use the ‘Agresso’ finance system, recently changed from the ‘Opera’ finance system.

The drivers were to avoid double data entry, getting up-to-date financial records that match on both systems, ability to search Finance system with standard data (e.g. use same order numbers on both systems). They decided to deal only with One-off purchases to start with, and couldn’t deal with purchase card in the first instance.

Need to be able to deal with new orders, goods received, cancellations, part receipting, part cancellations etc.

Started by having a very detailed meeting with all the relevant players – Talis, Library, Finance, IT. They flow diagrammed the Library acquisitions process and the Finance process, matching the two together. Clearly identified what was going to be included in the project, and what excluded. Also identified limitations – e.g. Agresso could not accept changed information such as price or quantity (this has to be amended manually) – this sounds like quite a serious limitation to me!

They then had a followup meeting looking at data fields in both systems, and how they mapped to each other.
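To show what the field-mapping exercise boils down to, here is a minimal sketch. Every field name below is invented for illustration (I didn’t see the real Talis or Agresso field names), and this is not how Talis Keystone works internally; it only shows the idea of agreeing one mapping table and translating each library order through it.

```python
# Hypothetical mapping: library order field -> finance system field
FIELD_MAP = {
    "order_number": "po_reference",
    "supplier_code": "vendor_id",
    "fund_code": "cost_centre",
    "price": "amount",
    "quantity": "units",
}


def to_finance_record(library_order, field_map=FIELD_MAP):
    """Translate a library order into the field names the finance system expects.

    Raises ValueError if the order lacks a mapped field, so gaps are caught
    before anything is sent across.
    """
    missing = [field for field in field_map if field not in library_order]
    if missing:
        raise ValueError("library order is missing fields: " + ", ".join(missing))
    return {
        finance_field: library_order[library_field]
        for library_field, finance_field in field_map.items()
    }


# Made-up order record
order = {
    "order_number": "LIB-0001",
    "supplier_code": "SUP42",
    "fund_code": "HIST",
    "price": 24.99,
    "quantity": 2,
}
print(to_finance_record(order))
```

Using the same order number on both sides is what makes it possible to search the finance system with library data, which was one of the stated drivers.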

After getting the technical side sorted, they did structured testing, with both ‘standard’ scenarios, and some ‘try to break the system’ unexpected but realistic scenarios. Load testing. Testing was a time consuming part of the project.

Now at the point of implementation – need to sort out who does what (finance or library), especially for problem solving, need to automate some procedures, need to put in appropriate monitoring, and look at working practices.

Although not live yet, it sounds like a great project. I’ve been looking at how we handle financial transactions, and I think we might want to look at running a similar project.

Integrating Library Services

Just dashed from one presentation to another in a different room – unfortunately the start and end times don’t quite match, so I’ve come in halfway through a presentation from Queen’s University Belfast about integrating library services (apologies to Nicole Harris who was presenting on Federated Access Management, but I’ve seen quite a lot on this before…)

OK – so Queen’s Online seems to be the Queen’s portal. They have a ‘My Library’ channel which shows overdues, holds, etc. with single sign on. This is similar to a project we are involved in at Imperial at the moment.

Queen’s have introduced a strict fines policy, so they looked to make it as easy as possible for students to manage their account – to keep their loans current etc. Also looked at the easiest way to pay. They established a project group with people from the library and the portal project. Implemented an e-pay solution, so fines can be paid online (use WorldPay to process credit card transactions).

They are describing how they set up the online payment service, using Talis Keystone, Queen’s Online and WorldPay, using Web Services – I won’t go into the detail, but the important thing here is the use of Web Services making it easy to integrate the library account with other systems.

The success of the work with Queen’s Online team is opening doors (really pays to work well with the University IT dept – they can help make things happen!). They are looking at increasing their presence in the portal, as well as smartcard epayments.

Andy Latham from Talis is summing up, basically saying that Talis Keystone is the solution for integration – I guess it is a Talis conference 🙂 Just finishing with the IT/Library video that is a take on the Mac/PC adverts – but I can’t track down the URL – if anyone knows it, drop it into the comments.

eScience, Scholarly Communication and the Transformation of Research Libraries

This talk is by Tony Hey – Corporate VP for External Research, Microsoft Research.

So, Tony is saying that we are seeing an ‘emergence of a new Data-Centric paradigm for research’, and that Web 2.0 students won’t use the library in the traditional way – so there is a need to redefine the role of the research library.

We have seen (and continue to see) an explosion in the amount of data being produced in scientific research – huge amounts of data being produced by instruments, simulations, sensor networks – we are able to ‘measure’ stuff to an overwhelming degree. Tony sees management and ‘curation’ of this data as a huge challenge for the research community – he says the scale of the challenge is one of the reasons he joined MS.

The ‘Scientific Data Deluge’ – data collection, data processing, digital preservation.

An example – ‘Fighting HIV with Computer Science’:
Research from a ‘Spam Blocking’ machine learning project, which then moved on to the use of machine learning in tools that scientists can use. The original project aimed to analyse huge amounts of data to decide whether it was spam or not – this led to drawing out correlations in huge data sets on HIV.

Cyberinfrastructure – this is the real problem, the ‘calculation’ bit is easy, it is the infrastructure needed (both technical and organisational) that is the problem. Tony references the NSF report on this (http://www.nsf.gov/pubs/2007/nsf0728/index.jsp).

Tony makes the point that it isn’t just about e-Science, but e-Research – the same issue applies to arts and humanities.

Tony says research today is:

  • Data intensive
  • Compute intensive
  • Collaborative
  • Multi-disciplinary

Today – web users are using tools that could really help here, but typically Researchers are using custom standalone tools, the ‘sharing’ process is still via long publication process, physical meetings etc.

In eResearch data is easily accessible and shareable (e.g. http://cas.sdss.org/dr5/en), services expose functionality (e.g. BLAST from the NLM, http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi?CMD=Web&PAGE_TYPE=BlastHome), services are in the cloud rather than installed locally (e.g. Amazon Web Services – S3, EC2 – this also used for home storage solutions – JungleDisk).

Researchers can be seen as ‘extreme information workers’ – looking for subtle signs in the information available.

Publications as live documents – starting to see examples of figures in electronic publications that are based on ‘live’ data – so the reader can change aspects of a graph, plot different scales, overlay other data etc.

Just discovered that quite a few of the slides that Tony is using are available at http://research.microsoft.com/workshops/CEfS2007/presentations/TonyHey.pdf (although this is from a different talk, many of the slides seem to be the same).

Microsoft are building a Virtual Research Environment (VRE) with the British Library – looks like a web portal with stuff like RSS feeds, funding opportunity alerts, saved searches, integration with MS tools (e.g. OneNote) for bibliography, Word and Excel 2007 (could add external tools to the ‘ribbon’ – e.g. library research tools).

Tony is going through his slides quite quickly so it is hard to capture everything. Now onto scholarly publishing – the rules are changing – comparing to the music industry and music downloads – the scholarly publishing industry (publishers and libraries/universities/academics) needs to adjust.

Funding bodies are now starting to make deposit of research results (publications, data and primary materials) mandatory as part of the funding agreement (e.g. ERC).

Referencing an article by Paul Ginsparg, ‘As we may read’, published in the Journal of Neuroscience, Sept 20, 2006. Ginsparg was the driving force behind arXiv – he sees this model being adopted across all research areas. He also sees a role for libraries and societies – perhaps reclaiming roles they fulfilled in the 19th century. Tony suggests that libraries are not necessarily fulfilling this function – I would argue that universities are not clear they want this…

If you look at the ranking of universities on Google Scholar, the University of Southampton is the top ranking UK university on this measure. This isn’t a judge of ‘quality’, but a reflection of how available the information is – it means that papers from UoS get more visibility, more citations, more influence.

All the tools to support this need to be completely straightforward for the researcher – no extra effort.

The EU PLANETS Project – Digital Preservation – use of XML – specifically Office Open XML (OOXML), now an ECMA standard – but also an open source ODF to OOXML converter (ODF is the ‘Open Document Format’).

Tony Hey leaves us with a challenge – once eResearch is ‘in the Cloud’, where is the Research Library?

Question: Will commercial publishers be destroyed by OA?
Answer: No – MS is working with publishers. Tony thinks the ‘big’ ones will be fine – Science, Nature etc. Smaller publications may be more challenged – however, Tony is keen to work with them to see how this can work – he doesn’t want them to go out of business, but he believes the business model has to change.

Question: Where does payment come in?
Answer: Tony does not seem particularly in favour of ‘author pays’ – he sees problems with the model.

Question: Who curates data in ‘mashups’?
Answer: It’s a problem – if data is coming from different sources, are they all conforming to the same curation standards? It seems unlikely – perhaps this is where there is more commercial opportunity.

Question (from me): Do researchers want to share their data – data is valuable?
Answer: Tony’s personal opinion is that they should have to share their data, but perhaps after a certain amount of time – keen to stress this is his personal view.
