ALA 2008: Institutional Repositories: New Roles for Acquisitions – Acquiring Content: Adding ETDs to your Digital Repository

This session is by Terry Owen, DRUM Coordinator, from the University of Maryland Libraries.

Going to show workflows they developed for adding electronic theses to their repository (called DRUM).

Another DSpace implementation – launched in 2004 (with 1100 docs – all theses), 7900+ documents as of June 2008.

They have 20 DSpace 'Communities' (I need to look at the difference between 'community' and 'collection' in DSpace)

Sorry – drifted off there…

Generally it is the Grad School that initiates the ETD programme – the stakeholders for ETDs are:

  • Students
  • Faculty Advisors
  • Graduate School
  • Library
  • IT Dept

The s/w options for ETD submission (i.e. the bit the student interacts with):

  • ProQuest/bepress
  • ETD-db (Virginia Tech and NDLTD – Networked Digital Library of Theses and Dissertations – recommended for advice)

Running through some benefits of ETDs:

  • Can be found, read, used by global audience
  • Increases chances of citation
  • Lower costs (printing and copying)
  • Less hassle for students
  • Educates students on electronic publishing
  • Showcases an institution’s research

Some workflow stuff – really need the slides for this though. Noting that when students enter data they make lots of mistakes – in titles, even in their own names.

However, only the library catalogue record is checked – then the cataloguers pass the information to DRUM, who make corrections ‘as time allows’ – this is absolute madness!

They provide links from the library catalogue to the DRUM record – either via URL recorded in MARC record, or via OpenURL link resolver (which leads to the question in my mind – why bother having any metadata in DRUM at all – just have it in the library catalogue!)
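
For reference, the catalogue-to-repository link is just a URL in an 856 field. A minimal sketch using pymarc (the pre-5.0 flat-subfield API; the handle URL below is made up, not a real DRUM item):

```python
from pymarc import Record, Field

record = Record()
record.add_field(Field(tag='245', indicators=['0', '0'],
                       subfields=['a', 'Example thesis title']))
# Hypothetical handle URL; a real record would carry the item's own handle.
record.add_field(Field(tag='856', indicators=['4', '0'],
                       subfields=['u', 'http://hdl.handle.net/1903/1234',
                                  'z', 'Full text available in DRUM']))
print(record)
```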

Some ETD concerns:

  • Will journal publishers still accept my article if it is available electronically?
  • What if I want to submit a patent based on my research?
  • What if I want to write a book related to my thesis?
  • etc.

So, decided to provide Embargo options:

  • Restrict access for 1 year
  • Restrict for 6 years
  • Restrict indefinitely
    • Requires written approval by the Dean of the Graduate School

However, the print copy is not embargoed – and will be supplied on Inter-library Loan! So just making work for ourselves here!

Why embargo?

  • 1 year – for patent protection on materials, or to publish in a journal with restrictive publication policies
  • 6 years – to write a book

DSpace embargo options are very limited. They could have created an 'Open' and a 'Closed' collection – but this would have doubled the number of collections, so they rejected that approach.

Can control access to items (I think this is exactly what we need for our MSc theses – need to investigate, since I was told it couldn’t be done) – however, it doesn’t work very well from a user experience perspective – asks you to login, then tells you that you can’t access it.

Instead they decided to create a 'Restricted Access' option, which explains the restriction to the end user. They have automated the process – the grad school passes the embargo information across with the metadata (I think this is right) and it is automatically applied.
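
I imagine the automated part looks something like the sketch below – purely illustrative, with made-up option codes, not their actual workflow:

```python
from datetime import date
from typing import Optional

# Made-up codes for the embargo options described in the talk.
EMBARGO_YEARS = {"immediate": 0, "one_year": 1, "six_year": 6}

def embargo_lift_date(option: str, accepted: date) -> Optional[date]:
    """Date the ETD becomes publicly visible; None means an indefinite
    embargo (which requires the Dean's written approval)."""
    if option == "indefinite":
        return None
    years = EMBARGO_YEARS[option]
    try:
        return accepted.replace(year=accepted.year + years)
    except ValueError:  # accepted on 29 February of a leap year
        return date(accepted.year + years, 3, 1)

print(embargo_lift_date("six_year", date(2006, 5, 20)))  # 2012-05-20
```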

There is a form that the students use – all students fill it out; it offers options of 'immediate access', '1 year embargo', '6 year embargo', or 'indefinite embargo'. It has to be signed by the faculty advisor, and comes with a handout about the embargo, why you would embargo, etc.

So far 474 requests for embargoes (since 2006) – this represents 31% of submissions (note that the first 1-year embargoes have now expired, so there are fewer than 474 embargoed theses in the system).

Most commonly embargoed (by percentage) are Chemistry and Life Sciences, and Business. More 6 year embargoes from Arts and Humanities – because of book writing.

They see the rate of embargo as high – are planning to do more education about this.

The Grad School committee did not want electronic copies 'floating around' – so the library is jumping through all kinds of hoops to print out and mail theses that are requested on ILL. Looking at the possibility of having a non-printable PDF. Also hoping to allow on-campus access to embargoed ETDs.

I think I would have lost patience at this point – lucky I don’t do advocacy 😉

They have some special cases – copyrighted works have a 'redacted' version in DRUM, and a note is added; a full version is kept in the library (either in print or on CD/DVD etc.) – again, what nonsense.

Sorry – it isn't the DRUM manager's fault, I just can't quite believe the contortions here (although note that the number of theses falling into this last category is small).

In summary:

  • ETDs require regular attention
  • Build a good relationship with the Grad School
  • Important to educate faculty advisors and students about open access issues
  • Be prepared to implement embargoes
  • Link ETDs to library catalog
  • Have plans in place for special cases (copyrighted works)
  • Have an efficient and capable IT department

ALA 2008: Institutional Repositories: New Roles for Acquisitions – Ohio State University’s repository

Called the 'Knowledge Bank' – it is DSpace. Content is defined by 'institution' – but covers different types of content:

  • Journals
  • Monographs – e.g. Ohio State University Press
  • Undergraduate Theses (not ETDs which are done in a consortial system)
  • Conference materials/Technical reports/Images/etc.
  • Currently 30k records (started in 2004, but with some significant batch deposits)

Knowledge Bank pushes out to other sources – e.g. ScientificCommons.org, OAIster

They created a Metadata Application Profile for the KnowledgeBank, using a core set of metadata elements and the DC Metadata Element Set – available from the KnowledgeBank website

Question – does the metadata make sense when it is outside the institutional context? Example of a library newsletter – it makes sense in the KnowledgeBank because it exists in a hierarchy (not sure, but I guess via collections?), so they didn't bother replicating this information in the record in the first place. However, when taken out of that hierarchy and put into OAIster (for example) – without that hierarchy information, it was impossible to tell what it was.

They decided to add the relevant information back into the record (this seems really wasteful – surely it should have been done at a system level: it should have been possible to automate the integration of the hierarchy information into the record without having to rekey – see the sketch below).
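
Something like this toy sketch is all I mean – assuming records are simple dicts of DC fields, and that dc.relation.ispartof is a reasonable home for the context (both assumptions are mine, not the speaker's):

```python
def add_context(record: dict, hierarchy: list) -> dict:
    """Copy the community/collection path into the record itself, so the
    context survives when the record is harvested out of the repository."""
    record.setdefault("dc.relation.ispartof", []).append(" > ".join(hierarchy))
    return record

newsletter = {"dc.title": ["Staff Newsletter, June 2008"]}
add_context(newsletter, ["University Libraries", "Newsletters"])
print(newsletter)
```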

Mentioning problems of authority control – lots of people contributing etc. – so there are many variations in author names, in subjects and so on. They are doing a project to clean this up for a single collection at the moment.

Saying that people often add keywords that are already in the title, so they don’t add information (although I’d argue that it does add information – this kind of thing could be used to help relevancy surely?)

They have set up a 'Community Metadata Application Profile' – which is shared with all who submit material into the repository. She is showing some details on the slides, but I can't read them.

They have Customized Item Metadata display at a collection level. Also customize collection display – e.g. for journal collections, they have a ‘Table of Contents’ display which can be browsed from an ‘issue’ record.

They have a License Agreement in place, with an optional Creative Commons license – done each time someone submits. When submission is done by a proxy, the individual signs a permission form to allow this – which is then attached to the item, though suppressed from public view.

There are customized input forms for submission – again at Collection level. Can also do customized input templates with prepopulated metadata fields for repeated information.

There are Item Submission Workflows – example of the speaker's workflow areas – she can approve or reject items, or push them back into the pool of work.

Talking about batch loading of items – using (for example) a spreadsheet to create the data (columns of DC data) – this creates an XML file, which is then loaded in batch (roughly as sketched below). Using a spreadsheet means no new interface to learn for people not working with the KnowledgeBank every day. (I'd personally prefer to see a repository that was easy to use, so this wasn't a problem.)
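
A sketch of what such a pipeline might look like, targeting DSpace's Simple Archive Format (one directory per item containing a dublin_core.xml). The 'dc.element.qualifier' column-naming convention is my assumption, not necessarily what the KnowledgeBank team uses:

```python
import csv
import os
from xml.sax.saxutils import escape

def build_saf(csv_path, out_dir):
    """Turn a spreadsheet whose columns are named like 'dc.title' or
    'dc.contributor.author' into Simple Archive Format item folders."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.DictReader(f)):
            item = os.path.join(out_dir, "item_%04d" % i)
            os.makedirs(item, exist_ok=True)
            lines = ["<dublin_core>"]
            for col, value in row.items():
                if not value or not col.startswith("dc."):
                    continue
                parts = col.split(".")  # dc.element or dc.element.qualifier
                element = parts[1]
                qualifier = parts[2] if len(parts) > 2 else "none"
                lines.append('  <dcvalue element="%s" qualifier="%s">%s</dcvalue>'
                             % (element, qualifier, escape(value)))
            lines.append("</dublin_core>")
            with open(os.path.join(item, "dublin_core.xml"), "w",
                      encoding="utf-8") as out:
                out.write("\n".join(lines))
            # A real item would also need a 'contents' file listing its
            # bitstream filenames (PDFs etc.) alongside the metadata.
```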

They also repurpose MARC metadata for things that may already have been catalogued in a library catalogue system – transforming it into DC and loading it into the KnowledgeBank.
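
Again a rough sketch of the idea with pymarc – the field choices below are a common minimal MARC-to-DC mapping, not necessarily the crosswalk Ohio State uses:

```python
from pymarc import MARCReader

def first_subfield(record, tag, code):
    """Return the first occurrence of a subfield, or None."""
    for field in record.get_fields(tag):
        values = field.get_subfields(code)
        if values:
            return values[0]
    return None

def marc_to_dc(path):
    with open(path, "rb") as handle:
        for record in MARCReader(handle):
            yield {
                "dc.title": first_subfield(record, "245", "a"),
                "dc.contributor.author": first_subfield(record, "100", "a"),
                "dc.publisher": first_subfield(record, "260", "b"),
                "dc.date.issued": first_subfield(record, "260", "c"),
            }

for dc in marc_to_dc("catalogue_export.mrc"):
    print(dc)
```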


ALA 2008: Institutional Repositories: New Roles for Acquisitions

The last session that I’m going to – but really relevant. Unfortunately I’ve missed the first 10 minutes or so. Someone (think it must be Peter Gorman from University of Wisconsin-Madison?) is speaking about their experience of having an institutional repository.

Just mentioned the SWORD API to help with deposit workflow. Also mentioning BibApp, and using the SWORD API to push stuff from BibApp to the institutional repository. Also EM-Loader doing something similar.
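
For context, a SWORD (v1.x) deposit is essentially an HTTP POST of a package to a collection URI. A hand-wavy sketch with an invented endpoint and credentials – not BibApp's actual code:

```python
import requests

# Invented endpoint; a real client would first fetch the SWORD service
# document to discover collection URIs and the packagings they accept.
COLLECTION = "https://repository.example.edu/sword/deposit/mycollection"

with open("item_package.zip", "rb") as package:
    response = requests.post(
        COLLECTION,
        data=package,
        auth=("depositor", "secret"),
        headers={
            "Content-Type": "application/zip",
            "X-Packaging": "http://purl.org/net/sword-types/METSDSpaceSIP",
        },
    )
response.raise_for_status()
print(response.text)  # Atom entry describing the deposited item
```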

So, what is the difference between Institutional Repository content and Digital Library content? Users don't (necessarily) care where stuff comes from, or how it gets there, and most of the objects, although very varied, have the same fundamental management, preservation and access needs.

This has challenged the assumption underlying their IR infrastructure.

Now showing a ‘scary diagram’ – showing how one central ‘repository’ could take in content, and what services it would need to support.

Some interesting questions remain:

  • What is a collection?
    • Does the material determine it?
    • Does our s/w determine it?
    • Does our workflow determine it?
    • What aggregations are meaningful to our users – and in what contexts?
    • A single repository gives the possibility of more flexible aggregations that serve specific contexts (I'd say this depends less on the backend storage than on the access systems, but I think the overall point is a good one)
  • When do we select?
  • What do we catalog? – and why?
  • What's the role of Archives? There is overlap with traditional archives roles – well established in the physical world; we need to establish them for the virtual world

No answers to these at the moment…

Moving to a different topic, Copyright:

We (librarians) may have multiple roles:

  • Deciding what to digitize
  • Determining access rights
  • Negotiating digitization/access rights
  • Advising contributors on copyright and Fair Use
    • Faculty submitters
    • Students (Electronic Theses)
  • And sharing knowledge with others
    • Orphan works

Mentioning an OCLC idea of joint work on this, creating a central database. This week Google released the copyright information they have collected on works. Hoping that the Google and OCLC efforts can be brought together.

Copyright determination: theses and dissertations

  • Is it published? (according to Copyright law) – speaker thinks ‘yes’ but they are looking into it at the moment and getting legal advice
    • What is the publication date?
    • Is there a copyright notice?
    • Does Fair Use apply?

Mentioning a resource from the Library of Congress, 'Circular 22' – how to investigate the copyright status of a work – noting the first half is scary and seems designed to put you off even starting the process – but skip that and go to the second half, which is full of really good advice.

Also, there are flowcharts from various places – e.g. one from the law firm Bromberg & Sunstein, which was used by the speaker's institution.


ALA 2008: Top Technology Trends

I've decided to take a break from cataloging this afternoon and opted for the easier-on-my-brain 'LITA Top Tech Trends' session. Ironically this is the first session where I haven't been able to get online 🙁 … hooray – managed to get online after all.

The panelists are:

  • Karen Coyle
  • Eric Lease Morgan
  • John Blyberg
  • Meredith Farkas
  • Roy Tennant
  • Clifford Lynch
  • Karen Schneider
  • Marshall Breeding

Two online participants as well – one is Karen Coombs; I didn't catch the other person's name at first, but it turned out to be Sarah Houghton-Jan.

Interesting how some of the same names keep cropping up – would be nice to spread the speaking goodness around a bit, folks!

There is a chat room at http://www.meebo.com/room/toptech/ – might be testing my multi-tasking to the limit!

MB: Open source – already established in public libraries, will come into academic libraries. Make sure that systems are actually open, not just called open etc. Also look for open data.

KS: Broadband – quite political stuff – the basic message, I think, is that there is no proper telecommunications strategy at the federal level.

Open Access – small literary journals are doing this, because they can in the online world – it gets rid of costs. Not so visible to librarians, as we tend not to 'own' these things.

Lots of tech problems with online participants – sound patchy, video not brilliant etc. Good that it’s being tried though!

CL: After enthusiasm for Open Source we will see a backlash as people try to come to a realistic view – is he talking about the ‘hype cycle’?

Collaboration across the network – people need to be able to work together in an agile fashion – synchronous and asynchronous. Report – 'Beyond Being There' – looks at the issues around virtual organisations.

Travel will get much more expensive and less common. Virtual presence in meetings needs to get much better much more quickly.

Need to look at how we regard privacy of information

Letting go of ‘holdings’ so they can be reused and put into contexts where they add value outside the normal venues.

Overload?

RT: Surprises are the norm! Google digitisation was a surprise. Hope that OCLC can surprise.

We need to retool for constant change.

Need to get over the system upgrade cycle – need to be on the latest platform, and upgrade in a timely manner.

MF: Role of social s/w in collecting local knowledge.

Library as social hub, providing technology for the community – e.g. a slide scanner to digitise people's slides.

Combine h/w, s/w and education that people need to do digital projects

Blogs as historical artifacts. If someone is taking a class in 50 years' time about library history – will they be able to see the blogs that started and developed the Library 2.0 movement?

JB: Green technology – cost and environment concerns

Adding semantic markup to docs – extracting meaning from text

Mobile, always on devices

Personal relationship with information – connections etc.

KC: APIs!

ELM: Got to get your ‘stuff’ out there – a Web API is the way to do this.

KC: Handheld devices

Give up control on data

Didn’t get so much of this – too busy taking part in the discussion in the chat room – oh well…


ALA 2008: Future of cataloging, etc. discussion session

RT: Pointing out LibraryThing doesn't give away all its data any more than OCLC does.

TS: Only tags are protected

Bit of a ‘librarything’ vs ‘OCLC’ thing going on here – find this a bit petty and dull

MY: Television went free because of advertising – but this changed the nature of what was on TV – just stuff that attracted the advertisers target audience

JB: First thought on seeing librarything user generated links between books – oh my god – what if it is done wrong? But then, realised – you have to view the data in the way it has been added – don’t mistake it for cataloging, but it is really valuable.

TS: LibraryThing still only has binary representation of FRBR relationships – needs to be more sophisticated

DH: Need to capture ‘point of view’ – make the point we have a ‘point of view’ – maybe not as objective as we think – but it is important. We need to allow different points of view – then we can form communities of practice around those people who share our point of view

This is so important – I really think this is key. What we need is ways of being able to express a 'point of view' and filter the world, so that we can use the things done by specific people or communities to give us 'our' view. I wonder if there is an approach – or whether we need one – which would allow a 'distributed' wiki, where you could overlay only the changes made by specific people etc. This is how collaborative cataloging would work in my head – I need to write something on this and explain it more clearly.

RW: Not all going to happen one way – how do we deal with this?

TS: Open data

TS: RDF – just another example of an over-engineered solution – worried many web people don’t believe in it

DH: Important to know about RDF – agree not necessarily ‘the answer’ – no right and wrong, but mix of approaches

I think that we need to at least embrace the concepts of RDF – this is about linked data folks – I don’t care to some extent about mechanics – RDF, Microformats, structured HTML etc.

DH: We shouldn’t spend all our time on secondary products (books and serials) – need to look at primary stuff – the ‘long tail’ of library resources

TS: LibraryThing has better series information than you buy from Bowker. Publishers want their covers out there. There is going to be a lot of information

MY: But if there is value in cataloguing, shouldn’t it be paid for?

DH: Don’t count value by each record created. You can pay for things in different ways. Need to stop thinking about charging for metadata even though it costs to create it. You have to make value further down the chain

This is the same as the idea of the 'Free Our Data' campaign by the Guardian – we increase value by giving away information, because it aids the information economy, which grows and pushes value back into the system. This is counter-intuitive, but the report from Cambridge on this showed the vast amount of value in publicly funded information like Ordnance Survey.

MY: It is difficult to carry out cost/benefit studies in libraries – they usually end up just being 'cost' studies because benefit is so difficult to measure. The problem is that we serve an 'elite', and it is difficult for society to see that value.

I disagree with some of this – it is used by 'an elite' because this is who we make it available to – comes back to open data again. I agree it is important to fund universities – and would agree 'benefit' is difficult to measure.

Now open to the floor for comments and questions:

Q: A question around the issues of 'turning data loose' – concerns – perhaps several overlapping concerns.

A: It is scary, but we need to do it.

TS: Need a fielded ‘forkable’ wiki as catalog – not wikipedia model where there is ‘one right answer’

Comment: What are the five most important things we ought to be teaching LS students in knowledge organisation/cataloguing right now? Answers online please!

Q: Libraries are not something ‘isolated’ – how do we fit into an integrated world?

RT: Very much agree need to break down barriers between different information silos – archives, libraries, museums

Q: The unique/unusual stuff isn’t going to be tagged on librarything

DH: We need to understand that it is not just one cataloguer's responsibility to provide metadata on a resource. There is a community around every object – and you have to harness that.

TS: There is a lower limit – communities need to be a particular size to be useful

Q: How much of the success of librarything is on planning and how much ‘on the fly’. Also what about the economics – how do you get paid?

TS: Just throw stuff up and see how it flies

TS: On economics – do good ideas, then work out how you get paid – if it is good enough, money will flow towards it

Missed a load of discussion there, because I got up to ask a question; however, it is worth noting that a lady from (I think) the National Library of Singapore talked about how, by creating 'microsites' of some of their documents, they increased hits from 400 a month (when the docs were in a 'digital library system') to 150,000 a month (and rising at 10% a month). This just hammers home the point that we need to put our data 'on the web' in a web-native way – microsites may not be the only way – but (for example) if our online systems supported simple URLs to a record (like, say, Flickr does) then we would have this working – but because they all use (or traditionally have used) session IDs in their URLs, this just does not happen.

Q: Why does tagging in LibraryThing work but not in other environments?

TS: Whether user contribution is useful or not is highly situational. Don't believe that tagging will be successful in a library catalog – the user just isn't in that 'frame of mind' – when they are using the catalog, they may not even have read the book yet. If we want to use tagging data in catalogs, libraries will need to bring it in from other sources.

 


ALA 2008: A Has-been cataloger looks at what cataloging will be – Diane Hillmann

Diane Hillmann is Director of Metadata Initiatives at the Information Institute of Syracuse (formerly of Cornell).

There are several converging trends:

  • More catalogers work at a support staff level than as professional librarians
  • More cataloging records are selected by machines
  • More catalog records are being captured from publisher data or other sources
  • More updating of catalog records is done via batch processes
  • Libraries continue to de-emphasize processing of secondary research products (books and serials) in favour of unique, primary materials

Options:

  • Extinction
  • Retool

Extinction:

  • Keep cranking about how nobody appreciates us
  • Assert over and over that we're already doing everything right – why should we change?
  • Adopt a 'Chicken Little' approach to envisioning the future: 'the sky is falling'

Retool

  • Consider what catalogers do, and what they will do, and map training accordingly
  • Look for support for retraining at many levels
  • Find a new job title – catalogers do a lot of other things

What do 'metadata librarians' (i.e. the retooled catalogers) do, as opposed to catalogers:

  • Think about descriptive data without pre-conceptions around descriptive level, granularity or descriptive vocabs
  • Consider the entirety of the discovery and access issues around a set or collection of materials
  • Consider users and uses beyond an individual service when making data design decisions

The metadata librarian:

  • is aware of changing user needs
  • understands the evolving information environment
  • works collaboratively with technical staff
  • is familiar with all metadata formats and encoding standards

The metadata librarian skill set:

  • Views data as collections, sets or streams
    • Familiar with a variety of metadata formats (DC, VRA Core, MODS etc.)
    • Understands the basics of data encoding (XML, RDF etc.) but is generally not a programmer
    • Understands the various ways that data can be created (by humans or machines) and manipulated (crosswalked etc.)

Characteristics of the New World:

  • No more Integrated Library Systems
  • Bibliographic utilities are unlikely to be the ‘central node’ for all data
  • Creation of metadata will become far more decentralized – not all of it will be library data
  • Nobody knows how this will all shake out
  • But: Metadata Librarians will be critical in forging solutions

Disintegrated Library Systems:

  • All metadata will not be managed in and delivered from one central store
    • Discovery is the first function that is being disaggregated from the ILS – there will be others
    • Metadata may be managed in a variety of databases, structures and systems

Role of bibliographic utilities:

  • Optimized to be the middleman of the traditional data sharing system
  • Currently limited to handling MARC data – not sure whether or when that will change (RDA will be the first challenge here)
  • New services are contemplated

(as an aside OCLC getting a hard time here today – feel a bit sorry for Roy!)

New models of creation and distribution

  • All data will not be created by librarians
    • some will originate from machine processes
  • We need to exchange data based on a more open model – on the web
  • Broader use of OAI-PMH is a good start towards opening data beyond applications and bespoke portals
  • Need to avoid commoditizing DATA; instead, base the business model on building the necessary SERVICES

Not sure about OAI-PMH – why not just publish the stuff on a webpage with semantic markup to give structure?
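
To be fair, consuming OAI-PMH is not hard either – a minimal harvest, against an invented endpoint, is just two request parameters and some XML namespaces:

```python
import requests
import xml.etree.ElementTree as ET

BASE = "https://repository.example.edu/oai/request"  # invented endpoint
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

response = requests.get(BASE, params={"verb": "ListRecords",
                                      "metadataPrefix": "oai_dc"})
root = ET.fromstring(response.content)
for record in root.iter(OAI + "record"):
    for title in record.iter(DC + "title"):
        print(title.text)
```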

Open data:

  • Nobody knows how rich our data is unless we make it fully available – we can’t compete as data providers unless we do this

 


ALA 2008: Catalogs and the Network level – Roy Tennant

Roy using a quote/concept that I’m going to use in my presentation (grrr):

  • Then: Users built workflow around libraries
  • Now: Library must build services around user workflow

Discovery happens elsewhere…

Roy mentioning some prominent web services:

  • Google
  • Amazon
  • Digg
  • etc.

Noting that:

  • Scale matters
  • Spread matters

a.k.a. Concentration and Diffusion

Roy looking back to the time when cataloguers created bib metadata on cards, which could be distributed around libraries.

Roy telling an anecdote how he decided to put ‘Rolling Stones’ under ‘Rock Music’ rather than ‘Music, Pop’ – but that when he did this, only his local library benefited.

We now have the ability to share records – but still we create local records, so changes we make still only deliver local benefit.

If we pushed data back into a global system (Worldcat of course in this context, but the point stands), then we can share that benefit.

The benefits of 'concentration':

  • Search results ranking
    • Holdings data – the number of libraries holding a book says something about 'importance' (I think this is true, but I'm not convinced by Roy's 'the more libraries, the more important' example – there is an issue with 'ant trails' here: if we all 'follow the leader' down the strongest path, there is a risk we don't explore other potentially useful/better avenues)
    • Usage data
  • Recommendations of related books (a la LibraryThing)
  • User contributed content

WorldCat Identities is an example of the data mining possibilities of a large aggregation

Steve Museum a good example of user contributed data enhancing a collection

‘Switching service’ – allows you to move from one piece of information to another – Roy uses an example of moving from a blog post (via http link) to a book in Worldcat, to a local library record. Noting the ‘last link’ is missing – no option on his local library homepage to ‘send this to me’. If libraries did this – they would ‘so rock’ 😉

Benefits of ‘diffusion’

  • Library holdings syndicated into places where people are found (e.g. Google)
  • Small libraries can play in big spaces
  • The more paths to your content the better

Examples of integration of links to library resources – in web pages, in wikipedia etc.

  • Concentration
    • Webscale presence
    • Mobilize data
  • Diffusion
    • Disclosure of links, data and services

ALA 2008: The Future of Cataloging (as seen from LibraryThing) – Tim Spalding

What is LibraryThing?

  • 450,000 registered users
  • 28 million books
  • 37 million tags
  • 50+ imitators
  • LibraryThing is your friend 🙂

LibraryThing use often follows the pattern:

  • Personal cataloging
  • Social networking
  • Social Cataloging

Social cataloging happens in both implicit and explicit ways.

Using examples of ‘Thomas Jefferson‘ and other famous users – where their library collections have been added to LibraryThing

LibraryThing has ‘common knowledge’ fields – things such as characters etc.

A page such as http://www.librarything.com/series/Star+Wars contains more knowledge about the Star Wars series of books than anywhere else in the world.

Showing the power of LibraryThing – tags, bringing together editions etc.

The ‘tag war’ is over. Tim does not believe tags are ‘better’ than subjects – but tags are just great for finding stuff. If you care about finding stuff not asserting ontological reality – then tags are great – you just have to spend some time using them to see this.

The physical basis of classification:

  • A book has 3-6 subjects ('cos that's what fits on a card)
  • Subjects are equally true (can’t express degrees of relation to a subject – either a book is about it or not – black and white)
  • Subjects never change (once subjects are allocated you don't go back – even if terminology changes in the real world)
  • Only librarians get to add subjects
    • There is only one answer – someone ‘wins’
    • You don't get a say in how books are classified – you don't want users writing on the cards – but that's not relevant in a virtual environment
  • Only books are cataloged
  • Cataloging has to be done in the library
  • Most librarians can't help you, each other, or themselves
    • Libraries are NOT good at sharing metadata (contradicting Jennifer) – we tend to pull down records from a central source – very few libraries push back
  • Record creation and editing can't be distributed
  • Records can't be shared freely

Two futures:

  • The world ends
    • You (catalogers) are paid less
    • Programmers still get paid
  • You move up the stack
    • An IT-industry analogy – with open source software
    • Demand increasing
    • Low-level work and data becomes commoditized, distributed, free
    • You move higher, get paid more

Tim wants a new shelf order:

  • Replaces Dewey
    • Free (Open Source)
    • Modern
    • Humble – not trying to model the whole world
  • Decided socially, level by level
  • Tested against the world
  • Assignment is distributed
  • I write the code
  • You (cataloguers) get to be Jimmy Wales (the audience asked 'who is Jimmy Wales?' – one of the founders of Wikipedia) – looking over it all, but with no power!


ALA 2008: What I have found out from an attempt to build an RDF model of FRBR-ized cataloging rules – Martha Yee

http://myee.bol.ucla.edu/

Can we preserve all the good stuff we have created with cataloging? We spend too much time doing ‘admin’ work to keep local catalogs under control. See potential in the vision of the ‘semantic web’

Martha summarising the concepts of the semantic web, RDF, RDFS, OWL, SKOS, URIs

As an experiment, Martha decided to create a set of cataloguing rules that are more FRBRized than RDA – details available at her website. Noting, she really doesn’t expect people to adopt these rules – it is an experiment
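
To give a flavour of what 'fitting our data into RDF' means, here is a tiny rdflib sketch of FRBR-style Work/Expression/Manifestation statements. The vocabulary URIs and property names are illustrative only – Martha's rules define their own:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

FRBR = Namespace("http://example.org/frbr#")  # illustrative, not a real vocabulary
EX = Namespace("http://example.org/entities/")

g = Graph()
work = EX["moby-dick"]
expression = EX["moby-dick-english-text"]
manifestation = EX["moby-dick-1851-harper"]

g.add((work, RDF.type, FRBR.Work))
g.add((work, FRBR.title, Literal("Moby Dick")))
g.add((expression, RDF.type, FRBR.Expression))
g.add((expression, FRBR.realizationOf, work))
g.add((manifestation, RDF.type, FRBR.Manifestation))
g.add((manifestation, FRBR.embodimentOf, expression))

print(g.serialize(format="turtle"))
```

Her first question below is essentially: for any given piece of data, can a cataloger always tell which of these nodes it should hang off?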

Questions:

  1. Is it possible for catalogers to tell in all cases whether a piece of data pertains to the expression or the manifestation?
  2. Is it possible to fit our data into RDF/RDFS/OWL/SKOS?
  3. If it is, is it possible to use that data to design indexes and displays that meet the objectives of the catalog (providing an efficient instrument to allow a user to find a particular work of which the author and title are known, a particular expression of a work, all of the works of an author, all of the works in a given genre or form, or all of the works on a particular subject)?

The overall question is:

  • Can we do what we need to do within the context of the semantic web?

Some problems?

  • Granularity issues – should we be more granular in some areas? Less granular in others?

Martha says that people who dislike MARC argue that it is too granular and requires too much of a learning curve. (I don't agree – put simply, I believe we need to focus on what is important: in some areas this means more granularity, and in others less – although I also don't think granularity is the major problem with MARC; the main problem is that others, outside libraries, don't, and will never, adopt it.)

  • Is the internet fast enough to assemble a record from a linked set of URIs?

(I don’t agree with this either – Google’s model of crawling the web doesn’t require the web to be ‘fast’ – we can index/build in advance, not on the fly)

  • Internet seems to be built on ‘free’ intellectual labour – only the programmers get paid

Martha feels this is a real problem – it costs money (it is expensive) to create cataloguing – takes intellectual labour

I think Martha's experiment is fascinating – I think that many of her 'problems' are not actual problems – but I think they deserve to be answered.

Some comments from the panel:

DH: Really appreciate the work that is being done by Martha – it is hard to get your head round this stuff. But some of the arguments are strawmen. There are problems with the way that RDA looks at some of these issues – e.g. a false dichotomy between transcribed values and other values – no reason why both shouldn't be accommodated.

JB: Need to make distinction between granularity and complexity. Records can be granular and interoperable – and people can decide how they can use that. Don’t need complex to be granular

I think the panel have picked up on the same things that I have. I think we all agree that the work that Martha is doing is great, and leading by example – we need more experiments like this – actual practical stuff, not just theoretical

 


ALA 2008: response to first two speakers

Tim Spalding: Would challenge Roy’s assertion that only OCLC can do ‘web scale’ or ‘do a deal with Google’ – if you put stuff on the web, you are webscale

Diane Hillman: Need open data

Roy Tennant: Worldcat has been built 'by you' – it is a collective asset – need to think about how it can be used – OCLC is a membership collective – if the membership decides to open it up, then they need to tell OCLC.

Martha Yee: Cataloguing data is 'gold' – it took intellectual effort, and we mustn't 'throw it away' (you can tell it is a cataloguing audience – this is the only point so far that has been applauded).

Robert Wolven: Need to think about how metadata flows around – what controls are needed etc.

If you find out about things that you can't get, then it is frustrating. But you can usually get stuff if you really want it. How do we decide on the scope of what we present to people?

RT: We need to do better job of presenting options to users. Users should know how difficult it is to get their hands on something – this is a set of concentric circles – what is local, what is regional, what is further afield (delivery times – immediate, days, weeks, etc.) Also can present purchase options – may be quicker and cheaper than ILL to buy on Amazon.

JB: XC looking at a facet for ‘availability’ – so that users can easily narrow search by how quickly they can get stuff, or by the amount of effort involved to them. But need to be able to get this information out of the ILS – some vendors better than others to work with on this.

RW: Is there a ‘local user’ anymore?

JB: Local may not be the right word – but users that ‘your institution cares about’

DH: We often don’t define ‘local’ very well. About user relation to the collection – if you do digitised photos of a geographic area, the ‘local’ users include the people who lived in the area at the time the photos were taken, even if they are no longer ‘geographically’ local.

RW: Local is a matter of ‘interest’ as well as ‘placement’

TS: Centralisation has suppressed ‘the local’. What goes into the ‘local’ catalog record can be there for the ‘local’ use – doesn’t have to necessarily be shared by everyone (although can be shared as widely as you like)

RW: Should we look at aggregating data at the 'institutional' level – e.g. bringing in museums, archives etc. from the wider institution (e.g. a University)? What levels of aggregation make sense between the library and the 'network level'?

RT: We need to separate out ‘inventory’ from ‘access’ – library systems are currently inventory and we have confused it with a list of ‘accessible’ resources. We could be pointing people off to GBS and OCA digitised books etc.

DH: Massive amounts of digitised material – funded by Google, Microsoft etc. But most of it is not available to us OR we haven’t integrated it into our systems. Even the people who funded it don’t seem sure what to do with it. Perhaps they have underestimated the problems?

I think I disagree (speculatively) with Diane on some of this. I think the point of the Google work was 'let's do it, and deal with the problems later' – essentially, there is a bottom-line belief by the people at Google that having this stuff digitised is better than not – and as long as you believe this, then you may as well get on with it. Possibly Google also need to say 'having it digitised is better than not, and we believe there will be profit in it' – I'm not convinced about this.

MY: Making point that libraries have cataloguing backlogs, and digitisation increases the problem

DH: Need to get stuff out in the stacks by whatever method – fast cataloging, publisher data etc. We can always go back to stuff if it needs to be refined later. Need to get rid of the idea that items are only touched once.

RW: Who should invest in preserving stuff? e.g. Internet Archive providing access to stuff that no one seems to ‘own’
