{"id":1791,"date":"2015-07-13T11:01:44","date_gmt":"2015-07-13T10:01:44","guid":{"rendered":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/?p=1791"},"modified":"2015-11-29T17:59:41","modified_gmt":"2015-11-29T16:59:41","slug":"worked-example-fixing-marc-data-1","status":"publish","type":"post","link":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/2015\/07\/worked-example-fixing-marc-data-1\/","title":{"rendered":"A worked example of fixing problem MARC data: Part 1 &#8211; The Problem"},"content":{"rendered":"<p>In what will eventually be <a href=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/tag\/fixmarc\/?orderby=date&amp;order=ASC\">a series of 5 posts<\/a> (I think), I&#8217;m going to walk through a real-life example of some problematic MARC records I&#8217;ve been working with, using a combination of three tools (the <a href=\"https:\/\/notepad-plus-plus.org\">Notepad++ text editor<\/a>, <a href=\"http:\/\/marcedit.reeset.net\">MarcEdit<\/a> and <a href=\"http:\/\/openrefine.org\">OpenRefine<\/a>). I want to document this process partly because I hope it will be useful to others (including future me) and partly because I&#8217;m interested to know whether I&#8217;m missing some tricks. I&#8217;d like to thank the <a href=\"http:\/\/www.polytechnic.edu.na\/?q=library\">Polytechnic of Namibia Library<\/a> for giving me permission to share this example.<\/p>\n<p>This is the first post in the series and describes the problem I was faced with&#8230;<\/p>\n<p>I was recently contacted by a library that was migrating to a new library system but had hit a problem. 
When they came to export MARC records from their existing system, it turned out that what they got wasn&#8217;t valid MARC and wouldn&#8217;t import into the new system.<\/p>\n<p>I agreed to take a look and, based on a sample of 2,000 records, found the following problems:<\/p>\n<ul>\n<li>Missing indicators, or indicators placed in the wrong position within the field rather than preceding it<\/li>\n<li>Incorrect characters used to indicate &#8216;not coded\/no information&#8217; in MARC field indicators<\/li>\n<li>Subfields appearing in fixed-length fields<\/li>\n<li>Use of invalid subfield codes (in particular &#8216;_&#8217;)<\/li>\n<li>System number incorrectly placed in the 002 field rather than the 001 field<\/li>\n<li>Several issues with the MARC record leader (LDR), including:\n<ul>\n<li>Incorrect characters used to indicate &#8216;not coded\/no information&#8217;<\/li>\n<li>Incorrect characters in &#8220;Record status&#8221; (LDR\/05)<\/li>\n<li>Incorrect characters in &#8220;Bibliographic level&#8221; (LDR\/07)<\/li>\n<li>Incorrect character encoding information (LDR\/09)<\/li>\n<li>Incorrect characters in &#8220;Encoding level&#8221; (LDR\/17)<\/li>\n<li>Incorrect characters in &#8220;Descriptive cataloging form&#8221; (LDR\/18)<\/li>\n<li>Incorrect characters in &#8220;Multipart resource record level&#8221; (LDR\/19)<\/li>\n<li>Incorrect characters in &#8220;Length of the implementation-defined portion&#8221; and &#8220;Undefined&#8221; (LDR\/22 and LDR\/23)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>At this point I felt I had a good view of the issues and agreed to fix the records to the point where they could be loaded successfully into the new library system, making it clear that:<\/p>\n<ul>\n<li>It wouldn&#8217;t be possible to improve the MARC records beyond the data provided to me<\/li>\n<li>Where there was insufficient data in the export to make a record valid MARC, I&#8217;d use a &#8216;best guess&#8217; for the appropriate values<\/li>\n<li>I wouldn&#8217;t try to improve the cataloguing data itself, only correct the records to the point where they were valid MARC<\/li>\n<\/ul>\n<p>The library then sent me the full set of records they needed correcting: just under 50,000 records. Unfortunately this new file revealed an additional problem: incorrect &#8216;delimiter&#8217;, &#8216;field terminator&#8217; and &#8216;record terminator&#8217; characters had been used in the MARC file (in valid MARC these are the control characters 1F, 1E and 1D hex respectively). As far as I could tell, this meant that MarcEdit (or code libraries such as PyMARC) wouldn&#8217;t recognise the file as MARC at all.<\/p>\n<p>So I set to work. My first task was to get the file to the point where MarcEdit could read it as MARC records, and for that I needed a decent text editor, as I&#8217;ll describe in <a href=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/2015\/07\/worked-example-fixing-marc-data-2\">Part 2<\/a>&#8230;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In what will eventually be a series of 5 posts (I think), I&#8217;m going to walk through a real-life example of some problematic MARC records I&#8217;ve been working with, using a combination of three tools (the Notepad++ text editor, MarcEdit and OpenRefine). 
I want to document this process partly because I hope it will be useful to others [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[101,102],"class_list":["post-1791","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-fixmarc","tag-openrefine"],"_links":{"self":[{"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/posts\/1791","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/comments?post=1791"}],"version-history":[{"count":8,"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/posts\/1791\/revisions"}],"predecessor-version":[{"id":1805,"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/posts\/1791\/revisions\/1805"}],"wp:attachment":[{"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/media?parent=1791"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/categories?post=1791"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/tags?post=1791"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}