{"id":1806,"date":"2015-07-15T12:28:20","date_gmt":"2015-07-15T11:28:20","guid":{"rendered":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/?p=1806"},"modified":"2015-11-29T17:59:41","modified_gmt":"2015-11-29T16:59:41","slug":"worked-example-fixing-marc-data-3","status":"publish","type":"post","link":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/2015\/07\/worked-example-fixing-marc-data-3\/","title":{"rendered":"A worked example of fixing problem MARC data: Part 3 &#8211; MarcEdit"},"content":{"rendered":"<p>This is the third\u00a0post in a\u00a0<a href=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/tag\/fixmarc\/?orderby=date&amp;order=ASC\">series of 5<\/a>.<\/p>\n<p>In\u00a0<a href=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/2015\/07\/worked-example-fixing-marc-data-2\/\">Part 2<\/a>\u00a0I describe how I used a text editor to get a malformed file to the point where it could be read as a MARC file by MarcEdit.\u00a0I knew that there would still be many issues in the file at this point, because I&#8217;d spotted them in my <a href=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/2015\/07\/worked-example-fixing-marc-data-1\/\">initial investigation<\/a> and when <a href=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/2015\/07\/worked-example-fixing-marc-data-2\/\">editing the file in a text editor<\/a>. However, I wanted a more structured list of the issues, and happily the <a href=\"http:\/\/marcedit.reeset.net\">MarcEdit software<\/a>\u00a0has an option to validate files.<\/p>\n<p>Like several other functions in MarcEdit, the &#8216;Validate MARC Records&#8217; option can be accessed both from the MarcEdit opening screen and from within the MarcEdit editor. 
To access the validation option without going through the editor, look in the &#8216;Add-ins&#8217; menu:<\/p>\n<p><a href=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcedit2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-1817\" src=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcedit2.png\" alt=\"MarcEdit Add-ins menu\" width=\"334\" height=\"308\" srcset=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcedit2.png 493w, http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcedit2-300x277.png 300w\" sizes=\"auto, (max-width: 334px) 100vw, 334px\" \/><\/a><\/p>\n<p>First, however, I wanted to make sure that the file would open correctly in the MarcEditor and to see how it looked, so I used the &#8216;MarcEditor&#8217; option and opened my file:<\/p>\n<p><a href=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcedit5.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-1814\" src=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcedit5.png\" alt=\"marcedit5\" width=\"397\" height=\"277\" srcset=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcedit5.png 557w, http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcedit5-300x210.png 300w\" sizes=\"auto, (max-width: 397px) 100vw, 397px\" \/><\/a><\/p>\n<p>The layout of the MARC record in the editor is much easier to read than the native MARC format. For comparison:<\/p>\n<p><a 
href=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcrecord.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-1818\" src=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcrecord-1024x106.png\" alt=\"Example of a MARC Record\" width=\"640\" height=\"66\" srcset=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcrecord-1024x106.png 1024w, http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcrecord-300x31.png 300w, http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcrecord-900x93.png 900w, http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcrecord.png 1209w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p>The MarcEditor layout is called &#8216;<a href=\"http:\/\/www.loc.gov\/marc\/mnemonics.html\">mnemonic format<\/a>&#8217;. To be honest, the reasons for the name seem slightly obscure; as far as I can tell, they relate back to the origins of this format in the <a href=\"http:\/\/www.loc.gov\/marc\/makrbrkr.html\">Library of Congress MarcMaker and MarcBreaker software<\/a>.<\/p>\n<p>The mnemonic format is reasonably easy to read if you are used to MARC records. Each line contains:<\/p>\n<ul>\n<li>an equals sign as the first character on the line<\/li>\n<li>a three-digit MARC field code (or the three letters &#8216;LDR&#8217; for the leader)<\/li>\n<li>two spaces<\/li>\n<li>two MARC field indicators (not present in the LDR and 001-009 fields)<\/li>\n<li>the content of the field, with subfields included where appropriate using the syntax &#8216;$&#8217; followed by the subfield code<\/li>\n<\/ul>\n<p>Even from a brief look at the record in the editor, I can immediately see a problem with the fixed fields\u00a0(LDR and 001-009): they all start with a subfield, even though fixed fields do not have subfields:<\/p>\n<p><a 
href=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcedit52.png\"><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-1820 alignnone\" src=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcedit52.png\" alt=\"Illustration showing problem with Indicators on fixed fields\" width=\"127\" height=\"82\" \/><\/a><\/p>\n<p>I can also see a &#8216;002&#8217; field containing what seems to be a system number, which I&#8217;d expect to find in the &#8216;001&#8217; field.<\/p>\n<p>I then ran the &#8216;Validate MARC Records&#8217;\u00a0function, which can be accessed from inside the MarcEditor through the &#8216;Tools&#8217; menu:<\/p>\n<p><a href=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcedit6.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-1813\" src=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcedit6.png\" alt=\"How to access the Marc Validator from the MarcEditor\" width=\"412\" height=\"521\" srcset=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcedit6.png 569w, http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcedit6-237x300.png 237w\" sizes=\"auto, (max-width: 412px) 100vw, 412px\" \/><\/a><\/p>\n<p>When you choose this option, you are prompted for the &#8216;rules&#8217; file you want to use:<\/p>\n<p><a href=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcedit3.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-1816\" src=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcedit3.png\" alt=\"Choose rules file for Marc Validator\" width=\"331\" height=\"181\" srcset=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcedit3.png 435w, 
http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcedit3-300x164.png 300w\" sizes=\"auto, (max-width: 331px) 100vw, 331px\" \/><\/a><\/p>\n<p>MarcEdit comes with a ready-made rules file for validating MARC, but you can modify it, or design your own, if you have specific things you want to validate (or ignore) in different types of file.<\/p>\n<p>(N.B. the\u00a0illustration above also shows the option to\u00a0select\u00a0the &#8216;source&#8217; file, that is, the file you want to validate. This isn&#8217;t an option when using the MARC Validator from the MarcEditor, as it will always validate the file you are viewing in the editor. However, when you access the MARC Validator directly, without going via the editor, you will be asked which file you want to validate.)<\/p>\n<p>There are several options for the validation process, but in this case I want to use the default &#8216;Validate Record&#8217; option. 
When I click &#8216;OK&#8217;, the validator runs (this will take a while on a large file) and then displays the results, which in my case looked something like this:<\/p>\n<p><a href=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcedit7.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-1812\" src=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcedit7.png\" alt=\"Marc Validator results\" width=\"289\" height=\"230\" srcset=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcedit7.png 471w, http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-content\/uploads\/2015\/07\/marcedit7-300x239.png 300w\" sizes=\"auto, (max-width: 289px) 100vw, 289px\" \/><\/a><\/p>\n<p>I can now see that my problems extend well beyond the fixed-field problems I identified by eye: there are all kinds of problems with incorrect indicators (and many other problems not shown in this screenshot).<\/p>\n<p>I used the clipboard icon to copy these results to the clipboard, then pasted them into a text file so I could refer back to them.<\/p>\n<p>At this point I have a\u00a0file of MARC records that will at least open in MarcEdit, and a list of the issues with the records in that file. Now I want to start fixing these errors. I could fix them directly in MarcEdit, which has some tools and approaches that might help, but with this volume of issues across a file of 50,000 records I&#8217;m not sure MarcEdit is the right tool.<\/p>\n<p>Instead I&#8217;m going to use another tool to start fixing the records: <a href=\"http:\/\/openrefine.org\">OpenRefine<\/a>, which is designed specifically to help &#8216;fix messy data&#8217;. 
I&#8217;m a big fan of OpenRefine and use it a lot, so for me it is the obvious tool for this task.<\/p>\n<p>However, OpenRefine doesn&#8217;t understand MARC records. It can import XML, so converting the records to MARCXML is one possible approach, but to be honest I don&#8217;t think it is the right one in this case: I suspect trying to fix MARCXML in OpenRefine would be a very painful process.<\/p>\n<p>Instead I&#8217;m going to use the &#8216;mnemonic&#8217; format that the MarcEditor uses. There are two ways of converting a MARC file into the mnemonic format\u00a0in MarcEdit. You can use the &#8216;MARC Breaker&#8217; function, which can be accessed from the MarcEdit opening screen, or (the approach I took) once you have a file open in the MarcEditor you can simply save it in mnemonic format using the File-&gt;Save option. MarcEdit designates the mnemonic format with the &#8216;mrk&#8217; file extension, as opposed to &#8216;mrc&#8217;, which designates a proper\u00a0(aka &#8216;compiled&#8217; or &#8216;binary&#8217;) MARC file. &#8216;mrk&#8217; files are plain text files, so they can be opened in any text editor and, happily, also in OpenRefine.<\/p>\n<p>I now have a file of errors (from the MARC Validator) and my MARC records in mnemonic format. The next step is to open the files in OpenRefine, so I can see all the different types of error that I need to fix and start to fix them, which I&#8217;ll describe in <a href=\"http:\/\/www.meanboyfriend.com\/overdue_ideas\/2015\/07\/worked-example-fixing-marc-data-4\/\">Part 4 of this series<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is the third\u00a0post in a\u00a0series of 5. 
In\u00a0Part 2\u00a0I describe how I used a text editor to get a malformed file to the point where it could be read as a MARC file by MarcEdit.\u00a0I knew that there would still be many issues in the file at this point, because I&#8217;d spotted them in [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[101,102],"class_list":["post-1806","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-fixmarc","tag-openrefine"],"_links":{"self":[{"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/posts\/1806","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/comments?post=1806"}],"version-history":[{"count":5,"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/posts\/1806\/revisions"}],"predecessor-version":[{"id":1842,"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/posts\/1806\/revisions\/1842"}],"wp:attachment":[{"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/media?parent=1806"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/categories?post=1806"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/tags?post=1806"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}