Presentation from John W Miescher of Bizgraphic (Geneva). He says a ‘well behaved document’ is an electronic document that is both user friendly and library friendly: easy to read and navigate, with bookmarks and an interactive table of contents. So many long electronic documents lack these basic functions, and long reports are rarely designed to be read cover-to-cover.
The average information consumer is interested in descriptive metadata and less in structural and administrative metadata. They don’t care about semantics, namespaces and refinements. Dublin Core terms are probably the best option.
John says it isn’t that he is particularly a fan of DC, but it is there and it is convenient. However there are challenges: authors are not very aware of it, the metadata is not always filled in, and libraries use MARC21, where crosswalking to DC has limitations. DC tags can be embedded into PDFs, but there is a lack of decent tools for editing document metadata.
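To make ‘DC tags embedded into PDFs’ a bit more concrete: PDFs usually carry this kind of metadata as an XMP packet, which is RDF/XML using the Dublin Core namespace. Below is a minimal sketch that builds just the RDF part of such a packet with Python’s standard library. The function names and sample values are made up for illustration, and actually writing the packet into a PDF would need a PDF library, which is not shown here.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by XMP for RDF and Dublin Core.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
X = "adobe:ns:meta/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

for prefix, uri in (("rdf", RDF), ("dc", DC), ("x", X)):
    ET.register_namespace(prefix, uri)


def _add_list(parent, dc_name, container, values):
    """Add a dc:NAME property holding an rdf:Seq or rdf:Bag of values."""
    prop = ET.SubElement(parent, f"{{{DC}}}{dc_name}")
    box = ET.SubElement(prop, f"{{{RDF}}}{container}")
    for value in values:
        ET.SubElement(box, f"{{{RDF}}}li").text = value


def build_dc_xmp(title, creators, subjects):
    """Return the RDF part of an XMP packet with three Dublin Core elements."""
    root = ET.Element(f"{{{X}}}xmpmeta")
    rdf = ET.SubElement(root, f"{{{RDF}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF}}}Description", {f"{{{RDF}}}about": ""})

    # dc:title is a language alternative: one rdf:li per language.
    title_el = ET.SubElement(desc, f"{{{DC}}}title")
    alt = ET.SubElement(title_el, f"{{{RDF}}}Alt")
    ET.SubElement(alt, f"{{{RDF}}}li", {XML_LANG: "x-default"}).text = title

    _add_list(desc, "creator", "Seq", creators)  # ordered list of authors
    _add_list(desc, "subject", "Bag", subjects)  # unordered keywords
    return ET.tostring(root, encoding="unicode")


# Made-up values, purely for illustration.
packet = build_dc_xmp("Example annual report", ["A. N. Author"], ["metadata", "PDF"])
print(packet)
```

Only three elements are shown. A real packet is also wrapped in `<?xpacket ...?>` processing instructions, which this sketch leaves out.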
digi-libris is a tool intended to help organize documents and collections. It automatically scans documents for metadata, allows the metadata to be edited, and can then re-embed the metadata into the files, so anyone you pass the file on to benefits…
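As a rough idea of what ‘scanning a document for metadata’ can look like, here is a small sketch for the HTML case, where Dublin Core is conventionally carried in `<meta name="DC.title" ...>` style tags. This is not how digi-libris works (no details were given in the talk); the class name, function name and sample page are all invented for the example.

```python
from html.parser import HTMLParser


class DCMetaScanner(HTMLParser):
    """Collect <meta name="DC.xxx" content="..."> values from an HTML page."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = attr.get("content")
        # Keep only Dublin Core style names such as "DC.title".
        if name.startswith("dc.") and content:
            self.found.setdefault(name[3:], []).append(content)


def scan_dc_metadata(html_text):
    """Return a dict mapping DC element names to lists of values."""
    scanner = DCMetaScanner()
    scanner.feed(html_text)
    scanner.close()
    return scanner.found


# Invented sample page, purely for illustration.
SAMPLE = """
<html><head>
<meta name="DC.title" content="Example annual report">
<meta name="DC.creator" content="A. N. Author">
<meta name="DC.creator" content="B. Coauthor" />
<meta name="viewport" content="width=device-width">
</head><body><p>Report text.</p></body></html>
"""

print(scan_dc_metadata(SAMPLE))
```

Values are kept as lists because elements such as creator can legitimately repeat.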
In summary – well-behaved documents
cater to the needs of (and empower) the information consumer
have a better chance of being found (in search engines)
[…. at this point had temporary outage when my battery died – plugged in now]
Some interesting points from the floor in the Q&A: changing the metadata in a PDF changes the file’s checksum, which creates version and preservation problems, suggesting that integrating metadata into the document isn’t a good approach. I sympathise but tend to disagree. Why not integrate it into the document? Keeping the description and the thing together makes sense as we deal with more digital docs…
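The checksum point is easy to see with a toy example. The sketch below uses an invented file layout (a metadata line followed by the content), not a real PDF, and the helper names are hypothetical. It shows that editing the embedded metadata changes the whole-file checksum, while a checksum taken over the content alone stays the same.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of some bytes."""
    return hashlib.sha256(data).hexdigest()


def make_file(metadata: str, content: bytes) -> bytes:
    """Toy file format: one metadata line, then the content. Not a real PDF."""
    return b"META:" + metadata.encode("utf-8") + b"\n" + content


def content_of(file_bytes: bytes) -> bytes:
    """Return everything after the first line of the toy file."""
    return file_bytes.split(b"\n", 1)[1]


content = b"Body of the report, unchanged between versions."
before = make_file("title=Untitled", content)
after = make_file("title=Example annual report", content)

# Whole-file checksums differ once the embedded metadata is edited...
print(sha256_hex(before) == sha256_hex(after))
# ...but checksums over the content alone still match.
print(sha256_hex(content_of(before)) == sha256_hex(content_of(after)))
```

So the preservation worry is real if fixity is checked over the whole file. Whether that is an argument against embedding, or for checksumming content separately from its description, is the open question.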
But… I think there are real issues around the nature of ‘documents’. It’s a print/physical paradigm, and I’m not sure how far it applies as we move to more digital content. I also felt the emphasis on PDFs in the presentation was worrying. I asked about this, and the speaker emphasised that the work he does covers EPUBs and HTML docs as well, but HTML is more difficult….
Would have liked to ask him about tools like Mendeley and Zotero that extract metadata from PDFs, and about Mendeley providing reading functionality as well.
Suspect the issue is tying up content with other aspects of the ‘document’: why should a ‘table of contents’ or bookmarking be something ‘baked in’ to the document? Need to think about how content can be kept separate from metadata, and metadata separate from functionality, etc. Got me thinking anyway 🙂