{"id":124,"date":"2008-06-10T11:28:01","date_gmt":"2008-06-10T18:28:01","guid":{"rendered":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/?p=124"},"modified":"2008-06-10T11:28:01","modified_gmt":"2008-06-10T18:28:01","slug":"talis-research-day-codename-xiphos","status":"publish","type":"post","link":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/2008\/06\/talis-research-day-codename-xiphos\/","title":{"rendered":"Talis Research Day &#8211; Codename Xiphos"},"content":{"rendered":"<p>I&#8217;m at a Talis Research Day today looking at a number of issues that are &#8216;hot topics&#8217; in education and research at the moment. The program for the day looks great, with presentations by Peter Murray-Rust from the University of Cambridge, who is a proponent of opening up research and research data &#8211; I&#8217;d recommend his blog to catch up with the latest work in this area. Peter is talking about &#8216;Data-driven research&#8217;. Following this, Andy Powell from Eduserv is talking about Web 2.0 and repositories.<\/p>\n<p>First up &#8211; Peter M-R:<\/p>\n<p>Peter presents using HTML &#8211; although it&#8217;s hard work, he believes that the common alternatives (PowerPoint, PDF) destroy data. I think the question of &#8216;authoring tools&#8217; &#8211; not just for presentations, but in a more general sense of tools that help us capture data\/information &#8211; is going to come to the fore in the next few years.<\/p>\n<p>Peter has a go at publishers &#8211; claiming that publishers are in the business of preventing access to data, rather than facilitating it (at this point he asks if there are any publishers in the audience &#8211; two sheepish hands are raised). 
Peter also mentions that chemistry is particularly bad as a discipline in terms of making data accessible &#8211; with the American Chemical Society being a real offender.<\/p>\n<p>Peter&#8217;s talks tend to be pretty impromptu &#8211; so he is just listing some topics he may (or may not) touch on today:<\/p>\n<ul>\n<li>Why data matters\n<li>What is Open Data\n<li>Differences between Open Access and Open Data\n<li>Demos\n<li>Repositories\n<li>eTheses\n<li>Open Notebook Science\n<li>Semantic data and the evils of PDF\n<li>Science Commons, Talis and the OKF\n<li>Possible Collaborations<\/li>\n<\/ul>\n<p>Peter demonstrating how a graph without metadata is meaningless &#8211; showing a graph of atmospheric carbon dioxide levels. If this were in paper form and we wanted to do some further analysis, it would take a lot of effort to take measurements off the graph &#8211; but if we have the data from behind the graph, we can immediately leap to doing further work.<\/p>\n<p>Peter now noting that a scholarly publication looks very much as it would have done 200 years ago. Showing a PDF of an article from Nature &#8211; and making the point that it all looks great (illustrations of molecules, proteins and reactions etc.) but is completely inaccessible to machines.<\/p>\n<p>Peter noting that most important bio-data that is published is publicly accessible and reusable &#8211; but this is not true in chemistry. This means that, within a single article, the data about the proteins is publicly accessible, but the information on chemical molecules is not.<\/p>\n<p>Peter illustrating how there is a huge industry based on moving and repurposing data (e.g. taking publicly available patent data and redistributing it in other formats).<\/p>\n<p>Peter now showing how a data-rich graph is reduced to a couple of data points to &#8216;save space&#8217; in journals &#8211; a real paper-based paradigm &#8211; we need to get away from this. 
Similarly, experimental protocols are reduced to condensed text strings.<\/p>\n<p>Peter now showing &#8216;JoVE&#8217; &#8211; the Journal of Visualized Experiments. This is an online publication where scientific protocols are published in both textual and audio-visual format &#8211; so much richer in detail than the type of summarisation that journals currently support. Peter notes &#8211; this is really important stuff &#8211; failure to provide enough detail to recreate an experiment can have a huge impact on your reputation and career.<\/p>\n<p>Peter now moving onto &#8216;big science&#8217; &#8211; relating his visit to CERN &#8211; how the enormous amounts of data generated by the Large Hadron Collider are captured, along with relevant metadata. However, most science is not like this &#8211; not on this scale. Peter is relating the idea of &#8216;long tail&#8217; science (coined by Jim Downing) &#8211; this is the small-scale science that, taken across all activity, still generates large amounts of data &#8211; but only a little from each individual activity. This is really relevant to me, as this is exactly the discussion I was having at Imperial yesterday &#8211; looking at the approach taken by &#8216;big science&#8217; and wondering if it is applicable to most of the research at Imperial.<\/p>\n<p>So in long-tail science, you may have a &#8216;lab&#8217; that will have a reasonably &#8216;loose&#8217; affiliation to the &#8216;department&#8217; and &#8216;institution&#8217;. Peter noting that most researchers have experienced data loss &#8211; and this can be a real selling point for data and publication repositories.<\/p>\n<p>Peter showing a thesis with many diagrams of molecules, graphs etc. Noting there is no way to effectively extract the information about molecules from the paper, as it is a PDF. 
He is demonstrating a piece of software which extracts data from a chemical thesis &#8211; demonstrating this from a thesis authored in Word, and using OSCAR (a text-mining tool tuned to work in chemistry) &#8211; and shows how it can extract relevant chemical data, display it in a table, and reconstruct spectra (from the available data in the text &#8211; although these are not complete).<\/p>\n<p>Peter asking (rhetorically) what the major barriers are &#8211; e.g. Wiley threatened legal action against a student who put graphs on their website.<\/p>\n<p>Peter now demonstrating &#8216;CrystalEye&#8217; &#8211; a system which spiders the web for crystals &#8211; reads the raw data, draws a &#8216;Jmol&#8217; view (3D visualisation) of the structure, links to the journal article etc. This brings together many independent publications in a single place showing crystal structures. Peter saying this could be done across chemistry &#8211; but data is not open, and there are big interests that lobby to keep things this way (specifically mentioning Chemical Abstracts lobbying the US Government).<\/p>\n<p>Peter now talking about development of authoring tools &#8211; pointing out that this is much more important than a deposition tool &#8211; if the document\/information is authored appropriately, it is trivial to deposit (it occurs to me that as long as it is on the open web, then deposit is not the point &#8211; although there is some question of preservation etc. &#8211; but you could start to take a &#8216;wayback machine&#8217; type approach). Peter is demonstrating how an animated illustration of chemical synthesis can be created from the raw data.<\/p>\n<p>Peter now coming on to Repositories. Using &#8216;SourceForge&#8217; (a computer code repository) as an example. Stressing the importance of &#8216;versioning&#8217; within SourceForge &#8211; it is trivial to go back to previous versions of code. Need to look at introducing these tools for science. 
He is involved in a project called &#8216;Bioclipse&#8217; &#8211; a free, open-source workbench for chemo- and bioinformatics using a &#8216;sub versioning&#8217; approach (based on Eclipse, which is a software development platform) &#8211; Bioclipse stores things like spectra, proteins, sequences, molecules etc.<\/p>\n<p>Peter mentioning issues of researchers not wanting to share data straight away &#8211; we need &#8216;escrow&#8217; systems that can store information which is only published more openly at a later date. The selling point is keeping the data safe.<\/p>\n<p>Peter dotting around during the last few minutes of the talk, mentioning:<\/p>\n<ul>\n<li>Science Commons (about customising the Creative Commons philosophy for science)\n<ul>\n<li>how to license data to make it &#8216;open&#8217; under appropriate conditions &#8211; this is something that Talis has been working on with Creative Commons.\n<li>Peter saying that, for example, there should be a trivial way of watermarking images so that researchers can say &#8216;this is open&#8217; &#8211; and then if it is published, it will be clear that the publisher does not &#8216;own&#8217; or have copyright over the image.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3>Questions:<\/h3>\n<p>Me: Economic costs of capturing data outside &#8216;big science&#8217;<\/p>\n<p>PMR: If we try to retro-fit, costs are substantial. However, data capture can be a marginal cost if done as part of research. 
Analogy of building motorways and cyclepaths &#8211; very expensive to add cyclepaths to motorways, but trivial to build both at the same time.<\/p>\n<p>Some interesting discussion of economics&#8230;<\/p>\n<div class=\"wlWriterSmartContent\" id=\"scid:0767317B-992E-4b12-91E0-4F059A8CECA8:25f25ac8-aa64-40f3-8b20-5944c2327951\" style=\"padding-right: 0px; display: inline; padding-left: 0px; padding-bottom: 0px; margin: 0px; padding-top: 0px\">Technorati Tags: <a href=\"http:\/\/technorati.com\/tags\/xiphos\" rel=\"tag\">xiphos<\/a><\/div>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;m at a Talis Research Day today looking at a number of issues that are &#8216;hot topics&#8217; in education and research at the moment. The program for the day looks great, with presentations by Peter Murray-Rust from the University of Cambridge, who is a proponent of opening up research and research data &#8211; I&#8217;d recommend [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[],"tags":[11],"class_list":["post-124","post","type-post","status-publish","format-standard","hentry","tag-xiphos"],"_links":{"self":[{"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/posts\/124","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/comments?post=124"}],"version-history":[{"count":0,"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/posts\/124\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.meanboyfriend.com\/
overdue_ideas\/wp-json\/wp\/v2\/media?parent=124"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/categories?post=124"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.meanboyfriend.com\/overdue_ideas\/wp-json\/wp\/v2\/tags?post=124"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}