Penultimate session of the day – Sophia Ananiadou from NaCTeM (National Centre for Text Mining)
What is text mining? – takes us from text to knowledge.
- Yields precise knowledge nuggets from a sea of information -> Knowledge Extraction
 - Extraction of ‘named entities’ – e.g. names of people, institution names, diseases, genes, etc. etc.
 - Discovery of concepts allows semantic annotation and enrichment of documents – improves information access (goes beyond index terms) and allows clustering and classification of documents
 - Extracts relationships, events and even opinions, attitudes etc. – for further semantic enrichment
 
Need a toolkit:
- Resources – lexica, grammars, ontologies, databases
 - Tools – parsers, taggers, named entity recognisers
 - Annotated corpora
 - Domain adaptation
 
Sophia talking in a bit more detail about how you go about doing text mining:
- Start with syntactic analysis
 - Use Named Entity Recognition to extract terms/semantic entities
 - Use parsers to extract other aspects – events, sentiments etc.
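
To make the three steps above concrete, here is a minimal Python sketch. It is not anything shown in the talk: the gazetteer, the function names and the example sentence are all hypothetical, the "syntactic analysis" is reduced to a regex tokeniser, and the "parser" step is replaced by a crude co-occurrence pairing. Real tools would use trained taggers and full parsers.

```python
import re

# Hypothetical toy gazetteer (surface form -> entity type); not from the talk.
GAZETTEER = {
    "p53": "GENE",
    "brca1": "GENE",
    "breast cancer": "DISEASE",
}

def tokenize(text):
    """Very rough stand-in for syntactic analysis: split text into word tokens."""
    return re.findall(r"[A-Za-z0-9]+", text)

def recognise_entities(tokens, gazetteer=GAZETTEER, max_len=3):
    """Dictionary-based named entity recognition using greedy longest match."""
    entities = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate n-gram first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate.lower() in gazetteer:
                entities.append((candidate, gazetteer[candidate.lower()]))
                i += n
                break
        else:
            # No entity starts at this token; move on.
            i += 1
    return entities

def extract_cooccurrences(entities):
    """Crude substitute for parser-based relation extraction:
    pair every gene with every disease found in the same sentence."""
    genes = [text for text, label in entities if label == "GENE"]
    diseases = [text for text, label in entities if label == "DISEASE"]
    return [(g, d) for g in genes for d in diseases]

if __name__ == "__main__":
    sentence = "Mutations in BRCA1 are associated with breast cancer."
    found = recognise_entities(tokenize(sentence))
    print(found)
    print(extract_cooccurrences(found))
```

Co-occurrence is only a placeholder here; it says two entities appear together, not how they are related, which is the gap a parser is meant to fill.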
 
All this allows the creation of annotations – semantic metadata.
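
One common way to hold such annotations is as "stand-off" records that point at character offsets in the untouched text. The sketch below is an illustration only: the `Annotation` fields, the term list and the JSON layout are assumptions of mine, not a format described in the talk, and the case-insensitive matching assumes plain ASCII text.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Annotation:
    start: int   # character offset where the annotated span begins
    end: int     # character offset just past the end of the span
    label: str   # semantic type, e.g. "DRUG"
    text: str    # the covered text, copied for convenience

def annotate(document, terms):
    """Return one Annotation per occurrence of each known term (case-insensitive)."""
    annotations = []
    lowered = document.lower()
    for term, label in terms.items():
        needle = term.lower()
        pos = lowered.find(needle)
        while pos != -1:
            end = pos + len(needle)
            annotations.append(Annotation(pos, end, label, document[pos:end]))
            pos = lowered.find(needle, end)
    # Sort by position so the metadata reads in document order.
    return sorted(annotations, key=lambda a: a.start)

def to_metadata(document, terms):
    """Serialise the annotations as JSON that could be stored alongside the document."""
    return json.dumps([asdict(a) for a in annotate(document, terms)], indent=2)

if __name__ == "__main__":
    doc = "Aspirin reduces fever. aspirin is cheap."
    print(to_metadata(doc, {"aspirin": "DRUG", "fever": "SYMPTOM"}))
```

Keeping the annotations separate from the text means several tools can add their own layers without altering the original document.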
Some examples of text mining applications:
- Kleio (http://www.nactem.ac.uk/software/kleio/)
 - Medie (http://www-tsujii.is.s.u-tokyo.ac.jp/medie/)
 - Facta (http://text0.mib.man.ac.uk/software/facta/main.html)
 
Sophia suggests we should be integrating ‘Language Technology’ into open and common e-research infrastructure to enable the use of text mining tools on the content. See U-Compare tool from NaCTeM – http://www.nactem.ac.uk/u-compare.php
Q & A
Q: (David Flanders) If I was a repository manager which tool would you recommend I play with first?
A: All of them! You need to work out what you want to do and pick the appropriate tool.