Discovering Babel

(Martin Wynne presenting) Discovering Babel is based on the Oxford Text Archive, covering c.1400 metadata records and c.1400 electronic literary and linguistic datasets: electronic texts, text corpora, lexicons, audio data, etc.

Also including the British National Corpus and an archive of Central and Eastern European language resources (TRACTOR).

Metadata will be made available in several forms: TEI (Text Encoding Initiative) XML headers; Dublin Core, using the DC extensions from the Open Language Archives Community (OLAC); CLARIN's Component MetaData Infrastructure (CMDI); and RDF linked data (‘may as well’!).
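To make the Dublin Core option a bit more concrete, here is a minimal sketch of serialising one record as simple (unqualified) Dublin Core in Python. The record contents and the `to_dublin_core` helper are my own hypothetical illustration, not anything shown in the talk, and the OLAC extensions are deliberately left out.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for Dublin Core elements and the OAI "oai_dc" container.
DC = "http://purl.org/dc/elements/1.1/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC)
ET.register_namespace("oai_dc", OAI_DC)


def to_dublin_core(record):
    """Serialise a dict of {dc_field: [values]} as an oai_dc XML string.

    Hypothetical helper: field names are assumed to be valid DC element names.
    """
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for field, values in record.items():
        for value in values:
            ET.SubElement(root, f"{{{DC}}}{field}").text = value
    return ET.tostring(root, encoding="unicode")


# Invented example record, for illustration only.
example = {
    "title": ["A hypothetical text corpus"],
    "type": ["Text"],
    "language": ["en"],
}
xml_out = to_dublin_core(example)
print(xml_out)
```

Every Dublin Core element is optional and repeatable, which is why the sketch maps each field to a list of values.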

Will provide an OAI-PMH target, and will be harvested by OLAC, CLARIN, etc.
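For anyone unfamiliar with OAI-PMH harvesting, the following is a rough sketch of what a harvester such as OLAC does on its side: build a `ListRecords` request and read identifiers and the resumption token from the response. The base URL, the sample response and the helper names are assumptions of mine; the project's real endpoint was not given in the talk.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (base_url is hypothetical here)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it is sent with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_identifiers(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token = root.find(".//oai:resumptionToken", OAI_NS)
    return ids, (token.text if token is not None and token.text else None)


# Invented, heavily trimmed response used only to exercise the parser.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:0001</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = build_list_records_url("https://example.org/oai")
ids, token = parse_identifiers(SAMPLE)
next_url = build_list_records_url("https://example.org/oai", resumption_token=token)
```

A real harvester would fetch each URL over HTTP and loop until no resumption token comes back; that part is omitted to keep the sketch self-contained.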

Example use cases … want to provide data/metadata and allow others to build services on top of this…
Aim to make it easier for end-users to find and access resources; will also produce a ‘How to make your language resources discoverable’ manual.

Key technical challenges: establishing a sensible, standards-conformant architecture for resource file locations (persistent URLs) …
