The primary aim of SPQR is to build on this work and investigate the potential of a Linked Data approach for linking and integrating datasets related to classical antiquity, focusing on some targeted datasets as test cases. The project is driven by, and will address, the issues and limitations identified by our previous work in LaQuAT. Our work will cover the following areas:
- Data representation: How should we represent the information contained in (e.g.) relational database resources so that it can be exposed as LD? We will take into account existing LD recommendations, and also ontologies specific to cultural heritage and antiquity.
- Data transformation and exposure: The targeted datasets exist in various non-LD formats, e.g. relational databases, spreadsheets and XML. We will investigate mechanisms for transforming these into LD and exposing them.
- Data integration: How can we provide integrated views across multiple heterogeneous datasets, allowing researchers to explore (browse) or search (query) within and across those datasets?
- Data exploration: A key issue raised by LaQuAT was the difficulty of providing researchers with access to integrated datasets in a way that allows them to explore and traverse the data intuitively, following paths through the data from one dataset to another via common attributes (e.g. names, places or dates). Provision of such functionality is a key desideratum.
- Data querying: We will investigate query languages (specifically SPARQL) that allow LD datasets to be searched for items of interest, rather than just being explored via link traversal.
- Data linking/annotation: When publishing data as LD, researchers need to create links both within datasets and between datasets to provide a richer context for exploration. Moreover, a researcher may want to augment existing datasets with their own observations.
We will investigate ways to represent the information in our targeted datasets using RDF or an equivalent formalism, taking particular care to address the semantic issues – incompleteness, uncertainty, fuzziness, etc. – identified by LaQuAT. To ensure interoperability with the wider information space, we will follow core standards, in particular the Europeana Data Model (EDM), which has been developed by the EU-funded Europeana project for modelling cultural heritage data, as well as OAI-ORE and emerging domain-specific ontologies and vocabularies. Ontologies form the centrepiece of our approach to data integration, acting as semantic mediators for heterogeneous databases, which are mapped onto the ontologies to provide semantic views over the datasets.
In parallel with this, we will investigate mechanisms for breaking this information out of its current silos, transforming it from its legacy formats (such as databases) into our chosen representation, and exposing it as LD. There are two broad approaches to this – wrappers for on-the-fly conversion, and converting data before exposing it – and we will assess the pros and cons of each. Moreover, the research community for classical antiquity is already developing a number of data resources with LD in mind, resources that could provide additional “glue” for linking the datasets into a wider network of knowledge.
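As an illustration of the second approach (converting data before exposing it), the following is a minimal sketch in Python that dumps rows from a relational table as N-Triples. It is not the project's implementation: the `http://example.org/spqr/` namespace, the `inscription` table, its columns, and the `vocab/text` and `vocab/foundAt` properties are all hypothetical names chosen for the example.

```python
import sqlite3
from typing import List

# Hypothetical namespace used only for this sketch; not a real SPQR URI.
BASE = "http://example.org/spqr/"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(conn: sqlite3.Connection) -> List[str]:
    """Convert each row of the (hypothetical) inscription table to N-Triples."""
    triples = []
    query = "SELECT id, text, place FROM inscription ORDER BY id"
    for ident, text, place in conn.execute(query):
        subject = f"<{BASE}inscription/{ident}>"
        triples.append(f'{subject} <{BASE}vocab/text> "{escape_literal(text)}" .')
        # An incomplete record (no known place) simply yields fewer triples.
        if place is not None:
            triples.append(f"{subject} <{BASE}vocab/foundAt> <{BASE}place/{place}> .")
    return triples


def demo_connection() -> sqlite3.Connection:
    """Build a small in-memory database standing in for a legacy dataset."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inscription (id INTEGER PRIMARY KEY, text TEXT, place TEXT)"
    )
    conn.executemany(
        "INSERT INTO inscription VALUES (?, ?, ?)",
        [(1, "dis manibus", "aphrodisias"), (2, 'fragment "A"', None)],
    )
    return conn


if __name__ == "__main__":
    for line in rows_to_ntriples(demo_connection()):
        print(line)
```

A wrapper for on-the-fly conversion would perform the same mapping at request time instead of producing a dump in advance. It avoids stale copies, but the conversion cost is paid on every query.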
The ultimate objective will be to bring the transformed information into a common corpus or “RDF warehouse” where it can be explored and searched in an integrated way, and where new connections (corresponding to new RDF or similar statements) can be made by the researcher and added to the corpus of information.
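The warehouse idea can be sketched in a few lines of Python, under strong simplifying assumptions. Triples are plain tuples held in a set, the identifiers (`ins:1`, `pap:7`, `place:aphrodisias`) and predicates (`ex:foundAt`, `ex:mentionsPlace`, `ex:relatedTo`) are hypothetical, and the `match` helper is only a stand-in for the kind of single triple pattern that a SPARQL engine would evaluate over a real RDF store.

```python
from typing import Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Two toy datasets that share a place identifier (all names are hypothetical).
INSCRIPTIONS: Set[Triple] = {
    ("ins:1", "ex:foundAt", "place:aphrodisias"),
    ("ins:2", "ex:foundAt", "place:cyrene"),
}
PAPYRI: Set[Triple] = {
    ("pap:7", "ex:mentionsPlace", "place:aphrodisias"),
}


def match(
    store: Set[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for triple in store:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


def items_linked_to(store: Set[Triple], node: str) -> List[str]:
    """Return, in sorted order, every subject that points at node."""
    return sorted({subj for subj, _, _ in match(store, o=node)})


# Merging datasets into one corpus is a set union of their statements.
warehouse: Set[Triple] = INSCRIPTIONS | PAPYRI

# A researcher's new connection is one more statement in the same corpus.
warehouse.add(("ins:1", "ex:relatedTo", "pap:7"))

if __name__ == "__main__":
    # Items from both datasets are reachable via the shared place.
    print(items_linked_to(warehouse, "place:aphrodisias"))
```

Because both datasets use the same identifier for the place, the merged corpus can be traversed from an inscription to a papyrus without any dataset-specific join logic. This is the role the shared vocabularies and "glue" resources are expected to play.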