The DTL FAIR Data team has developed tools that compose the so-called ‘Data FAIRport’:
- FAIRifier and Metadata Editor (to create)
- FAIR Data Point (to publish)
- FAIR Search Engine (to find)
- ORKA (to annotate)
At this moment, the FAIR Data Point is in use in many projects and is under active development. The other components of this ecosystem were prototyped in earlier projects and are currently not under active development (but are still considered for the future).
The data FAIRification process includes the following steps:
- Original data retrieval
- Dataset identification and analysis
- Definition of the semantic model
- Data transformation
- License assignment
- Metadata definition
- FAIR Data resource deployment (data, metadata, license)
Currently, this process is done manually, which limits its scalability. Automating (part of) the process is possible: a skilled programmer could create a workflow of customised scripts to automate some of the repetitive FAIRification tasks. However, this approach may lead to inconsistency, as different developers would create different scripts and, potentially, introduce modifications in the process. To support data FAIRification, we are working on the FAIRifier, described in the section below.
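To make the step list above concrete, here is a minimal sketch of the FAIRification steps chained as a pipeline. All function names, URIs, and the toy dataset are invented for illustration; they are not part of the FAIRifier itself.

```python
# Hypothetical pipeline mirroring the FAIRification steps listed above.
def retrieve_original_data(source):
    """Step 1: fetch the raw dataset from its source (stubbed here)."""
    return {"rows": [{"gene": "BRCA1", "organism": "human"}], "source": source}

def analyse_dataset(data):
    """Step 2: identify the columns present in the dataset."""
    data["columns"] = sorted({key for row in data["rows"] for key in row})
    return data

def apply_semantic_model(data, model):
    """Steps 3-4: map each column to an ontology term and transform rows into triples."""
    data["triples"] = [
        (data["source"], model[col], row[col])
        for row in data["rows"] for col in data["columns"]
    ]
    return data

def assign_license(data, license_uri):
    """Step 5: attach a machine-readable license."""
    data["license"] = license_uri
    return data

def define_metadata(data, title):
    """Step 6: produce a small metadata record for the resource."""
    data["metadata"] = {"title": title, "license": data["license"]}
    return data

# Step 7 (deployment of data, metadata, and license) would push `data` to a FAIR Data resource.
model = {"gene": "http://example.org/hasGene",
         "organism": "http://example.org/hasOrganism"}
result = define_metadata(
    assign_license(
        apply_semantic_model(
            analyse_dataset(retrieve_original_data("http://example.org/raw")),
            model),
        "https://creativecommons.org/licenses/by/4.0/"),
    "Example FAIR dataset")
print(len(result["triples"]))  # 2 triples: one per column of the single row
```

In practice each stage would be far richer, but the point stands: the steps form a linear flow, which is exactly why ad-hoc scripting tends to drift between developers.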
The FAIRifier is an online software tool designed to address the commonly encountered problems and data-manipulation tasks in the FAIRification process. The FAIRifier can thus speed up the process of data FAIRification, especially for larger datasets.
The FAIRifier is a complex application that allows the user to mash together data and metadata, data license, the data model, and the chosen ontologies and identifiers. The FAIRifier also allows users to directly publish data on a FAIR Data Point (FDP). The FAIRifier is an augmentation of the OpenRefine tool originally developed by Google. We chose OpenRefine because it can be extended with the functionality required to support the FAIRification process. With the OpenRefine RDF plugin, now incorporated in the FAIRifier, users can map data to any type of RDF model, which is the key task of the FAIRification process.
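The column-to-RDF mapping that the RDF plugin supports interactively can be sketched in a few lines. The following is an illustrative stand-in, not the plugin's actual code; the predicate URIs and sample rows are invented, with `dcterms:title` as the one real vocabulary term.

```python
# Illustrative sketch of mapping tabular rows to RDF triples (Turtle-style lines).
ROW_TEMPLATE = '<{subject}> <{predicate}> "{value}" .'

def map_rows_to_turtle(rows, base_uri, column_mapping):
    """Turn each table row into RDF triples.

    column_mapping: column name -> predicate URI (the semantic model).
    """
    lines = []
    for i, row in enumerate(rows):
        subject = f"{base_uri}/record/{i}"
        for column, predicate in column_mapping.items():
            if column in row:
                lines.append(ROW_TEMPLATE.format(
                    subject=subject, predicate=predicate, value=row[column]))
    return "\n".join(lines)

rows = [{"name": "sample-1", "tissue": "liver"}]
mapping = {"name": "http://purl.org/dc/terms/title",
           "tissue": "http://example.org/tissueType"}  # invented predicate
turtle = map_rows_to_turtle(rows, "http://example.org", mapping)
print(turtle)
```

The interesting design decision is that the mapping (`column_mapping` here) is data, not code: swapping in a different semantic model changes the output without touching the transformation logic, which is what makes the approach reusable across datasets.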
The Metadata Editor (MDE) is a software tool that makes it easy for non-technical users to define and publish the metadata required by a FAIR Data Point (FDP).
We have defined a 5-layered metadata schema:
- Repository Metadata – record containing information about the FAIR Data as a data repository.
- Catalogue Metadata – record containing information about the data catalogue(s), i.e., the collections of datasets.
- Dataset Metadata – record containing information about each individual dataset.
- Distribution Metadata – record containing information about each dataset’s distributions.
- Data Record Metadata – record containing information about the dataset’s record, i.e., the internal structure, types, their relations (the semantic model).
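The nesting of these five layers can be shown with a small sketch. The field names below are illustrative only (the real FDP metadata schema is RDF-based); the point is just the containment hierarchy: repository, catalogue, dataset, distribution, with the data record describing the dataset's internal structure.

```python
# Hedged sketch of the 5-layer metadata hierarchy; field names are invented.
repository = {
    "title": "Example FAIR Data repository",
    "catalogues": [{
        "title": "Example catalogue",
        "datasets": [{
            "title": "Example dataset",
            "distributions": [{
                "title": "CSV distribution",
                "mediaType": "text/csv",
                "downloadURL": "http://example.org/data.csv",
            }],
            "dataRecord": {
                # internal structure: the types and their relations (semantic model)
                "types": ["Gene", "Organism"],
                "relations": [("Gene", "foundIn", "Organism")],
            },
        }],
    }],
}

def layers(record, depth=0):
    """Walk the hierarchy and yield (layer depth, title) pairs."""
    yield depth, record["title"]
    for cat in record.get("catalogues", []):
        yield from layers(cat, depth + 1)
    for ds in record.get("datasets", []):
        yield from layers(ds, depth + 1)
    for dist in record.get("distributions", []):
        yield from layers(dist, depth + 1)

print([depth for depth, _ in layers(repository)])  # [0, 1, 2, 3]
```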
The MDE helps you create the metadata for the FDP. Its web interface allows you to fill in or edit a simple form. As you fill in the form, the MDE will build and display the RDF representation of your metadata.
Some basic definitions regarding the MDE and FDP:
- A Catalogue is an arbitrary grouping of metadata of datasets.
- A Dataset is a collection of data, published or curated by agents, and available for access or download in one or more formats. A dataset does not have to be available as a downloadable file; for example, it may be accessible only via an API.
- A Distribution is a specific representation of a dataset. Each dataset may be available in different forms; these forms may represent different formats of the dataset or different endpoints. Examples of distributions include a downloadable CSV file, an API, or an RSS feed.
- The Data Record metadata represents the internal characteristics of the dataset, i.e., its internal structure, the types involved, their relationships, and the domain and range of the items.
A FAIR Data Point (FDP) is software that allows data owners to expose metadata and data in a FAIR manner. It offers a graphical user interface (GUI) for human clients and a very simple application programming interface (API) for software clients.
FDPs make datasets and their fine-grained metadata discoverable and accessible by machines. The datasets can be external or internal to the FAIR Data Point. You can look at a FAIR Data Point as a website for research data that is accessible not only to human beings, but also to machines.
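What "accessible to machines" means in practice is that a software client can fetch an FDP's metadata and follow the links it contains. The sketch below parses a canned N-Triples-style response instead of making a live HTTP call; the endpoint URIs are invented, while `dcat:dataset` is a real DCAT property commonly used to link a catalogue to its datasets.

```python
# Hypothetical machine client reading FDP metadata (canned response, no network).
SAMPLE_RESPONSE = """\
<http://example.org/fdp> <http://www.w3.org/ns/dcat#dataset> <http://example.org/dataset/1> .
<http://example.org/fdp> <http://www.w3.org/ns/dcat#dataset> <http://example.org/dataset/2> .
"""

def extract_dataset_uris(ntriples):
    """Pull dataset URIs out of dcat:dataset triples (one N-Triples line each)."""
    uris = []
    for line in ntriples.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[1] == "<http://www.w3.org/ns/dcat#dataset>":
            uris.append(parts[2].strip("<>"))
    return uris

print(extract_dataset_uris(SAMPLE_RESPONSE))
# ['http://example.org/dataset/1', 'http://example.org/dataset/2']
```

A real client would use an RDF library rather than line splitting, but the flow is the same: request the metadata, extract the dataset links, and descend layer by layer.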
The FAIR Data Search Engine harvests the metadata available on FAIR Data Points or compatible data repositories, indexes it, and provides a search interface.
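The harvest-and-index idea can be sketched as follows. This is a toy stand-in for illustration only: the FAIR Data Points are stubbed as in-memory records, and the index is a simple keyword-to-record map rather than the search engine's actual implementation.

```python
# Toy sketch of harvesting metadata from several (stubbed) FAIR Data Points,
# building an inverted index over it, and answering keyword queries.
from collections import defaultdict

def harvest(points):
    """Gather metadata records from a list of stubbed FAIR Data Points."""
    return [record for point in points for record in point["records"]]

def build_index(records):
    """Map each lowercase keyword to the ids of records mentioning it."""
    index = defaultdict(set)
    for record in records:
        for word in record["title"].lower().split():
            index[word].add(record["id"])
    return index

def search(index, query):
    """Return the sorted record ids matching a single keyword."""
    return sorted(index.get(query.lower(), set()))

points = [
    {"records": [{"id": "ds1", "title": "Human gene expression"}]},
    {"records": [{"id": "ds2", "title": "Mouse gene atlas"}]},
]
index = build_index(harvest(points))
print(search(index, "gene"))  # ['ds1', 'ds2']
```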
The Open, Reusable Knowledge graph Annotator (ORKA) supports easy human curation of knowledge graphs by offering graph annotation as a service and capturing the provenance of the annotator and the original statement. The ORKA prototype has been developed in the context of the ODEX4all project.
The Data FAIRport is an interoperability platform that allows data owners to publish their (meta)data and allows data users to search for and access data (if licenses allow).
In the Data FAIRport, the embedded FAIR Data Points provide the relevant metadata to be indexed by the Data FAIRport's data search engine, as well as access to the data itself. When data owners publish their non-FAIR datasets, the embedded FAIRifier will transform each dataset into a FAIR dataset before its actual publication in the Data FAIRport.