FAIR data management planning

Discussions about the efficient use of funding for data-intensive research led to the definition of the FAIR principles: scientific data should become Findable, Accessible, Interoperable, and Reusable, for both humans and computers. You can also use the FAIR principles to guide your data management planning.

Reuse of data obviously requires that others can Find and Access it. In addition, both your own research and the research of others who reuse your data will benefit if the data can easily be linked to related data. This means you should make your data Interoperable. For example, two data sets that both record diseases should use the same vocabulary to name them. Similarly, two data sets that describe events at specific locations should use the same method to describe those locations.
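The disease example can be made concrete with a small sketch. It assumes two hypothetical data sets that label the same diseases differently. Once each local label is mapped to a shared vocabulary (ICD-10 category codes are used here purely as an example of a widely used one), the records can be linked. The labels and the mapping table are illustrative assumptions, not a curated terminology mapping.

```python
# Two hypothetical data sets that name the same diseases with different local labels.
clinic_a = [
    {"patient": "A1", "diagnosis": "type 2 diabetes"},
    {"patient": "A2", "diagnosis": "asthma"},
]
clinic_b = [
    {"sample": "B7", "disease": "Diabetes mellitus type II"},
    {"sample": "B8", "disease": "Bronchial asthma"},
]

# Illustrative mapping from local labels to a shared vocabulary (ICD-10 category codes).
TO_ICD10 = {
    "type 2 diabetes": "E11",
    "Diabetes mellitus type II": "E11",
    "asthma": "J45",
    "Bronchial asthma": "J45",
}


def coded(records, field):
    """Return the set of shared-vocabulary codes used in one data set."""
    return {TO_ICD10[record[field]] for record in records}


def link(records_a, field_a, records_b, field_b):
    """Pair records from two data sets that refer to the same disease code."""
    return [
        (a, b)
        for a in records_a
        for b in records_b
        if TO_ICD10[a[field_a]] == TO_ICD10[b[field_b]]
    ]


# Comparing raw labels finds nothing in common; comparing codes finds both diseases.
raw_overlap = {r["diagnosis"] for r in clinic_a} & {r["disease"] for r in clinic_b}
code_overlap = coded(clinic_a, "diagnosis") & coded(clinic_b, "disease")
print(raw_overlap)           # set()
print(sorted(code_overlap))  # ['E11', 'J45']
print(len(link(clinic_a, "diagnosis", clinic_b, "disease")))  # 2
```

In practice the mapping would come from the vocabulary's maintainers or from annotation at collection time rather than a hand-written table. The point is that linking only works once both data sets use the same codes.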

The FAIR principles provide a practical framework for data management planning:

To ensure Findability,

  • select a data repository at an early stage and check its requirements for data formats and metadata;
  • make sure the data can get a persistent identifier so that it can be cited;
  • consider also registering your data in a catalogue to make it more findable, especially if the repository is generic in nature.
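Once the repository has assigned a persistent identifier such as a DOI, that identifier is what others cite. The sketch below normalises a DOI string, checks it against a regular expression Crossref has published for matching most modern DOIs, and builds a resolvable https://doi.org/ link and a simple citation. The citation layout and the example DOI (prefix 10.1234) are assumptions for illustration; follow your repository's recommended citation format.

```python
import re

# Crossref's published pattern for modern DOIs; it matches the vast majority of
# DOIs in use but not every legacy one, so treat a mismatch as "check by hand".
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)

# Common ways a DOI is written with a prefix in front of it.
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/",
                "http://dx.doi.org/", "doi:")


def doi_to_url(doi: str) -> str:
    """Strip any known prefix, validate the DOI and return its https://doi.org/ URL."""
    doi = doi.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a well-formed DOI: {doi!r}")
    return "https://doi.org/" + doi


def data_citation(creators, year, title, publisher, doi):
    """Format a simple data citation: Creators (Year). Title. Publisher. URL"""
    return f"{'; '.join(creators)} ({year}). {title}. {publisher}. {doi_to_url(doi)}"


# Hypothetical data set and DOI, for illustration only.
print(data_citation(["Doe, J.", "Roe, R."], 2024, "Example survey data",
                    "Example Repository", "doi:10.1234/example.5678"))
# Doe, J.; Roe, R. (2024). Example survey data. Example Repository. https://doi.org/10.1234/example.5678
```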

To ensure Accessibility,

  • guarantee the longevity of the data (e.g., by depositing it in a repository with a certification such as the Data Seal of Approval, since succeeded by the CoreTrustSeal, or an ISO certification);
  • check and describe the legal conditions under which the data can be made available (this is generally easier to do before you have collected and interpreted the data);
  • establish an embargo period if necessary;
  • make sure your ICT infrastructure keeps the data available even in the event of equipment failure or human error.
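Keeping data available through equipment failure usually means keeping more than one copy and checking periodically that the copies are still intact. One common way to do the checking is a checksum manifest. The sketch below writes SHA-256 checksums in a two-column layout similar to what `sha256sum` produces and later reports files that are missing or changed. The directory and manifest names are hypothetical, and this complements rather than replaces a proper backup system.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA-256 checksum, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir: Path, manifest: Path) -> None:
    """Write one '<checksum>  <relative path>' line per file under data_dir."""
    files = sorted(p for p in data_dir.rglob("*") if p.is_file())
    lines = [f"{sha256_of(f)}  {f.relative_to(data_dir).as_posix()}" for f in files]
    manifest.write_text("".join(line + "\n" for line in lines))


def verify_manifest(data_dir: Path, manifest: Path) -> list:
    """Return the relative paths that are missing or whose checksum changed."""
    problems = []
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue  # ignore blank lines
        expected, name = line.split("  ", 1)
        path = data_dir / name
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(name)
    return problems
```

Store the manifest outside the data directory (otherwise a stale copy of it would be checksummed along with the data), keep a copy with each backup, and run `verify_manifest` against every copy on a schedule. A non-empty result tells you which files to restore from another copy.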

To ensure Interoperability,

  • select commonly used data formats;
  • select commonly used vocabularies for data items.
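The same idea applies to locations, the second interoperability example above. A minimal sketch, assuming the raw field notes record coordinates as degrees, minutes and seconds: convert them to signed decimal degrees and store them in CSV, a widely used format, under column names taken from the Darwin Core standard (`decimalLatitude`, `decimalLongitude`, `geodeticDatum`). The site identifier and coordinates are made up, and the WGS84 datum is an assumption about how the coordinates were measured.

```python
import csv
import io


def dms_to_decimal(degrees: int, minutes: int, seconds: float, hemisphere: str) -> float:
    """Convert degrees/minutes/seconds and a hemisphere letter to signed decimal degrees."""
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative by convention.
    return -value if hemisphere in ("S", "W") else value


FIELDS = ["locationID", "decimalLatitude", "decimalLongitude", "geodeticDatum"]


def sites_to_csv(sites) -> str:
    """Serialise site records to CSV text with Darwin Core column names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(sites)
    return buffer.getvalue()


# Hypothetical site recorded in the field as 52°22'12" N, 4°53'24" E.
site = {
    "locationID": "site-001",
    "decimalLatitude": round(dms_to_decimal(52, 22, 12, "N"), 6),
    "decimalLongitude": round(dms_to_decimal(4, 53, 24, "E"), 6),
    "geodeticDatum": "WGS84",
}
print(sites_to_csv([site]))
# locationID,decimalLatitude,decimalLongitude,geodeticDatum
# site-001,52.37,4.89,WGS84
```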

To ensure Reusability,

  • make sure you keep proper provenance information (i.e., details about how and where the data was generated, including machine settings, and details about all processing steps, such as the software tools with their versions and parameters);
  • select the appropriate minimal metadata standard and collect the necessary metadata (many minimal metadata standards are listed in the FAIRsharing registry, formerly ELIXIR’s biosharing.org);
  • select a license (preferably an open one) for the data and for the associated software tools;
  • make sure the important conclusions of your study are available not only in narrative form in a paper, but also in a machine-readable digital file (e.g., a nanopublication).
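Provenance and minimal metadata are easiest to keep when they are recorded as structured data at the moment each step runs. A minimal sketch, assuming a hypothetical tool called `example-aligner` and a hypothetical set of required metadata fields (a real project would take the required fields from the minimal metadata standard it selected): each processing step is appended as one JSON object per line to a provenance log, and a data set description is checked for missing fields before deposit.

```python
import json
import platform
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical required fields; replace with those of your chosen metadata standard.
REQUIRED_METADATA = ("title", "creators", "description", "license")


def provenance_record(step, tool, version, parameters, inputs, outputs):
    """Describe one processing step: tool, version, parameters, files and environment."""
    return {
        "step": step,
        "tool": {"name": tool, "version": version},
        "parameters": parameters,
        "inputs": list(inputs),
        "outputs": list(outputs),
        "environment": {
            "python": platform.python_version(),
            "platform": platform.platform(),
        },
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def append_provenance(log: Path, record: dict) -> None:
    """Append a record to a JSON Lines log (one JSON object per line)."""
    with log.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def missing_metadata(metadata: dict) -> list:
    """List required metadata fields that are absent or empty."""
    return [field for field in REQUIRED_METADATA if not metadata.get(field)]


record = provenance_record(
    step="align reads",
    tool="example-aligner",  # hypothetical tool name
    version="1.2.3",
    parameters={"min_quality": 20},
    inputs=["raw/sample1.fastq"],
    outputs=["aligned/sample1.bam"],
)
print(json.dumps(record, indent=2))
print(missing_metadata({"title": "Example survey data", "license": "CC-BY-4.0"}))
# ['creators', 'description']
```

Capturing tool versions, parameters and the software environment automatically, rather than from memory afterwards, is what makes a later re-run or audit of the processing feasible. JSON Lines is only one convenient log format.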