Guidelines recently published by Google for the discovery of science datasets help data providers to describe their datasets in a structured way using schema.org, enabling internet search engines to find and index rich metadata to better present scientific datasets. The published guidelines draw on the metadata specifications for life science datasets developed by BioSchemas. BioSchemas is an open community initiative driven by ELIXIR to improve interoperability of life science data.
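As a rough illustration of what describing a dataset "in a structured way using schema.org" can look like, the sketch below builds a minimal JSON-LD description and wraps it in the script tag that a landing page would embed. The dataset name, URL, identifier and catalogue name are hypothetical placeholders, and the choice of properties is an assumption made for illustration. It is not the set that the Google guidelines or the BioSchemas specifications actually require.

```python
import json


def dataset_jsonld(name, description, url, identifier, keywords, catalog_name):
    """Return a dict describing a dataset with schema.org vocabulary.

    The property selection is illustrative only; consult the relevant
    specification for the properties that are actually expected.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "identifier": identifier,
        "keywords": keywords,
        # The repository or catalogue that hosts the dataset.
        "includedInDataCatalog": {
            "@type": "DataCatalog",
            "name": catalog_name,
        },
    }


def to_script_tag(doc):
    """Serialise the description as a JSON-LD script tag for an HTML page."""
    body = json.dumps(doc, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# All values below are hypothetical placeholders.
example = dataset_jsonld(
    name="Example proteomics dataset",
    description="A made-up dataset used only to illustrate the markup.",
    url="https://repository.example.org/datasets/EX000001",
    identifier="EX000001",
    keywords=["proteomics", "example"],
    catalog_name="Example Repository",
)

print(to_script_tag(example))
```

A search engine crawling the page can read the embedded block without parsing the surrounding HTML, which is what makes the metadata straightforward to index.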
One of the early adopters of the specifications is the Omics Discovery Index (OmicsDI), which has been presented as a good practice example in a recent Google Research Blog post. OmicsDI has been developed by EMBL-EBI with support from BD2K, and is an active member of the BioSchemas community. It provides a dataset discovery service across a heterogeneous, distributed group of -omics data from eight repositories across the world.
Building on and extending the schema.org markup, BioSchemas develops a collection of specifications that provide guidelines for describing metadata about life science information. Besides life science datasets, BioSchemas is working on specifications for samples, phenotypes, data repositories and protein sequences.
To support the work of BioSchemas, ELIXIR has recently launched the BioSchemas Implementation study. The main partners in the study are BBMRI, BD2K and FORCE11; however, it has the support of over 40 stakeholders. The BioSchemas group for life science datasets includes representatives from PDBe, UniProt, Pfam, DataMed and DATS, Repositive, OmicsDI, InterMine and Google.
Carole Goble, the Head of ELIXIR UK and one of the leaders of the Implementation study, said: “Improved discoverability of data will encourage data re-use and sharing and I am delighted to see the growing momentum among so many institutions. Our goal is to bring together data providers, data users, domain experts and developers; BioSchemas, as an open community of life science organisations, plays an important role in this effort.”
The BioSchemas Implementation study is led by Carole Goble, Alasdair Gray (ELIXIR UK) and Rafael Jimenez (ELIXIR Hub). Besides ELIXIR UK, the project also involves ELIXIR Nodes at EMBL-EBI and in the Netherlands, Denmark, Sweden, Germany and Finland. The kick-off meeting will take place 6-8 March 2017 in Hinxton, UK.
The work of the BioSchemas community will also feed into the new Horizon 2020 project EOSCpilot (European Open Science Cloud pilot). One of the priorities of the project’s interoperability activities will be the findability of data; the goal is to build on BioSchemas results in the life-science domain and extend them to general scientific data types like datasets and samples.