Highlights from ELIXIR-NL at the BioHackathon 2020

Twenty-one ELIXIR-NL participants actively contributed to the ELIXIR BioHackathon, which took place online from 9 to 13 November 2020. In the Netherlands, partners within DTL’s network constitute ELIXIR-NL (the Dutch Node in ELIXIR), with DTL taking the coordinating role.

Highlights of the Dutch participation in the various BioHackathon projects include:

  • Improvements to the Data Stewardship Wizard (DSW) – the DSW now includes a Horizon 2020 DMP template for all of its users, so a Horizon 2020 DMP can be generated from the existing questions and answers in the wizard. During the BioHackathon, many suggested improvements to the knowledge model were implemented, and an analysis was made of what is needed to translate the wizard into different languages so that it can be used internationally. ELIXIR-NL and ELIXIR-CZ took the lead in this activity.
  • Defining and extending EDAM with data stewardship terms – participants from different tracks got together to discuss the need for data stewardship-related terms in the EDAM ontology. This work will be followed up in the ELIXIR CONVERGE project.
  • First steps towards a ‘Learning path in data management, stewardship and analysis’ – over 20 participants from inside and outside ELIXIR helped to identify topical areas and to define knowledge, skills and abilities in data management, data stewardship and data analysis. This is the first step towards a tailored learning path, which will be part of a curriculum currently being developed in the ELIXIR CONVERGE project.
  • Connecting chemistry databases – in this cheminformatics project, the participants’ aim was to connect more chemistry databases. ELIXIR-NL and ELIXIR-UK worked on scraping Bioschemas annotations of molecular entities from the MassBank Europe database, and added more than 20,000 Protein Data Bank (PDB) ligand identifiers to Wikidata.
  • Wikidata subsetting using ShEx – as a scopeless knowledge graph, Wikidata is a very valuable source for cross-domain research. However, its ever-growing volume of knowledge is leading to various scalability issues. In this project we worked towards a pipeline to extract topical subsets from Wikidata. Using a formal language called Shape Expressions (ShEx), we were able to extract various subsets from Wikidata.
  • Implementing a ShEx schema for chemicals in Wikidata – a draft visualisation of linked properties has been produced, and the ShEx schema will be finalised in the coming weeks. The final schema can be used to retrieve a chemical subset of Wikidata, which will allow users to write more elaborate queries specifically for chemicals without running into query timeouts.
  • Developing a Galaxy-based workflow to visualize results – in this BioHackathon project, a Galaxy-based workflow was created that can take transcriptomics (RNA-seq) data, download gene sets from the COVID-19 Disease Map project and WikiPathways, perform gene set enrichment analysis and visualize the results in interactive web pages.
  • Creating a biodiversity ontology and integrating metabolic pathway ontologies – participants in this project worked on a biodiversity ontology and took first steps towards integrating metabolic pathway ontology data. This should make it easier to evaluate whether a microbiome is likely to be able to produce a specific compound (e.g. a disaccharide able to cause IBD flares, as presented at the latest BioSB conference). This work will be picked up in the ELIXIR communities on Food & Nutrition and/or Microbiome.
  • Copy number variation tools in Galaxy – this work will be linked to use within the European Joint Programme on Rare Diseases (EJP RD) for neurological deletion syndromes (such as 22q11).

