On 11 September 2017, the Centraal Museum in Utrecht hosted the very first Data4lifesciences symposium. D4LS Programme Manager Dr Jan-Willem Boiten: “This symposium was an important milestone for the programme; we presented an overview of our first results.”
The NFU programme Data4lifesciences (D4LS) aims to realise an integrated research data infrastructure for biomedical research at the Dutch UMCs. DTL is actively involved in D4LS. The popularity of the symposium reflects the timeliness of the programme. Boiten: “The number of registrations exceeded the capacity of the room in the Centraal Museum, so we had to prematurely close registration. We ended up with 160 registrations from 48 institutions, representing a proper mix of researchers and IT specialists.”
Why it all began
At the launch of D4LS in 2014, the initiatives related to biomedical research data were scattered over the Netherlands. Boiten: “There was a clear need for concerted action because the challenge to connect these fragmented initiatives was too large for a single UMC. We defined a ‘true north’: all clinical data of all patients will be made available by each UMC for any valid research question, in a safe and responsible way. We established eight work packages (WPs), dedicated to FAIR data stewardship (Work packages 1, 6, and 9), supporting technical services (Work packages 2 and 7), reuse of data & samples (Work packages 3 and 5), and a digital research environment (Work package 4). The symposium featured the first results of several work packages, as well as examples of success stories and hurdles.”
Learning care system
As an inspiring example of what can be achieved by connecting data resources, Professor Edwin Cuppen (UMCU) gave a keynote lecture on the role of genome data in personalised cancer treatment. The first examples of personalised cancer care based on genomic biomarkers are already applied in clinical practice. “However, for 90-95% of cancer patients, no biomarker-based therapy is available yet. To improve this, we should systematically combine clinical and genomic data in a central database and create a learning care system. This truly is another way of thinking, financing, and working: organising the best care for today’s AND tomorrow’s patient,” said Cuppen.
Cuppen and his team started the Hartwig Medical Foundation (HMF) to develop such a learning care system. “Forty-three out of 80 Dutch hospitals participate in the HMF. We have already collected more than 2000 tumour biopsies and we are currently receiving around 40 biopsies of metastatic cancer patients per week. We genotype these with state-of-the-art whole genome sequencing technologies and bioinformatics analyses. The genomic data are combined with clinical information and fed into the HMF cancer genomics database, which currently is one of the largest whole genome sequencing databases of metastatic cancer in the world.”
The HMF cancer genomics database is coupled to a data access board that allows treating medical specialists and researchers to work with the data if the intended usage fits with the consent given by the patient. Molecular tumour summary sheets are produced to help clinicians understand the clinical relevance of the sequencing data, by suggesting approved and experimental therapies for the patient. Cuppen and his team collaborate with several D4LS work packages: “For instance, we are working on a portal that brings researchers to the data instead of distributing privacy-sensitive information to research systems around the world.”
Bringing researchers to the data
Professor André Dekker (UMCM+) leads D4LS work package 5, which aims to make clinical data from the UMCs available to researchers across the country, while fully protecting patient privacy. Dekker: “It is important to share data across organisations, but there are administrative, political, and ethical barriers to this. We figured: ‘if sharing is the problem: don’t share the data’. In the personal health train project, which is intertwined with D4LS work package 5, we are developing technical solutions to bring researchers to data rather than bringing data to researchers.”
Crucial to the personal health train approach are the FAIR data principles, i.e., research data should be Findable, Accessible, Interoperable, and Reusable for both humans and computers. FAIR data stewardship is a major theme in the entire D4LS programme. Data FAIRification promises to boost the reuse of clinical data. However, the practical implementation of the personal health train approach is still rather laborious. “We are currently running a pilot with a lung cancer data set of twenty data elements per patient. It takes us three months to FAIRify these data in a single hospital. Once this is done, retrieving the data for new patients is straightforward. We have now done this in a substantial number of hospitals spread over the world,” said Dekker. (Read more about Dekker’s work and FAIR in another article in the D4LS newsletter.)
Access to data and samples
While Dekker’s work package is still in the pilot phase, the results of D4LS work package 3 are already available to researchers at https://www.bbmri.nl/biobanks/bbmri-nl-catalogue/. This work package aims to make data and samples findable and accessible across the UMCs. Professor Morris Swertz (UMCG, leader of work package 3) took the audience along the road of creating a biobank catalogue with increasingly sophisticated search options. (Read the November 2016 interview with Swertz to find out more.) Swertz: “We have built a flexible software system called ‘MOLGENIS’. Among other things, it contains a FAIRifier toolbox to help restructure and recode data. BBMRI-ERIC has adopted MOLGENIS and the European biobank catalogue now consists of 1400 collections.”
Swertz elaborated on the hurdles that his team has encountered so far: “We need manpower; this has been the greatest hurdle. Our hope is that D4LS will encourage employees of the UMCs to upload detailed descriptions of all UMC data and material collections in MOLGENIS to increase findability and reuse. In addition, uptake by the NFU could ensure that the data and system will be maintained beyond project funding. This is a very important step forward. Now, we want to deepen and broaden the catalogue with (meta)data from as many biobanks as possible. In addition, we want to improve the user interaction by adding visual elements and cleaner search options. We would love your feedback on this. We are also working to integrate the request workflow and improve on technical aspects.”
Technical services
The symposium also featured D4LS’s work on connecting technical services of the eight UMCs. At present, each individual UMC has a complex landscape of technical services to support its researchers. At the symposium, Hans van den Berg (AMC) described the efforts of work package 2 to harmonise IT architecture across the UMCs: “Researchers need IT services to capture, process, analyse, preserve, give access to, and reuse data. We have now defined the building blocks of an architecture that can support researchers in these steps of the research data cycle. Our next step will be to add further details. This architecture will form the basis for the implementation of the data sharing infrastructure built by the other D4LS work packages.”
Lessons learned
In a duo presentation with a wink and a smile, Dr Saskia Haitjema (UMCU) and Ing Marjolein Sijbers-Klaver (UMCU) shared their experiences with the local Research IT programme of UMCU. Its aim was to develop a platform for researchers to get easy access to clinical data, UMC-wide High Performance Computing (HPC), a bulk storage environment, and network facilities. Sijbers-Klaver was involved as an IT consultant: “When we completed the programme in 2015, we felt proud of the results.” Postdoc Haitjema was less charmed: “The results did not solve researchers’ basic problems. We needed a fast network, a direct connection between different servers, access from home, an option to bring our own devices, and much more. You did not ask us about our needs. We also missed a research-minded attitude at the IT department.” The researchers and IT specialists had to overcome hurdles, but eventually started to communicate again. Sijbers-Klaver: “For the IT specialists, it was an eye-opener that there are so many different types of researchers; they are customers with varying needs. So, there is no single IT solution to all problems.”
Digital research environment
Dr Arnoud van der Maas (Radboudumc) presented the efforts of D4LS work package 4, which aims to develop a digital research infrastructure for life sciences. “We started our work on the digital research environment (DRE) at Radboudumc in 2014. The first version of the DRE allowed researchers to access their data anytime, anywhere, to collect all their study data in one place and combine sources, to easily collaborate with colleagues around the world, to use the analysis tools and compute power they need, to comply with laws and regulations, and to store their data safely while deciding who has access,” said van der Maas.
“Now, we are transforming the current research environment at Radboudumc into the virtual workspace of the DRE. From the DRE, a researcher can choose and install software (app store), run open queries (MyQueries), load data and images (Data sets), store analyses and queries, and export & share analyses. The plan is to make the use of the DRE obligatory within Radboudumc in January 2018. In addition, the deans of the UMCs have unanimously decided that every UMC will participate in the DRE with one or two showcases to get acquainted with the functionality.”
Future plans
Professor Jaap Verweij (Erasmus MC) concluded the symposium. “We have come a long way since the start of D4LS. As a next step, we will scale up the pilots and implement the D4LS results in the eight UMCs. Ultimately, the D4LS infrastructure should become an important component of Health-RI, the national infrastructure for personalised medicine & health research. Health-RI will offer a fully integrated end-to-end service in health research. Please register for the Health-RI conference on 8 December to find out more.”
All presenters have kindly given permission to make their slides available. These presentations can be found on the Data4lifesciences website. You can read more about D4LS in the latest D4LS newsletter.