Reuse of clinical data for research: overcoming the hurdles

The systems of the eight Dutch UMCs contain a wealth of patient data, such as medical images, physical examination results, and treatment outcomes. “These data have been collected to diagnose and treat patients, but we can reuse them to answer important research questions. However, reusing clinical data is not as simple as you may think. In addition to concerns about patient privacy, consent, and data de-identification, there are practical hurdles,” says Professor André Dekker.

This interview was originally published in the October 2017 Data4lifesciences Newsletter.

Dekker is a professor in Clinical Data Science at Maastricht University. He leads the Data4lifesciences work package ‘Reuse of clinical data for research’, which aims to make all clinical data from the UMCs available to researchers across the country, while fully protecting patient privacy and without impacting healthcare. Dekker explains: “Large amounts of clinical data are collected for healthcare purposes, but there is little secondary use of these data for life sciences research. This is because there are several practical problems.”

Dekker gives an example from his own research field, oncology: “We would like to know which factors can predict the effectiveness of various lung cancer treatments. In the end, we would like to make statements such as ‘Treatment A is most effective in females above 70 years old with a small tumour without metastases, while treatment B is preferred for young males with a large metastasised tumour’. To make this possible, researchers combine large amounts of data from multiple sources and use artificial intelligence to identify the relevant factors. We need large numbers of patients to make reliable predictions. Most hospitals have a few hundred lung cancer patients at most, so it is essential to combine the data from multiple hospitals. And that is where the problems start.”

Different systems
Different hospitals often use different types of personal health records, image archives, and laboratory software. Dekker: “And even if they use the same system, people often use different terminologies or they collect different types of information. One hospital may have asked its patients how many packs of cigarettes they smoke per week, and another may have recorded the number of cigarettes per day. It takes a tremendous amount of time to harmonise datasets and this is often repeated for each new research question. In addition, there are concerns about patient privacy when the data leave the hospital. In this Data4lifesciences project, we develop techniques to harmonise datasets and to transfer data in a secure way. We use the FAIR principles for the former and the FHIR standard for the latter. Our team is currently running a small pilot study with data from head and neck cancer patients from Maastricht (MUMC+/MAASTRO Clinic) and, in the future, Leiden (LUMC).”
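To make the smoking example concrete, the following is a minimal sketch of what harmonising one variable could look like. It is not the project's actual method: the record layout, the unit labels, and the pack size of 20 cigarettes are all assumptions made for illustration.

```python
# Assumption: 20 cigarettes per pack. Real pack sizes vary by country and brand.
CIGARETTES_PER_PACK = 20
DAYS_PER_WEEK = 7


def to_cigarettes_per_day(value: float, unit: str) -> float:
    """Convert a smoking record to the common unit 'cigarettes per day'."""
    if unit == "cigarettes_per_day":
        return value
    if unit == "packs_per_week":
        return value * CIGARETTES_PER_PACK / DAYS_PER_WEEK
    # Fail loudly rather than guess when a hospital uses an unknown unit.
    raise ValueError(f"Unknown smoking unit: {unit!r}")


# Hypothetical records from two hospitals that recorded smoking differently.
records = [
    {"hospital": "A", "value": 7.0, "unit": "packs_per_week"},
    {"hospital": "B", "value": 20.0, "unit": "cigarettes_per_day"},
]

harmonised = [
    {
        "hospital": r["hospital"],
        "cigarettes_per_day": to_cigarettes_per_day(r["value"], r["unit"]),
    }
    for r in records
]
```

Even this toy case shows why harmonisation is costly: every variable needs an agreed target unit and an explicit conversion rule, and any unit the rule does not know about has to be resolved by hand.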

Findable and accessible
The first step in the pilot consists of making the data FAIR. The acronym FAIR stands for Findable, Accessible, Interoperable and Reusable. The FAIR principles were drafted with strong Dutch involvement and have now been adopted worldwide. Both the European Commission and the G20 encourage researchers to embrace the FAIR principles. “To explain the F and A of FAIR, I often draw a comparison with the World Wide Web,” says Dekker. “People from all over the world can create websites. You can find and access these websites using a browser. So, the internet makes data Findable and Accessible. Similarly, hospitals can upload their patient data to secure internet sites, making the data findable and accessible to researchers at other hospitals.”

Interoperable and reusable
Dekker continues: “But the information on a website can have any format. For instance, the language may be Italian or Chinese, or people may use synonyms. This also applies to clinical datasets. For cancer of the larynx, doctors in Leiden may use ‘larynxcarcinoom’, whereas doctors in Maastricht may use ‘strottenhoofdkanker’. To make sure that the data relate to the same disease, we can use a medical ontology such as the ‘International Classification of Diseases’ (ICD). In the upcoming ICD version 11, cancer of the larynx has received the code ‘’. If we use this code when uploading data from patients with ‘larynxcarcinoom’ in Leiden and ‘strottenhoofdkanker’ in Maastricht, suddenly a non-Dutch speaking scientist and even a computer can understand that the patients on the secure internet site have larynx cancer. This is a crucial step to achieve data interoperability and reusability (the I and R of FAIR).”
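The idea of mapping local terms to one shared code can be sketched as a simple lookup. This is an illustration only: the code value below is a placeholder, since the actual ICD code is not given in the text, and the function and table names are hypothetical.

```python
from typing import Optional

# Placeholder: stands in for the real ICD code, which the text does not state.
LARYNX_CANCER_CODE = "ICD-CODE-PLACEHOLDER"

# Hypothetical lookup table from local free-text terms to the shared code.
LOCAL_TERM_TO_CODE = {
    "larynxcarcinoom": LARYNX_CANCER_CODE,      # term used in Leiden
    "strottenhoofdkanker": LARYNX_CANCER_CODE,  # term used in Maastricht
}


def encode_diagnosis(local_term: str) -> Optional[str]:
    """Return the shared code for a local term, or None if it is unmapped."""
    return LOCAL_TERM_TO_CODE.get(local_term.strip().lower())


leiden_code = encode_diagnosis("Larynxcarcinoom")
maastricht_code = encode_diagnosis("strottenhoofdkanker")

# Both hospitals' records now carry the same machine-readable code.
same_disease = leiden_code is not None and leiden_code == maastricht_code
```

Once both datasets carry the same code, software can group the patients without knowing either Dutch term. Returning None for unmapped terms keeps unknown diagnoses visible instead of silently assigning them a code.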

The team has successfully made the pilot data FAIR. Dekker: “Our attention has now shifted to transferring the data in a secure way using the FHIR standard.” FHIR stands for ‘Fast Healthcare Interoperability Resources’. The FHIR standard has been specifically designed to exchange electronic health records. FHIR should make it easy to provide healthcare information to healthcare providers and patients on devices from computers to tablets and smartphones. Transferring data with the FHIR standard is secure because the messages are encrypted. Instead of transferring an entire document, the FHIR approach transfers discrete data elements. “This is an existing technology that works well. We are now deciding on the exact format of the message. In 2017, we will transfer FHIR messages containing FAIR data from Maastricht to an external party as a test.”
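As a rough illustration of what a ‘discrete data element’ might look like, the sketch below builds a single diagnosis modelled on the FHIR Condition resource and serialises it as JSON. It is not the message format the team chose, which was still being decided. The patient identifier, the code-system URI, and the code value are placeholders, and the output has not been validated against the FHIR specification.

```python
import json

# Placeholders: neither the real code-system URI nor the real code is given here.
CODE_SYSTEM_URI = "http://example.org/placeholder-code-system"
DIAGNOSIS_CODE = "CODE-PLACEHOLDER"


def build_condition(patient_id: str, code: str, display: str) -> dict:
    """Build one diagnosis as a dict shaped like a FHIR Condition resource."""
    return {
        "resourceType": "Condition",
        # Reference to the patient, using a pseudonymous identifier.
        "subject": {"reference": f"Patient/{patient_id}"},
        # The coded diagnosis: which code system, which code, readable label.
        "code": {
            "coding": [
                {"system": CODE_SYSTEM_URI, "code": code, "display": display}
            ]
        },
    }


condition = build_condition("pseudonym-001", DIAGNOSIS_CODE, "Cancer of the larynx")

# One small, self-describing element rather than an entire patient document.
message = json.dumps(condition, indent=2)
```

Encryption is not shown here. In a typical setup it would be handled by the transport layer, for example by sending the message over HTTPS.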

Dekker and his team collaborate closely with the NFU programme ‘Registration at the source’, which aims to simplify the registration of patient data and so improve data collection at the source. “Sometimes, data simply cannot be made FAIR because the electronic health record contains insufficient information. ‘Registration at the source’ should change this. We also collaborate intensively with the Data4lifesciences project ‘Biomedical Data Sharing & Analysis in Clinical Studies’, which focuses more on using the data as opposed to getting the data. Our joint ultimate goal is a nationally accessible research infrastructure for sharing and analysing life sciences data, with tight links to healthcare systems and research tools. I think we have come a long way in overcoming the hurdles and that our work will result in better, more personalised healthcare in the future.”

About Data4lifesciences and DTL
The NFU programme Data4lifesciences (D4LS) was launched in 2014 to realise an integrated research data infrastructure for biomedical research at the Dutch UMCs. DTL is actively involved in D4LS and André Dekker is an active member of the DTL community.
