Personal Health Train

The Personal Health Train (PHT) aims to connect distributed health data and create value by increasing the use of existing health data for citizens, healthcare, and scientific research.

The key concept in the PHT is to bring algorithms to the data where they happen to be, rather than bringing all data to a central place. The PHT is designed to give controlled access to heterogeneous data sources, while ensuring privacy protection and maximum engagement of individual patients and citizens. As a prerequisite, health data is made FAIR (Findable, Accessible, Interoperable and Reusable). Stations containing FAIR data may be controlled by individuals, (general) physicians, biobanks, hospitals and public or private data repositories.

Other relevant links:
FAIR data
GO FAIR initiative
FAIR data in health research

‘Stations’ are locations containing FAIR data. Stations can range from very large core databases to small Personal Lockers holding an individual’s own data. A station has house rules: each data element has an associated license describing what a visiting train is allowed to do with that data element. This can range from ‘nobody is allowed to do anything’ to ‘anyone can use this element as they wish’. Stations are built and maintained by trusted parties and governed by the data owner.
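As an illustration only, the house rules of a station could be modelled as a lookup from data element to license. The sketch below is not the PHT implementation: the class names, the element names, and the intermediate `AGGREGATE_ONLY` level are assumptions, since the text names only the two extremes of the license range.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict


class License(IntEnum):
    """Hypothetical license levels, ordered from most to least restrictive."""
    NO_ACCESS = 0       # 'nobody is allowed to do anything'
    AGGREGATE_ONLY = 1  # assumed middle level: only summaries may leave the station
    OPEN = 2            # 'anyone can use this element as they wish'


@dataclass
class Station:
    """A FAIR data station holding one license per data element."""
    name: str
    licenses: Dict[str, License] = field(default_factory=dict)

    def allows(self, element: str, needed: License) -> bool:
        # Elements without an explicit license fall back to the strictest rule.
        granted = self.licenses.get(element, License.NO_ACCESS)
        return granted >= needed


# Hypothetical station with three data elements.
station = Station(
    name="hospital_a",
    licenses={
        "age": License.OPEN,
        "survival": License.AGGREGATE_ONLY,
        "genome": License.NO_ACCESS,
    },
)
print(station.allows("survival", License.AGGREGATE_ONLY))  # a train needing summaries
print(station.allows("survival", License.OPEN))            # a train needing raw values
```

Defaulting unknown elements to the strictest license is a design choice of this sketch: a visiting train only gets what the data owner has explicitly granted.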

‘Trains’ are workflows that answer specific research questions or serve specific use cases, and that need FAIR data to do so. Trains can be big: ‘Which data elements are most predictive of survival after lung cancer given all data in the Netherlands?’ or small and personal: ‘Is there a trial that I can join given my specific disease?’ Trains are built and maintained by the researcher.

The ‘track’ sets, maintains, and checks the rules of interaction and is the interface between trains and stations. Only certified research trains and certified FAIR data stations are allowed on the track and their use is fully controlled and auditable. Tracks are built and maintained by health informatics experts.
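The two track properties named above, certification and auditability, can be sketched as a small gatekeeper. This is a minimal sketch under stated assumptions: the `Track` class, its method name, and the identifiers are hypothetical and do not describe an actual PHT interface.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Set, Tuple

# One audit entry: (UTC timestamp, train id, station id, visit allowed?)
AuditEntry = Tuple[str, str, str, bool]


@dataclass
class Track:
    """Hypothetical track: admits only certified parties and logs every request."""
    certified_trains: Set[str] = field(default_factory=set)
    certified_stations: Set[str] = field(default_factory=set)
    audit_log: List[AuditEntry] = field(default_factory=list)

    def request_visit(self, train_id: str, station_id: str) -> bool:
        allowed = (
            train_id in self.certified_trains
            and station_id in self.certified_stations
        )
        # Refused requests are logged too, so the full history can be audited.
        timestamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((timestamp, train_id, station_id, allowed))
        return allowed


track = Track(
    certified_trains={"lung-survival-train"},
    certified_stations={"hospital-a"},
)
track.request_visit("lung-survival-train", "hospital-a")  # certified on both sides
track.request_visit("uncertified-train", "hospital-a")    # refused, but still logged
```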

The PHT is a public-private enterprise. We are soliciting partners to jump on the train and make the PHT a reality through collaboration and by jointly acquiring funding.

We distinguish two kinds of partners:

  • Formal consortium partners have committed to:
    • collaborate towards the full realisation of the Personal Health Train infrastructure
    • adhere to the FAIR Data principles
    • adhere to the rules of engagement
    • contribute either in kind or in cash to the Personal Health Train project
  • Associated partners have committed to:
    • collaborate with the consortium on specified subprojects towards the common goal of a Personal Health Train or a complementary federated data infrastructure for personal health research
    • adhere to the FAIR Data principles
    • keep the PHT coordinators informed about other initiatives within the PHT scope

Hop aboard!

To become a member of the community you can sign up here.

We are looking for partners to help us build the Personal Health Train. To trigger your imagination, we will list some examples below:

Support hospitals to create FAIR data stations
You are an SME specialised in data integration. Together with a PHT academic partner, you develop software that extracts clinical and imaging data from electronic health records (EHR) and picture archiving and communication systems (PACS), and publishes features from these data on an internal semantic web page, using a formal ontology, thereby creating a FAIR data station.

Build a train to fill a personal health record
You are partnering with a patient organisation to develop a personal health record (PHR) application for a disease. You build a PHT train that citizens can authorise to visit data stations, load the citizen’s data, and store it in the citizen’s PHR.

Develop a serious game for dynamic informed consent
You are a health information ethics, privacy, and legal expert who studies novel ways for patients to give informed consent before sharing data. Together with an IT startup company, you develop a patient-directed serious game application to ensure patients understand the risks and benefits of sharing a certain piece of information (e.g., their genome) before giving consent to share data in the PHT.

Create new ways for distributed computing
You are a researcher whose research line involves image mining and correlating image features with disease outcome. These analyses require intensive computing resources. Together with a cloud provider, you develop a hybrid, non-persistent, private cloud solution that allows you to learn from images stored in hospital PACSs without the imaging data having to move and without each hospital you work with having to buy a computing cluster.

Personal locker for genetic data
The information present in an individual’s DNA will increasingly be used for medical treatment, lifestyle advice, and other personal aims. Individuals who have been DNA typed, for example in the context of large cohort studies or genetic counseling, have little knowledge about the available DNA data. They do not receive a copy, can rarely access the data, and have little, if any, control over its use. This is surprising since such information can already be used effectively, for example to adjust and optimise drug prescriptions. A personal DNA locker is a safe environment and infrastructure where individuals can access a copy of all their personal genetic data. The locker helps them understand the implications for their health status and medical use, and allows them to add personal phenotypic information and to consent to (or prevent) sharing of their data with others, including treating physicians and (specific) research projects. The personal locker will be FAIR and can thus be accessed by the Personal Health Train so that individuals can engage in biomedical research.

The DNA Bank: High-Security Bank Accounts to Protect and Share Your Genetic Identity, Johan T. den Dunnen* DOI: 10.1002/humu.22810

Project: Research at Home

Alzheimer research focuses not just on prevention and cure, but also on patients living with Alzheimer. As 85% of the 250,000 Dutch Alzheimer patients already live at home, a new research infrastructure is needed.
This infrastructure needs to connect a combination of digital appliances, care professionals, automated diagnostics and smart domotica, ranging from analysis of patients’ voices and digital care logs to smartwatches with connected health apps.
The Personal Health Train could support this research by offering a new type of research infrastructure. Personal lockers holding the data of Alzheimer patients can be created, and “Trains” would fill these personal stations with information from the patients themselves, from general practitioners, and from care organizations. Researchers could send applications to the stations to learn, for example, which domotica is most effective in which patients, or which changes in behaviour patterns are reliable indicators of the progression of Alzheimer. It is expected that “Research at Home” could not only enable new research but also make current research much less expensive.

The COMMIT Public Private ICT Research Community has issued a call for Swallow projects. These projects are intended to enable technology transfer from the current, almost finished COMMIT/ programme to the new public-private COMMIT2DATA programme (ICT-Topsector programming 2016-2019).

The Personal Health Train team co-authored the Privacy Respecting ANAlysis of Distributed patient health DATA (PRANA-DATA) project. This project with TNO (lead), Radboud University, Maastricht UMC+, UMC Groningen, Portavita, Synergetics, and DTL was approved on January 21, 2016. The project will be led by Prof. Wessel Kraaij of TNO/Radboud University.

The Health Research Infrastructure initiative (Health-RI) aims to establish an interconnected data infrastructure for Dutch personalised medicine and health research. Health-RI is a joint initiative of DTL, ELIXIR-NL, BBMRI-NL, EATRIS-NL, NFU, and Health~Holland, supported by a large group of stakeholders in the health domain.

STW Perspectief Programme
Radiomics is an image mining approach in which large numbers of features are extracted from medical images and correlated with biology and clinical outcome features.

An STW Perspectief programme has been awarded to a consortium of public and private partners to develop the Radiomics approach further. The Personal Health Train team will lead the STW “Distributed Radiomics” project, in which a distributed FAIR infrastructure is set up at four Dutch UMCs (Maastricht, Rotterdam, Groningen, Nijmegen) so that medical images can be mined without the need for these images to leave the hospital.

euroCAT and its sibling projects duCAT, chinaCAT, VATE, ozCAT (www.eurocat.info): “Distributed learning to enable personalized medicine”.

The goals of the CAT network are:
a) to develop a distributed data infrastructure of cancer patients, tumours, images and treatments
b) to learn outcome prediction models in a distributed manner.

The euroCAT infrastructure originally spanned 5 radiotherapy centres in the Netherlands, Belgium and Germany and established FAIR data stations in hospitals and trains “avant la lettre”. The current network consists of 18 centres from the Netherlands, Belgium, Germany, UK, Italy, USA, Australia, India and China.

The CAT network is funded by a variety of bodies worldwide, including STW (NL), The Ministry of Health (NSW, Australia), Varian Medical Systems Inc (USA) and Interreg (EU).

KNAW-KRING: “Klinische data beschikbaar maken voor Researchers In een Nederlandse Grootschalige onderzoeksfaciliteit” (making clinical data available to researchers in a large-scale Dutch research facility). The Royal Netherlands Academy of Arts and Sciences has initiated a call for the best ideas for facilities that should be available in the Netherlands in 2025. The KRING project was selected for the final round and takes a similar approach to the Personal Health Train.

NWO-Personal Health Train: MUMC+, LUMC, UMCG and VUA came together to synthesize the Personal Health Train for the first time, based on several disparate initiatives from these partners. Submitted to NWO-“Investeringen Groot”, it failed to meet the criteria for funding but started the journey of the Personal Health Train.

Does the project also provide any kind of data analysis?
The PHT is primarily a FAIR data infrastructure approach so that a research question (including a specific type of data analysis) can be posed to the data without the need for data to leave the station (hospital, PHR, cohort, etc.). As analyses vary a lot between studies, it is up to the user to specify and implement the analysis. Nonetheless, there are open source analyses available to perform certain tasks (e.g., learn a support vector machine classifier in a PHT manner) and we expect that community to grow.
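To make "posing a question to the data without the data leaving the station" concrete, here is a minimal sketch. It deliberately swaps the support vector machine mentioned above for a much simpler analysis, a pooled mean, because that fits in a few lines. The function names and the survival figures are hypothetical. Each station computes only a count and a sum locally, and the researcher combines those aggregates.

```python
from typing import Dict, List, Tuple

# What a station sends back: (number of records, sum of values). No raw records.
Summary = Tuple[int, float]


def local_summary(values: List[float]) -> Summary:
    """Runs inside a station, on data that never leaves it."""
    return len(values), sum(values)


def pooled_mean(summaries: List[Summary]) -> float:
    """Runs at the researcher's side, on aggregates only."""
    total_n = sum(n for n, _ in summaries)
    if total_n == 0:
        raise ValueError("no records reported by any station")
    return sum(s for _, s in summaries) / total_n


# Hypothetical survival times in months, held by two separate stations.
station_data: Dict[str, List[float]] = {
    "hospital_a": [12.0, 18.0, 24.0],
    "hospital_b": [6.0, 30.0],
}
summaries = [local_summary(values) for values in station_data.values()]
print(pooled_mean(summaries))  # same result as pooling the raw records: 18.0
```

Iterative methods follow the same pattern: the train repeatedly collects aggregates such as gradients from each station, and only those aggregates travel.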

Is data validated or verified somehow or is trust left to the bona fide of the participants? On the website it says ‘Only certified research trains and certified FAIR data stations are allowed on the track and their use is fully controlled and auditable.’ Who is in charge of the auditing?
Data quality (including validation and verification) is very important to prevent garbage-in, garbage-out studies. However, what constitutes good and bad quality data is very user dependent and it is impossible to police that. We therefore focus on data provenance and data quality descriptors. Having properly described (FAIR) data and knowing its provenance solves a lot of data quality issues. For instance, compare a prospective cohort from a reputable source, with a data collection protocol and very little missing data, with the routine care data from a hospital. Both can be used in the PHT, but some studies might need the high quality data of the prospective cohort, while others may be more interested in routine data.

When you mention it already works in 8 hospitals, is it fully implemented? 
In the CAT network (global, cancer, hospital based), the PHT principles were used to learn predictive models for lung cancer. This is published work.

Do you have a timeline/full implementation timeline (when the project is considered to have enough information to make the impact it wants to achieve)?
After the contractual phase, it takes about three months for a hospital to join the network and give access to say 20 data elements for a specific patient population. Once joined, providing additional elements and patients typically takes hours to days.

You mention that the business model works with parties sharing their data having free access and those who want to access but don’t share will have to pay – do you have more info about the business model? Who pays for the infrastructure and set up?
In the CAT project (global, cancer, hospital based) we are coalescing around two rules of engagement:

  1. All publications resulting from research that uses the providers’ data include at least one co-author from the data provider (academic reward)
  2. All revenue (e.g., royalties) resulting from research that uses the providers’ data is split between the researcher (50%) and the data providers (50%, pro rata of the number of data subjects included in the research)
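The second rule is simple arithmetic, and a worked example may help. In the sketch below the function name, the provider names, and the amounts are hypothetical. Only the 50/50 split and the pro rata division by number of data subjects come from the rule itself.

```python
from typing import Dict


def split_revenue(
    revenue: float,
    subjects_per_provider: Dict[str, int],
    researcher_share: float = 0.5,
) -> Dict[str, float]:
    """Split revenue between the researcher and the data providers.

    The providers' half is divided pro rata of the number of data
    subjects each provider contributed to the research.
    """
    total_subjects = sum(subjects_per_provider.values())
    if total_subjects <= 0:
        raise ValueError("at least one data subject is required")

    researcher_amount = revenue * researcher_share
    provider_pool = revenue - researcher_amount

    # Assumes no provider is literally named "researcher".
    payout = {"researcher": researcher_amount}
    for provider, n_subjects in subjects_per_provider.items():
        payout[provider] = provider_pool * n_subjects / total_subjects
    return payout


# Hypothetical case: 10,000 in royalties, two providers with 300 and 100 subjects.
print(split_revenue(10_000, {"hospital_a": 300, "hospital_b": 100}))
# {'researcher': 5000.0, 'hospital_a': 3750.0, 'hospital_b': 1250.0}
```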

Besides this approach, one can think of other business models as long as they focus on rewarding data providers for making their data FAIR and sharing it. The current focus is on reducing cost, effort, and throughput time for a center to join the PHT. Among other things, this means providing high quality, open source FAIRifying tools.

In the introductory video, it says the project is open for research, education, patients, providers, … you think we could find a place for pharma industry with broad VBHC purposes (research, hc delivery, outcomes based contracts, etc)?
Yes, definitely. Industry (pharma, devices, insurance) is one of the healthcare segments that we hope will use the PHT. Note that data access policy (the A of FAIR) may limit some use cases. For instance, in the Netherlands, one can only use data for research and not for other purposes (unless consent is given).