Chao Zhang, a bioinformatics PhD student at the Centre for Integrative Bioinformatics VU, wrote a blog post about an ELIXIR Implementation Study that explored the options researchers have for reusing patient data gathered during clinical research. He outlines the stages needed for these data to be accessed and reused.
Zhang: “I joined this collaborative project as part of my internship when I was still a master's student. My supervisor, Sanne Abeln, inspired me and helped me grasp the mind-blowing concepts behind the research and get on with the work. The project lasted until the beginning of my second PhD year and was completed with the help of all the collaborators.
In this implementation study, we aimed to establish a data reuse scenario: first, users explore the integrated information in tranSMART and then trace it back to the underlying big data in EGA; afterwards, they can reanalyse those big data in Galaxy. We achieved this by making the data flow from tranSMART to EGA and from EGA to Galaxy.
To do this, we first made our test data available in EGA and tranSMART with proper metadata, and then developed a Galaxy tool that makes importing EGA data into Galaxy effortless. Finally, by aligning the data models of tranSMART and EGA, we demonstrated in a purpose-built demo that the two platforms are interoperable.
Based on these efforts, we proposed a small set of essential, well-defined metadata attributes for describing human big data. We further suggest that each of these attributes be uniquely identified by a long-lasting digital identifier, so that different data resources such as EGA and tranSMART can be cross-linked.”