Centrum Wiskunde & Informatica
Science Park 123
Solon P. Pissis
020 592 4263
Life Sciences research at CWI develops algorithms, theory, models, and simulations, and has expertise in retrieving and linking large volumes of complex data. Current technologies include [1. GENOME DATA SCIENCE]: next-generation sequence analysis (including cancer genomics and single-cell sequencing protocols), advanced high-throughput workflow design (Snakemake & Bioconda), and data structures and methods for computational pangenomics and human genetics studies; [2. MEDICAL INFORMATICS]: clinical decision support systems based on (multi-objective) optimization, machine learning, and image processing; [3. MULTISCALE MODELING]: multi-scale cell-based modeling, including the cellular Potts model, mechanobiological modeling, and systems modeling.
Biological applications include:
1.) Construction of large-scale pipelines and method development for (NGS) genetic variant calling and imputation, reconstruction of viral quasispecies, cancer genomics and single-cell sequencing experiments.
2.) Multi-objective optimization and radiobiological modelling for accurate inverse (i.e., automatic) treatment planning in radiotherapy, machine learning for 3D dose reconstruction of historically treated patients in radiotherapy, and novel approaches to deformable image registration to account for large anatomical differences.
3.) Modeling of angiogenesis, mechanobiology, tumor progression, plant development, gut microbial ecology and metabolism.
- Biodiversity & ecology
- Biomedical & health
- Agri & Food
- Biomedical Data Science
- Genome Data Analysis, Management and Integration
- Multi-scale Modeling (cell-based modeling, differential-equations)
- Clinical Decision Support Systems
- Image Processing
- Human Genetics
- Microbial Communities
- Plant and Animal Research
Expertise and Track Record
CWI covers a wide range of topics in mathematics, algorithmics, machine learning, simulation and modeling, and in data and computational science. It is unique in the Netherlands in its expertise on fundamental techniques from mathematics and computer science and in the translation of this fundamental research to usable workflows, tools, databases, and pipelines.
CWI’s team has unique expertise in method development for experiments based on next-generation sequencing. The team offers tools for discovering genetic variants in human genomes, discovering somatic variants, and reconstructing virus genomes. The variant discovery tools made major contributions to the Genome of the Netherlands (GoNL) project and are currently in use in the international ALS (amyotrophic lateral sclerosis) project. CWI also offers unique data integration and workflow management services. For example, CWI hosts the author of the Snakemake workflow management system.
CWI is also active in medical applications, ranging from optimization-based decision support systems to medical image processing. Clinical relevance is always ensured through direct collaborations with clinicians and/or the medical industry, to maximize the chances of obtaining translational research results. CWI actively works on machine learning for highly individualized historical dose reconstruction, automated treatment planning in brachytherapy (i.e., internal radiotherapy), emotion recognition for advanced human-robot interaction for childhood cancer patients, and multi-objective optimization for deformable image registration with content mismatch and large deformations in medical images.
The Biomodeling and Biosystems Analysis team has unique expertise in multiscale modeling of biological systems. The team has developed a set of open-source computational methods, including methods for vertex-based modelling of the mechanics and transport properties of growing plant and animal tissues, and hybrid finite-element and cellular Potts methods to study the role of cell-extracellular matrix interactions in animal development and tissue engineering.
1.) GENOME DATA SCIENCE.
For evidence, please see:
– The “main paper” of the Genome of the Netherlands (citation below). Dr. Schönhuth’s team contributed by developing novel data mining algorithms to discover notoriously hard-to-detect genetic variants (“twilight zone insertions and deletions”); the corresponding paper (CLEVER; Bioinformatics, 2012) received an award from the Faculty of 1000.
– The “structural variation paper” of GoNL (citation below); Dr. Schönhuth is a corresponding author of this paper. Among other achievements, this paper outlines how to impute structural variants into cohorts of individuals with unprecedented accuracy.
Dr. Schönhuth is a founding member of the Computational Pangenomics Consortium, which addresses the current status, promises and challenges of the data science arising from analyzing, arranging and integrating hundreds of thousands of genomes. He is the last author of the corresponding white paper:
“Computational pan-genomics: status, promises and challenges”
The Computational Pangenomics Consortium.
Briefings in Bioinformatics, bbw089, 2016
2.) MEDICAL INFORMATICS.
The coordinator and lead here is Dr. Bosman. A dedicated project page can be found here.
CWI is project leader in the project “Improving Childhood Cancer Care when Parents Cannot be There – Reducing Medical Traumatic Stress in Childhood Cancer Patients by Bonding with a Robot Companion”. CWI contributes novel approaches to emotion detection as key input to new technology in AI and robotics that governs the interaction between a humanoid robot (NAO) and childhood cancer patients, with the aim of creating a buddy bond to reduce stress. CWI collaborates here with academic partners TU Delft, AMC, PMC, and industry partners ASolutions, Cancer Health Coach, Focal Meditech and Brocacef.
CWI is also project leader in the project “ICT-based Innovations in the Battle against Cancer – Next-Generation Patient-Tailored Brachytherapy Cancer Treatment Planning”. CWI contributes new algorithms in multi-objective optimization, machine learning, and parallelization to achieve novel approaches to automatically computing high-quality brachytherapy treatment plans. CWI collaborates here with academic partner AMC and industry partner Elekta.
CWI is a member of the project “3D dose reconstruction for children with long-term follow-up – Toward improved decision making in radiation treatment for children with cancer”. CWI contributes new machine learning algorithms that use historically available data to find the best-matching patient in a database of recently treated patients, enabling highly individualized dose reconstructions of historic external beam radiotherapy treatments. CWI collaborates here with academic partner AMC.
CWI is a research team member in the Dutch Cancer Society research project grant (KWF grant No. NKI 2012-5716): “Optimized targeting for surgery and radiotherapy of breast cancer with a DCIS component”. CWI contributes multi-objective optimization algorithms for novel insights into automated parameter tuning of deformable image registration software and for obtaining so-called class solutions, applied to images of breast cancer patients. CWI collaborates here with academic partners AMC and NKI.
3.) MULTISCALE MODELING.
Cell-based modeling of animal tissues: In this collaboration with the Hubrecht Laboratory in Utrecht we contributed cellular Potts modelling. Tools: Tissue Simulation Toolkit, http://sourceforge.net/projects/tst, and contributions to CompuCell3D, http://www.compucell3d.org.
Cell-based modeling of plant tissues: We have developed a modeling tool called VirtualLeaf, see http://virtualleaf.googlecode.com and Merks et al. Plant Physiol. 2011. The tool has been applied in recent projects at WUR (Adibi et al. Science 2014) and at the University of Antwerp (De Vos et al. PLoS Comp. Biol. 2014). It is currently applied in a collaboration with Leiden University (one PhD student) and is being further developed into a vertex-based method for plant development in collaboration with the University of Pittsburgh.
Dr. Schönhuth’s team has unique expertise in analyzing, integrating and managing genome data, in particular next- and third-generation sequencing data. The team has developed related methods and applied them in population-scale sequencing projects. Examples are the Genome of the Netherlands project and, currently, the international project on amyotrophic lateral sclerosis (ALS), which involves processing several thousand sequenced genomes. The team also develops workflow management tools (such as Snakemake and Bioconda). See References 1-3 below.
Dr. Bosman’s team performs research aimed at translation into real-world clinical practice and therefore almost invariably participates in multidisciplinary, cross-technology projects with methods and techniques from mathematics and computer science. Specifically, data acquired with medical (imaging) devices and procedures is analyzed and used for optimization- and machine-learning-based decision support. For instance, patient records and x-rays are analyzed to find high-quality matches between pairs of patients. As another example, annotated (e.g., segmented) 3D medical images that include catheters inserted into the human body are used as a basis to compute high-quality brachytherapy treatment plans.
Dr. Pissis’s team works on designing, implementing, and engineering algorithms and data structures on sequences for pattern matching, indexing, comparison, and finding regularities. These algorithms and data structures are the workhorse of many computational biology applications seeking to analyze biological sequences such as DNA, RNA or peptide sequences with the ultimate aim of understanding their features, function, structure, or evolution.
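As a minimal illustration of what indexing a sequence for pattern matching can look like, the sketch below builds a suffix array for a short DNA string and uses binary search to report every occurrence of a pattern. It is a textbook construction chosen for brevity, not one of the team's tools; the function names and the example sequence are hypothetical, and the naive construction is only suitable for small inputs.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in sorted order.

    Naive construction (sorting whole suffixes); fine for short examples,
    far too slow and memory-hungry for real genomes.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions of pattern in text."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in [start, lo) begin with pattern.
    return sorted(suffix_array[start:lo])


if __name__ == "__main__":
    dna = "GATTACATTAG"  # hypothetical example sequence
    sa = build_suffix_array(dna)
    print(find_occurrences(dna, sa, "TTA"))  # [2, 7]
```

Once the index is built, each query costs a logarithmic number of prefix comparisons rather than a full scan of the sequence, which is the reason indexes of this kind underlie many sequence analysis applications.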
- The main paper of the Genome of the Netherlands Consortium. Dr. Schönhuth's team has developed methods to discover hard-to-detect genetic variants ("twilight zone indels"): "Whole-genome Sequence Variation, Population Structure and Demographic History of the Netherlands". Genome of the Netherlands Consortium. Nature Genetics, 46(8), 818-825, 2014.
- The structural variants paper of the GoNL consortium. Dr. Schönhuth's team has enabled imputation of several classes of structural variants into patient cohorts in the frame of genome-wide association studies (GWAS): "A high-quality human reference panel reveals the complexity and distribution of genomic structural variants", Hehir-Kwa et al. (Schönhuth is co-corresponding author), Nature Communications 7:12989, 2016.
- "Estimating the pace of evolution". Dr. Schönhuth's team contributed by discovering hard-to-detect de novo variants, which substantially refines estimates about the pace of evolution: "Characteristics of de novo structural changes in the human genome", Kloosterman et al., Genome Research 25(6), 792-801, 2015.
- A novel approach for reconstruction of viral quasispecies from Dr. Schönhuth's team. This is already in use at UMCU/UU and the Helmholtz Center for Infection Research, Braunschweig: "De novo assembly of viral quasispecies using overlap graphs", J. Baaijens, A.Z. El Aabidine, E. Rivals and A. Schönhuth, bioRxiv:080341 (Genome Research, accepted for publication)
- Machine learning in brachytherapy treatment planning: S.C. Maree, P.A.N. Bosman, Y. Niatsetski, C. Koedooder, N. van Wieringen, A. Bel, B.R. Pieters, T. Alderliesten. Improved class solutions for prostate brachytherapy planning via evolutionary machine learning. In Proceedings of the European SocieTy for Radiotherapy & Oncology conference - ESTRO-2017, 2017.
- Multi-objective optimization for deformable image registration: P.A. Bouter, T. Alderliesten, and P.A.N. Bosman. A novel model-based evolutionary algorithm for multi-objective deformable image registration with content mismatch and large deformations: benchmarking efficiency and quality. In Proceedings of the SPIE Medical Imaging Conference 2017.
- Machine learning in patient matching for dose reconstruction: M. Virgolin, I.W.E.M. van Dijk, J. Wiersma, C.M. Ronckers, C. Witteveen, C.R.N. Rasch, A. Bel, T. Alderliesten and P.A.N. Bosman. Learning to Associate Distances with Historical Patient Data to Enable Fine-grained Studying of Late Adverse Effects of Paediatric Radiotherapy: Data, Methodology, and First Results. In U. Oelfke and M. Partridge, editors, Proceedings of the International Conference on the use of Computers in Radiation Therapy - ICCR-2016, 2016.
- Alice Heliou, Solon P. Pissis, and Simon J. Puglisi. emMAW: computing minimal absent words in external memory. Bioinformatics, 33(17):2746–2749, 2017. In this paper, we developed the first efficient algorithm that works in external memory for computing the absent words of a given genome. A standard implementation requires more than 20n bytes of RAM for a genome of length n. Such memory requirements are a significant hurdle for big genomes. This algorithm allows for computation of absent words on far bigger datasets than was previously possible. It has been applied to a corpus of 15 thousand genomes by SafeTraces (http://safetraces.com/), a company located in the United States, in collaboration with BioCos (http://www.biocos.gr/), a company located in Greece, as a non-repudiable authentication method for food tracing.
- Lorraine A. K. Ayad, Solon P. Pissis, Dimitris Polychronopoulos: CNEFinder: finding conserved non-coding elements in genomes. Bioinformatics 34(17): i743-i747 (2018). Conserved non-coding elements (CNEs) represent an enigmatic class of genomic elements which, despite being extremely conserved across evolution, do not encode proteins. Their functions are still largely unknown. Thus, there exists a need to systematically investigate their roles in genomes. To this end, identifying sets of CNEs in a wide range of organisms is an important first step. Currently, there are no tools published in the literature for systematically identifying CNEs in genomes. We fill this gap by presenting CNEFinder, a tool for identifying CNEs between two given DNA sequences with user-defined criteria. The results presented here show the tool’s ability to identify CNEs accurately and efficiently. CNEFinder is based on a k-mer technique for computing maximal exact matches. The tool thus does not require or compute whole-genome alignments or indexes, such as the suffix array or the Burrows-Wheeler Transform (BWT), which makes it flexible to use on a wide scale.
- Lorraine A. K. Ayad and Solon P. Pissis: MARS: improving multiple circular sequence alignment using refined sequences. BMC Genomics: 18(86) (2017). A fundamental assumption of all widely-used multiple sequence alignment techniques is that the left- and right-most positions of the input sequences are relevant to the alignment. However, the position where a sequence starts or ends can be totally arbitrary due to a number of reasons: arbitrariness in the linearisation (sequencing) of a circular molecular structure; or inconsistencies introduced into sequence databases due to different linearisation standards. These scenarios are relevant, for instance, in the process of multiple sequence alignment of mitochondrial DNA, viroid, viral or other genomes, which have a circular molecular structure. A solution for these inconsistencies would be to identify a suitable rotation (cyclic shift) for each sequence; these refined sequences may in turn lead to improved multiple sequence alignments using the preferred multiple sequence alignment program. We present MARS, a new heuristic method for improving Multiple circular sequence Alignment using Refined Sequences. MARS was implemented in the C++ programming language as a program to compute the rotations (cyclic shifts) required to best align a set of input sequences. This tool has been used by many research groups working on comparative genomics.
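To make the idea of choosing a rotation (cyclic shift) for a circular sequence concrete, the sketch below tries every rotation of one sequence and keeps the one with the fewest mismatches against a reference of the same length. This brute-force Hamming-distance search is an assumption made purely for illustration; it is not the heuristic used by MARS, and the function names and example sequences are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_rotation(reference, seq):
    """Return (shift, rotated) where rotated = seq[shift:] + seq[:shift]
    has the fewest mismatches against reference.

    Brute force over all shifts: quadratic time, illustration only.
    Ties are resolved in favour of the smallest shift.
    """
    if len(reference) != len(seq):
        raise ValueError("this sketch assumes sequences of equal length")

    best_shift = 0
    best_dist = hamming(reference, seq)
    for shift in range(1, len(seq)):
        rotated = seq[shift:] + seq[:shift]
        dist = hamming(reference, rotated)
        if dist < best_dist:
            best_shift, best_dist = shift, dist
    return best_shift, seq[best_shift:] + seq[:best_shift]


if __name__ == "__main__":
    reference = "ACGTACGGTA"  # hypothetical circular sequence
    # The same molecule, linearised at a different start position.
    linearised = reference[4:] + reference[:4]
    print(best_rotation(reference, linearised))  # (6, 'ACGTACGGTA')
```

After such a refinement step, the rotated sequences can be passed to a conventional multiple sequence alignment program, which then no longer has to cope with arbitrary start positions.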
– Genome of The Netherlands project. Dr. Schönhuth is a member of the project consortium.
– ALS Centrum Nederland. Dr. Schönhuth is part of the research team.
– The Computational Pangenomics Consortium. Dr. Schönhuth is a founding member.
– EU FP7 Collaborative Support Action RETHINK Big (grant agreement no. 619788, http://rethinkbig-project.eu). Project aim: Roadmap for European Technologies in Hardware and Networking for Big Data. Role: Working group leader “Life Sciences”.
- Cluster computing, GPU computing;
European Genome-Phenome Archive (EGA); DANS;