BioSB offers fundamental courses and advanced courses. The current list of courses we offer can be found below.
Fundamental courses: fundamental courses are organised once per year. In 5 days, a range of experts from the Dutch bioinformatics & systems biology community will provide you with a solid foundation in core technologies in bioinformatics and systems biology, laced with examples of applications. The advanced courses build on this foundation.
Advanced courses: advanced courses are organised once every 2 years. Each course deals with a specific topic in current bioinformatics and systems biology. Each 5-day course is organised by experts and exposes students to both fundamental approaches and recent developments. Most build on methods introduced in the fundamental courses, though, depending on your prior education, it may not be necessary to follow these first.
Other courses: we also co-organise courses with other research schools from time to time, or announce other courses via our website. These courses may have a different format and may be offered only occasionally. They are announced via the upcoming courses list.
It is also possible to pre-register for a course. You can do this by filling out this form and choosing the course you are interested in. We will send you an e-mail when that course is announced and when registration opens.
1.5 ECTS for following the course, 3 ECTS when successfully completing a final assignment
Modern biology is a data-rich science, driven by our ability to measure the detailed molecular characteristics of cells, organs, and individuals at many different levels. Interpretation of these large-scale biological data requires the detection of statistical dependencies and patterns in order to establish useful models of complex biological systems. Techniques from machine learning are key in this endeavour. Typical examples are the visualization of single-cell RNA-seq data using dimensionality reduction methods, base calling for nanopore sequencing data using hidden Markov models and (recurrent) neural networks, and classification of high-throughput microscopy image data using convolutional neural networks. In this one-week course, the foundations of machine learning will be laid out and commonly used methods for unsupervised (clustering, dimensionality reduction, visualization) and supervised (mainly classification) learning will be explained in detail. Methods will be illustrated using recent examples from the fields of systems biology and bioinformatics. Methods discussed in the morning lectures will be put into practice during the afternoon computer lab sessions.
Density estimation, including histograms, nearest-neighbour and Parzen density estimators
Evaluation, including ROC analysis and cross-validation
Parametric and non-parametric classifiers, including linear discriminant analysis, k-nearest neighbours, logistic regression, decision trees and random forests
Feature selection, including search algorithms (forward, backward, branch & bound) and regularised classifiers (ridge, lasso, elastic net)
Dimensionality reduction, including principal component analysis, multidimensional scaling and t-SNE
Clustering, including hierarchical clustering, k-means and Gaussian mixture models
Hidden Markov models
(Deep) neural networks
Kernel-based methods, including support vector machines
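To give a flavour of the supervised methods and evaluation schemes listed above, here is a minimal Python sketch of a k-nearest-neighbour classifier evaluated with leave-one-out cross-validation. The two-dimensional samples and the class labels "control" and "treated" are hypothetical toy data, and the function names are our own; this is an illustration, not course material.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Pair every training sample's Euclidean distance with its label, nearest first.
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(xs, ys, k=3):
    """Hold out each sample once, classify it with the rest, return the hit rate."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if knn_predict(train_x, train_y, xs[i], k) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical toy data: two well-separated groups of 2-D measurements.
xs = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
ys = ["control"] * 3 + ["treated"] * 3

print(leave_one_out_accuracy(xs, ys, k=3))  # prints 1.0
```

Leave-one-out is used here only because the toy set is tiny; with real data, k-fold cross-validation is the more common choice.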
After having followed this course, the student has a good understanding of a wide range of machine learning techniques and is able to recognize which method is most applicable to the data analysis problems they encounter in bioinformatics and systems biology applications.
The course is aimed at PhD students with a background in bioinformatics, systems biology, computer science, the life sciences or a related field. Participants from the private sector are also welcome. A working knowledge of basic statistics and linear algebra is assumed. Preparation material on statistics and linear algebra will be distributed before the course, for students who lack the required background.
Natal van Riel, Eindhoven University of Technology
1.5 ECTS for following the course, 3 ECTS when successfully completing a final assignment
Living organisms are characterized by an amazing degree of hierarchical complexity. Although our ability to collect measurements at different spatial levels and time-scales has grown dramatically, it has become clear that measurements alone cannot unravel biological complexity. This is because the dynamic behavior of complex systems cannot be reduced to the linear sum of the functions of their parts. Hence, computational modelling is essential for understanding the mechanisms underlying patterns observed in experimental data, in particular when studying dynamic phenomena. Mathematical models make it possible to generate and test hypotheses about these mechanisms at relatively low cost. However, given the huge complexity and peculiar features of biological systems, it is necessary to carefully understand the specific modelling requirements they pose in order to define what a good model should look like. In this sense, modelling is a craft that can only be learned through intensive practice: ‘learning by doing’. In this course we offer participants the opportunity to learn and practise the modelling process.
When validating models, one always needs to fit them to data: the parameters present in any realistic model have to be chosen by comparing model predictions with data. In this matching process, optimization techniques are indispensable. For this reason, a considerable part of this course is spent on getting you acquainted with the optimization techniques that are currently available and widely used. Numerical optimization is also the basis for so-called flux balance analysis (FBA), commonly used to study large metabolic networks. This type of model, and its analysis and simulation, is also introduced in the course.
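As an illustration of choosing a parameter by comparing model predictions with data, the minimal Python sketch below uses steepest descent to estimate the rate constant k of a hypothetical one-variable decay model x(t) = x0 * exp(-k * t), minimising the sum of squared errors against synthetic, noise-free data. The model, the known x0 = 2, the step size and the iteration count are all assumptions chosen for the example; the course practicals themselves use Matlab.

```python
import math


def model(k, t, x0=2.0):
    """Hypothetical one-parameter decay model x(t) = x0 * exp(-k * t)."""
    return x0 * math.exp(-k * t)


def sse_gradient(k, times, data, x0=2.0):
    """Derivative of the sum of squared errors with respect to k."""
    grad = 0.0
    for t, y in zip(times, data):
        m = model(k, t, x0)
        residual = m - y
        # d/dk of residual**2 is 2 * residual * dm/dk, with dm/dk = -t * m.
        grad += 2.0 * residual * (-t * m)
    return grad


def steepest_descent(k0, times, data, step=0.05, iterations=500):
    """Fit k by repeatedly stepping against the gradient of the SSE."""
    k = k0
    for _ in range(iterations):
        k -= step * sse_gradient(k, times, data)
    return k


# Synthetic, noise-free "measurements" generated with k = 0.5 (illustration only).
times = [0, 1, 2, 3, 4, 5]
data = [model(0.5, t) for t in times]
k_fit = steepest_descent(1.0, times, data)  # converges to k close to 0.5
```

With a fixed step size, steepest descent can converge slowly or diverge once data are noisy or models have more parameters, which is one reason least-squares methods such as Levenberg-Marquardt are widely used for this kind of fitting.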
The course is a mixture of theory sessions and computer practicals. Participants who want to acquire 3 ECTS complete the course with an assignment, to be finalized afterwards.
The students will be provided with a theoretical basis, a variety of methods, and computational hands-on experience to set up systems biology models and handle numerical optimization.
In the course the students will learn:
To understand the common ground and the differences for applications of dynamic modeling in metabolic, regulatory, signaling, and multi-scale biological processes
How to set up a dynamic model to represent biological networks using different interaction mechanisms
To implement, simulate and analyze dynamic network models
To understand the wide variety of problems in modelling that can be solved with optimization
To apply different types of numerical optimization methods
To combine dynamic modeling and optimization in order to integrate experimental data into models, estimate model parameters and design experiments.
To understand how numerical optimization (linear programming) works in flux balance analysis to simulate metabolic network models.
Global and local search methods: steepest descent, Levenberg-Marquardt, genetic algorithms, linear programming.
The course is aimed at PhD students with a background in bioinformatics, systems biology, computer science, the life sciences or a related field. Participants from the private sector are also welcome. A working knowledge of mathematics, especially differential equations, is recommended, but we will distribute preparation material to be studied by students missing the required background. Furthermore, at the start we offer a math refresher to help participants who are not (yet) involved in modelling on a daily basis.
Examples and computer practicals use Matlab. A computer with a working version of Matlab is needed, and some programming experience and knowledge of Matlab are required to take the course. A short online introductory Matlab training will be made available for those without Matlab skills.
prof. dr. ir. Dick de Ridder, Wageningen University
dr. Aalt-Jan van Dijk, Wageningen University
1.5 ECTS for following the course, 3 ECTS when successfully completing a final assignment
Molecular biology is concerned with the study of the presence of, and interactions between, molecules at the cellular and sub-cellular level. In bioinformatics and systems biology, algorithms and tools are developed to model these interactions, with various goals: predicting as yet unobserved interactions; assigning functions to uncharacterised molecules through their relations with known molecules; predicting phenotypes such as diseases; or simply building up biological knowledge in a structured way.
Such interactions are often best modelled as networks or graphs, which opens up the possibility of using a large number of readily available algorithms for inferring networks, simulating biological processes, optimising paths or flows through networks, graph-based data integration and graph mining. Many of these algorithms can be applied (sometimes with slight alterations) to solve a particular biological problem, such as modelling transcriptional regulation or predicting protein interactions and complex formation, but also to derive systems behaviour by breaking down networks into modules or motifs with certain characteristics.
In this course, we will first give a brief overview of molecular biology, the advent of high-throughput measurement techniques and large databases containing biological knowledge, and the importance of networks to model all this. We will highlight a number of peculiar features of biological networks. Next, a number of basic network models (linear, Boolean, Bayesian) will be discussed, as well as methods for inferring these from observed measurement data. A number of alternative network models more suited for high-level simulation of cellular behaviour will also be introduced. Building on the network inference methods, a number of ways of integrating various data sources and databases to refine biological networks will be discussed, with specific attention to the use of sequence information to refine interaction and transcription regulation networks. Finally, we will give some examples of algorithms that exploit the inferred networks to learn about biology, in particular by inspecting protein interaction networks.
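Boolean network models of the kind mentioned above can be simulated in a few lines. The minimal Python sketch below assumes a hypothetical three-gene ring (C represses A, A activates B, B activates C) with synchronous updates, and iterates from an initial state until a state repeats, returning the attractor that is reached. The network and the function names are illustrative assumptions, not taken from the course.

```python
def step(state, rules):
    """Synchronous update: every node is recomputed from the current state."""
    return tuple(rule(state) for rule in rules)


def find_attractor(initial, rules):
    """Iterate until a state recurs and return the states on the attractor."""
    first_seen = {}  # state -> time step at which it was first visited
    trajectory = []
    state = initial
    while state not in first_seen:
        first_seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state, rules)
    # Everything from the first visit of the repeated state onwards is the cycle.
    return trajectory[first_seen[state]:]


# Hypothetical three-gene ring: C represses A, A activates B, B activates C.
rules = (
    lambda s: not s[2],  # A is on when C is off
    lambda s: s[0],      # B copies A
    lambda s: s[1],      # C copies B
)
cycle = find_attractor((True, False, False), rules)  # a 6-state oscillation
```

Because the ring contains a single repression, it has no fixed point: started with only A on, it cycles through six states, while some other initial states end up in a different, two-state cycle.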
The course is aimed at PhD students with a background in bioinformatics, computer science or a related field; a working knowledge of basic statistics and linear algebra is assumed. The BioSB fundamental course “Machine Learning for Bioinformatics & Systems Biology” discusses many of the tools used in this course, but it is not required to have followed it. Prior knowledge of molecular biology is a bonus, but also not strictly required.
Comparative genomics aims to compare large sets of genomes in order to understand and explain differences in traits between organisms. Contemporary methods are powered by fundamental algorithms and data structures, which are efficient and scale to large data sets. A thorough understanding of these algorithms and data structures is necessary for advanced users and developers in this area. In addition, understanding how comparative genomics is developing is important to shape your own research.
In this course, we will cover genome analysis, variant analysis, and pangenomics. Core concepts, applications, and future trends will be discussed, with a focus on the algorithms and data structures underlying state-of-the-art methods. The course offers an engaging mix of lectures, paper discussions, hands-on tutorials, and a do-it-yourself project.
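To make the role of such data structures concrete, here is a minimal Python sketch of a suffix array, a classic index for exact pattern search in sequences. The naive construction and the toy sequence are simplifications for illustration; practical genomics tools rely on far more efficient construction algorithms and compressed index variants.

```python
def build_suffix_array(text):
    """Return start positions of all suffixes of `text` in lexicographic order.

    Naive construction by sorting suffixes; fine for toy inputs only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Binary-search the suffix array for all suffixes starting with `pattern`."""
    m = len(pattern)
    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # First suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    # Suffixes in sa[start:lo] all begin with the pattern; report positions.
    return sorted(sa[start:lo])


genome = "ACGTACGTAC"  # toy sequence
sa = build_suffix_array(genome)
hits = find_occurrences(genome, sa, "ACG")  # [0, 4]
```

Because the suffixes are sorted, all occurrences of a pattern form one contiguous block of the array, so each lookup needs only two binary searches instead of a scan of the whole sequence.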
The course is aimed at PhD students with a background in bioinformatics, computer science or a related field. Participants are expected to have experience in command-line usage (Unix shell) and programming (Python), and have basic knowledge of genomics.
After having followed this course, the student has a good understanding of algorithms and data structures in comparative genomics, is able to implement algorithms in Python, is able to read and understand method papers in bioinformatics in detail, and is able to work with state-of-the-art command-line tools for genomics.
VUmc/BioSB course: Statistics in Omics data analysis
Dr. R. X. de Menezes, Amsterdam UMC location VUmc
This introductory course gives an overview of many statistical tools to analyse omics data. The course can be followed by researchers with a minimal or elementary background in quantitative data analysis. Find more information about prerequisites here.
Participants will learn and practice commonly used tools including:
Tools to explore datasets including clustering, principal components and network analysis
Models to answer basic statistical questions: differential behaviour (e.g. mRNA expression) and multiple testing, also using Bayesian models
Models for classification and prediction, including penalised regression
Models for emerging technologies: radiomics and single-cell sequencing data
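For the multiple-testing step, a commonly used procedure is Benjamini-Hochberg, which controls the false discovery rate. In R it is available as p.adjust(p, method = "BH"); the minimal Python sketch below reproduces the same adjustment for illustration, using made-up p-values.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for four hypothetical genes.
pvals = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(pvals)
```

Each p-value is scaled by n divided by its rank, and the running minimum ensures that a gene with a smaller raw p-value never receives a larger adjusted value than a gene ranked after it.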
Methods will be applied to experimental data in practical hands-on sessions using the statistical software R. Wherever possible, insight into how the methods work is given intuitively; combined with some formalisation and the practical work, this makes the theory accessible and helps cement concepts. Slides and instructions for the practical sessions will be made available electronically to participants.
The course is tailored for PhD students and researchers (such as pathologists, psychological biologists, human geneticists, oncologists, neuro-geneticists) whose research involves experiments that generate omics data. It can also suit researchers with a quantitative background looking for a short introductory course.
Constraint-based modelling: introduction and advanced topics
Ronan Fleming, Brett Oliver
Constraint-based modeling is a powerful modeling methodology that is being used to model a diverse range of biological phenomena. These include both fundamental and applied questions relevant to biotechnology, microbiology and medicine. Central to constraint-based modeling is the use of genome-scale reconstructions that represent particular cellular functions as a biochemical reaction network. In this course, you will be introduced to:
the principles of constraint-based reconstruction and analysis (COBRA)
the underlying mathematical foundations of constraint-based modeling
methods for integration of omics data with constraint-based models
basic and advanced methods for interrogating models and interpreting results
open source COBRA software, including the COBRA Toolbox
standards for reconstruction and model sharing
example applications to biomedicine and biotechnology
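For orientation, flux balance analysis (FBA), a central constraint-based method, can be written as a linear program. In the standard formulation below, S is the stoichiometric matrix (metabolites by reactions), v the vector of reaction fluxes, c a vector selecting the objective (for example a biomass reaction), and v_min and v_max the lower and upper flux bounds; this notation is ours and may differ from that used in the course.

```latex
\[
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v_{\min} \le v \le v_{\max}
\end{aligned}
\]
```

The equality constraint expresses the steady-state assumption that internal metabolites are neither accumulated nor depleted, so the feasible flux distributions are determined by network stoichiometry and reaction bounds alone.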
The course is structured into both lectures and practical sessions so that theory can be illustrated with biologically motivated computational examples.
Constraint-based modeling is a rapidly growing field that is being used in both fundamental and applied research and biotechnology. It sits at the intersection between quantitative modeling, bioinformatics and cellular physiology, and as such, is an example of systems biology at work.
While participants would benefit from some knowledge of quantitative modeling, bioinformatics or metabolism, this is not required. The course suits, for example, scientists working in a laboratory who want to learn more about building or using genome-scale models, as well as those who have used some basic COBRA methods but want to know how else they can be applied to their research.
Introduction to COBRA
Proficiency with the COBRA Toolbox
Familiarity with typical applications of constraint-based modeling in biomedicine and biotechnology
Managing and Integrating Life Science Information: Approaches using Linked Data and Semantics
Marco Roos (LUMC) and Katy Wolstencroft (Leiden University)
Credits and grading
The total study load of the course is 3 EC.
With endorsements for FAIR* data stewardship ranging from Nature Genetics to the G7, and increasing pressure from funders for much stricter data management, FAIR data stewardship skills will be among the most wanted for the next decade. By following this course, you add these skills to your CV and learn cutting-edge semantic techniques to search and integrate health and life science data for efficient, reproducible data science.
* FAIR: Findable, Accessible, Interoperable and Reusable for humans and computers
The amount of Life Science data available in the public domain is a vast and growing resource for bioinformatics research. There are over 20 million papers in PubMed and over 1600 biological databases. In many cases, finding and applying the information from these resources is far from trivial. This course will show you techniques for working with these distributed resources, including the web of Linked Data and scientific workflows. It also focuses on methods for using or linking your own data into this large distributed Semantic Web of resources, in order to ensure that your data is FAIR (Findable, Accessible, Interoperable and Reusable).
This course is for bioinformaticians who would like to learn about leading-edge data and knowledge integration solutions. You will learn (1) powerful and flexible approaches to data and information management for your bioinformatics application (Semantic Web and Linked Data), (2) how to work with data across remote locations, for instance by applying Web Services and workflows, and (3) how to publish your own data to make it available and reusable for the rest of the community. We assume a basic understanding of bioinformatics programming for the hands-on sessions. The course would suit previous participants of BYOD meetings who would like more hands-on experience of data integration. It would also suit data providers who would like to explore new ways of serving their data or integrating it with other resources.
This course introduces modern techniques for the management of life science data and knowledge for bioinformatics applications. After following this course students should be able to start creating their first applications based on these technologies or make more informed design decisions for their current application.
In this course you will learn about:
Linked Data and the Semantic Web technologies that underpin it
How you can use Linked Data for data and knowledge integration in the Life Sciences
Available Linked Data resources in the public domain and large-scale projects that use these resources
How you can integrate your own data with Linked Data resources
How you can combine data integration and analysis over distributed resources, using Web Services and workflows
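Linked Data represents knowledge as subject-predicate-object triples, and a SPARQL query essentially asks for all ways of binding variables so that a set of triple patterns matches the data. The minimal Python sketch below imitates this matching over an in-memory list of triples. The example.org URIs and the gene and protein names are hypothetical placeholders, and real applications would query an RDF store or SPARQL endpoint rather than use this toy.

```python
def match_pattern(pattern, triple, binding):
    """Extend `binding` so `pattern` matches `triple`; return None on mismatch."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # a variable such as "?gene"
            if term in new and new[term] != value:
                return None  # variable already bound to something else
            new[term] = value
        elif term != value:  # a constant URI must match exactly
            return None
    return new


def query(graph, patterns):
    """Return all variable bindings that satisfy every triple pattern."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in graph:
                extended = match_pattern(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical toy graph using placeholder URIs.
EX = "http://example.org/"
graph = [
    (EX + "geneA", EX + "encodes", EX + "proteinA"),
    (EX + "geneB", EX + "encodes", EX + "proteinB"),
    (EX + "proteinA", EX + "interactsWith", EX + "proteinB"),
]
# Which genes encode a protein that interacts with proteinB?
result = query(graph, [
    ("?gene", EX + "encodes", "?protein"),
    ("?protein", EX + "interactsWith", EX + "proteinB"),
])
```

The shared variable ?protein is what joins the two patterns; the same mechanism lets a query combine statements that come from different, independently published resources, as long as they use the same URIs.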
The 2018 edition of the RNA-seq data analysis course was organised as a summer school together with VIB, ELIXIR Netherlands and ELIXIR Belgium.
The information below concerns the courses as organised before 2018.
Peter-Bram ’t Hoen (LUMC), Leon Mei (LUMC), Jan Oosting (LUMC), Szymon Kielbasa (LUMC) and Celia van Gelder (BioSB)
The BioSB research school and partner LUMC are organizing a 3-day course on RNA-seq data analysis. This is an advanced course for people with experience in NGS. The course will consist of lectures and Galaxy and R practicals, and will cover the analysis pipelines for differential transcript expression and variant calling. Examples will be taken from human and mouse studies. The course does not cover prokaryotic RNA profiling, plant genomics or metagenomics.
RNA-seq experimental approaches and study design
Quality control and alignment
Statistics for differential gene expression
eQTL analysis and allele specific expression
Single cell RNA sequencing
Fusion transcript detection
Small RNA profiling
Software for RNA-seq data analysis
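Before differential expression can be assessed, read counts must be made comparable between samples sequenced to different depths. The minimal Python sketch below shows one simple approach, counts-per-million (CPM) scaling followed by a log2 fold change with a pseudocount. The gene names and counts are made up, and this is only a descriptive summary: proper differential expression testing uses statistical count models, such as those implemented in the R packages edgeR and DESeq2.

```python
import math


def counts_per_million(counts):
    """Scale one sample's raw read counts so that they sum to one million."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}


def log2_fold_change(cpm_a, cpm_b, pseudocount=1.0):
    """Per-gene log2 ratio of normalised expression, condition B versus A."""
    # The pseudocount avoids division by zero and log of zero for absent genes.
    return {
        gene: math.log2((cpm_b[gene] + pseudocount) / (cpm_a[gene] + pseudocount))
        for gene in cpm_a
    }


# Made-up counts: sample B was sequenced twice as deeply as sample A.
sample_a = {"geneX": 100, "geneY": 900}
sample_b = {"geneX": 400, "geneY": 1600}
cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)
lfc = log2_fold_change(cpm_a, cpm_b)  # geneX close to +1, geneY negative
```

Without normalisation, geneY would appear up-regulated simply because sample B has more reads; after CPM scaling its share of the library actually decreases.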
Participants in the RNA-seq course should preferably have taken part in the general NGS course, or otherwise have demonstrated hands-on experience with NGS data analysis. The course is aimed at PhD students and post-docs, but scientific programmers and data analysts with a background in biology and bioinformatics may also attend.