We are looking for
We are looking for a Data Scientist who will help us discover the information hidden in vast amounts of data. We collect data on plant varieties and genetic resources using various imaging platforms (camera, drone, satellite) and other (automated) data collection instruments, and we increasingly produce whole-genome sequences of varieties, wild accessions and breeding lines. We want to link these data to data from third-party resources such as genebanks, genome annotations and gene ontology databases. Your primary focus will be on applying text and data mining techniques, linking data from distributed resources according to the FAIR data principles, performing statistical analysis, and building high-quality prediction systems to support breeding decisions. This ambition is illustrated in the Farm Data Train Concept.
You will work in close cooperation with PhD students and researchers involved in national and international projects, including the Horizon 2020 projects Traditom and G2P-SOL, and ELIXIR Accelerate. You will be actively involved in the acquisition of new research projects in this rapidly developing field, in collaboration with the plant breeding industry. You will be part of an emerging research group on “Big Data analytics” within Plant Breeding and, as such, work with us towards improving food quality and food security.
We are looking for an ambitious scientist with either an MSc and several years of working experience in this research area, or a PhD, in Computer Science or a comparable field, preferably with an affinity for plants. The successful candidate is highly motivated to develop a scientific career, is able to perform independent research, and values collaboration with PhD students and other researchers. The candidate is fluent in English and has excellent communication, scientific writing and presentation skills.
The candidate has several of the following qualities and skills:
- Data-oriented personality
- Excellent understanding of machine learning and deep learning techniques and algorithms, such as k-NN, Naive Bayes, SVM and Decision Forests.
- Scripting and programming skills
- Experience with data visualisation tools, such as D3.js and ggplot2.
- Practical experience building applications using standards and technologies falling within the semantic web stack (e.g., Jena, Sesame, Triple / Quad Stores).
- Experience applying Semantic Web and Linked Data standards (RDF, SPARQL, and OWL) to real-world applications according to the FAIR data principles
- Proficiency in using query languages such as SQL and SPARQL
- Good applied statistics skills, covering topics such as distributions, statistical testing and regression.
Plant Breeding is a merger of the Laboratory of Plant Breeding of Wageningen University and the Business Unit Biodiversity and Breeding of Stichting Wageningen Research. We have 90 employees and about 110 students and guest scientists. We conduct fundamental, strategic and applied research on breeding and genetics of various plant and mushroom species and are involved in academic education (BSc, MSc, PhD) in Plant Sciences. Research in Plant Breeding concentrates on the genetic and genomic basis of relevant traits in various crops, including durable disease resistances. Big Data analytics and decision support systems play an increasingly important role within our research.