**TU Eindhoven, Dept. of Mathematics & Computer Science**

Eindhoven University of Technology (TU/e) is a research university specializing in engineering science & technology.

## MSc project

## Reconstruction of inheritance trees in families of plants

Plant breeding is the science of changing the traits of plants in order to produce desired characteristics (Wikipedia). Breeding new crops is important for ensuring food security, as it produces new varieties that are higher-yielding, disease-resistant, drought-tolerant, or regionally adapted to different environments and growing conditions.

Research firms in the Netherlands are at the forefront of plant genetics. The formulation of this project was inspired by a scientific discussion with members of such a firm.

Consider a population of \(N\) plants of a diploid species, each of which has \(2m\) chromosomes. Each chromosome consists of, say, \(n\) nucleotides. We represent the chromosomes of each individual \(a \in [N]\) as vectors and collect them in a family \(\mathcal{X}(a)\). When two plants \(a, b\) are crossed, their offspring \(c\) inherits a new diploid set of chromosomes \(\mathcal{X}(c)\). This new set is constructed by a random process that depends strongly on the chromosomes of both parents, and this process can be described mathematically using a combination of Markov chains, linear algebra, and graph theory.
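To give a flavor of the Markov-chain part of such a model, here is a minimal Python sketch under purely illustrative assumptions that are not part of the project description. A gamete chromosome is copied from one of the parent's two homologs `h0`, `h1`, switching homolog between consecutive sites with a hypothetical crossover probability `r`, and each site is observed incorrectly with a hypothetical error probability `eps`. The hidden "which homolog is being copied" state then forms a two-state Markov chain, and the forward algorithm gives the likelihood of an observed chromosome. The function name `gamete_log_likelihood` is hypothetical.

```python
import math


def gamete_log_likelihood(child, h0, h1, r=0.01, eps=0.001):
    """log P[child | parent homologs h0, h1] under a toy model (assumption):
    a hidden two-state Markov chain chooses which homolog is copied at each
    site, switching with probability r; each site mismatches with prob. eps."""
    homologs = (h0, h1)

    def emit(state, i):
        # Probability of observing child[i] when copying from homolog `state`.
        return 1 - eps if child[i] == homologs[state][i] else eps

    # Start copying from either homolog with probability 1/2.
    alpha = [0.5 * emit(0, 0), 0.5 * emit(1, 0)]
    log_lik = 0.0
    for i in range(1, len(child)):
        # Rescale the forward variables to avoid underflow on long chromosomes.
        s = alpha[0] + alpha[1]
        log_lik += math.log(s)
        a0, a1 = alpha[0] / s, alpha[1] / s
        # Markov transition (stay or switch homolog), then emission at site i.
        alpha = [((1 - r) * a0 + r * a1) * emit(0, i),
                 (r * a0 + (1 - r) * a1) * emit(1, i)]
    return log_lik + math.log(alpha[0] + alpha[1])


# A chromosome with a single crossover is far more likely than one that alternates.
print(gamete_log_likelihood("AAAACCCC", "AAAAAAAA", "CCCCCCCC"))
print(gamete_log_likelihood("ACACACAC", "AAAAAAAA", "CCCCCCCC"))
```

Under these assumptions, the probability of an offspring's full set \(\mathcal{X}(c)\) given its parents would be assembled from such terms, one gamete from each parent per homologous pair; how homologs are matched across parents is left out of this sketch.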

Suppose now that we observe a family of chromosome sets \(\mathcal{X}(1), \ldots, \mathcal{X}(N)\) from experimental data. Which plant inherited genes from which other plant(s) can be described graph-theoretically by a so-called forest. Let \(\mathcal{F}_N\) be the set of all labeled directed forests on \(N\) vertices, and recall that a directed forest is a directed graph all of whose connected components are directed trees. Under the inheritance dynamics and structure described above, each forest \(F\) assigns a particular probability to the event that the observed family of chromosome sets was generated with \(F\) as its inheritance structure. The aim of this project is to find the most likely forest given the experimental data, viz., to find \(F^* = \arg\max_{F \in \mathcal{F}_N} \mathbb{P}[\cap_{a=1}^N \mathcal{X}(a) \mid F]\).
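For very small \(N\), the arg max can be computed by brute force. The sketch below assumes, as one possible reading of the definition, that every tree is rooted with edges pointing from parent to offspring, so each plant has at most one observed parent and a forest can be encoded as a parent array with `None` marking a root. The names `labeled_forests`, `most_likely_forest`, and the `log_lik` callback are hypothetical; `log_lik` stands in for the model-based log-likelihood the student would derive. Under this encoding there are \((N+1)^{N-1}\) forests (Cayley's formula applied after adding a virtual root), which illustrates why exhaustive search quickly becomes infeasible.

```python
from itertools import product


def _reaches_root(parent, c):
    # Follow parent pointers from c; in a forest a root is hit within N checks.
    for _ in range(len(parent)):
        if parent[c] is None:
            return True
        c = parent[c]
    return False


def labeled_forests(N):
    """Yield every rooted labeled forest on plants 0..N-1 as a parent tuple,
    where parent[c] is None if plant c is a root (hypothetical encoding)."""
    for parent in product([None, *range(N)], repeat=N):
        if any(p == c for c, p in enumerate(parent)):
            continue  # no plant is its own parent
        if all(_reaches_root(parent, c) for c in range(N)):
            yield parent  # acyclic, hence a forest


def most_likely_forest(N, log_lik):
    """Exhaustive arg max of log_lik over all forests; only feasible for tiny N."""
    return max(labeled_forests(N), key=log_lik)


print(sum(1 for _ in labeled_forests(3)))  # 4**2 forests on 3 plants
```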

To achieve this goal, the student must model the inheritance dynamics and structure in order to calculate the probability inside the arg max. Subsequently, she could attempt to maximize the log-likelihood over \(\mathcal{F}_N\). While this maximization can in principle be carried out exhaustively by computer, we expect the computation to be hard, since the number of labeled forests grows super-exponentially in \(N\). The student will therefore be challenged to exploit the favorable structure of trees and to come up with a smarter, more efficient algorithm that is still optimal or, alternatively, a useful heuristic.
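As one illustration of how tree structure might be exploited, suppose (a hypothetical simplification, not stated in the project description) that the log-likelihood decomposes into per-plant terms: `root_score[c]` if plant `c` has no observed parent and `edge_score[p][c]` if its parent is `p`. Adding a virtual root joined to every plant turns each forest into a spanning arborescence of the augmented graph, so the most likely forest becomes a maximum-weight spanning arborescence, which the classical Chu-Liu/Edmonds algorithm finds in polynomial time. The names `best_forest` and `_edmonds` are hypothetical, and this is a simple recursive sketch rather than an optimized implementation.

```python
def _edmonds(n, edges, root):
    """Chu-Liu/Edmonds: indices of the edges of a maximum-weight spanning
    arborescence rooted at `root`. `edges` is a list of (u, v, weight) on
    nodes 0..n-1; every non-root node must have at least one incoming edge."""
    # 1. Greedily pick the heaviest incoming edge of every non-root node.
    best = [None] * n
    for i, (u, v, w) in enumerate(edges):
        if v != root and u != v and (best[v] is None or w > edges[best[v]][2]):
            best[v] = i
    # 2. Look for a cycle formed by the picked edges.
    visit = [0] * n
    cycle = None
    for start in range(n):
        if visit[start]:
            continue
        path, v = [], start
        while v != root and visit[v] == 0:
            visit[v] = start + 1
            path.append(v)
            v = edges[best[v]][0]
        if v != root and visit[v] == start + 1:
            cycle = path[path.index(v):]
            break
    if cycle is None:
        return [best[v] for v in range(n) if v != root]
    # 3. Contract the cycle into a single node and recurse.
    in_cycle = [False] * n
    for v in cycle:
        in_cycle[v] = True
    new_id, k = [0] * n, 0
    for v in range(n):
        if not in_cycle[v]:
            new_id[v] = k
            k += 1
    for v in cycle:
        new_id[v] = k  # id of the contracted node
    new_edges, origin = [], []
    for i, (u, v, w) in enumerate(edges):
        if in_cycle[u] and in_cycle[v]:
            continue
        if in_cycle[v]:
            # Entering the cycle at v means giving up v's picked edge.
            w -= edges[best[v]][2]
        new_edges.append((new_id[u], new_id[v], w))
        origin.append(i)
    chosen = _edmonds(k + 1, new_edges, new_id[root])
    # 4. Expand: keep every cycle edge except the one into the entry node.
    result = [origin[j] for j in chosen]
    entry = next(edges[origin[j]][1] for j in chosen if new_edges[j][1] == k)
    return result + [best[v] for v in cycle if v != entry]


def best_forest(root_score, edge_score):
    """Most likely forest as a parent list, assuming the log-likelihood is
    sum(root_score[c] for roots c) + sum(edge_score[p][c] for edges p -> c)."""
    N = len(root_score)
    # Node 0 is a virtual root; plant c becomes node c + 1.
    edges = [(0, c + 1, root_score[c]) for c in range(N)]
    edges += [(p + 1, c + 1, edge_score[p][c])
              for p in range(N) for c in range(N) if p != c]
    parent = [None] * N
    for i in _edmonds(N + 1, edges, 0):
        u, v, _ = edges[i]
        parent[v - 1] = None if u == 0 else u - 1
    return parent


print(best_forest([0.0, 0.0], [[0.0, 5.0], [5.0, 0.0]]))
```

If the true likelihood couples siblings or both parents of a cross, it will not decompose over single edges in this way, and such an exact reduction may no longer apply; that is where heuristics could come in.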