Pascal Lagerweij - Reinforcement Learning in Block Markov Chains

Reinforcement Learning in Block Markov Chains

On March 29th, 2019, Pascal Lagerweij defended his MSc thesis Reinforcement Learning in Block Markov Chains. His committee consisted of Piet Van Mieghem, Dr. Matthijs Spaan, and myself (as daily supervisor). Pascal joined our research group, Network Architectures and Services, on May 22nd, 2018. His work builds upon our work on Clustering in Block Markov Chains, available here and here.

Nowadays, reinforcement learning algorithms on Markov decision processes (MDPs) face computational issues when the state space is large. To reduce the state space of an MDP, several state aggregation, or clustering, methodologies have been applied. Recently, a new clustering algorithm was proposed that is able to cluster the states of a block Markov chain from a single trajectory. A block Markov chain is a Markov chain whose transition matrix contains blocks that correspond to clusters. Our aim was to investigate the possible combination of state aggregation in reinforcement learning on MDPs with clustering of states on a block Markov chain.

First, we investigated the clustering algorithm and its properties to assess its performance for different parameters and trajectory lengths. We compared the statistical properties of a pure Markov chain and the mixed Markov chain generated by an MDP, and then verified the performance of the clustering algorithm on this mixed Markov chain. We proposed the BMC-MDP model, which is able to model cluster-based MDPs, and C-PSRL, an algorithm on this newly introduced model that incorporates a single clustering step. We compared its performance with a naïve approach and concluded that this combined approach of clustering and MDP solving on a reduced space is viable and reduces the computational complexity significantly.

This research opens up the possibility of more complex algorithms with, for example, multiple clustering steps. Moreover, if the clustering algorithm can be extended to cluster based on a state and action trajectory, this may result in increased clustering performance and thereby enhance this general approach of optimizing on a cluster-based MDP.
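To make the block structure concrete, here is a small toy sketch, not the algorithm from the thesis: it builds a block Markov chain from a cluster-level transition matrix, samples a single trajectory, and recovers the blocks by grouping states whose empirical transition rows are close. The cluster sizes, the matrix Q, and the distance threshold are illustrative assumptions chosen so that this easy regime is well separated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy block Markov chain: 3 clusters of 4 states each (illustrative sizes,
# not taken from the thesis).
K, m = 3, 4
n = K * m
labels = np.repeat(np.arange(K), m)  # true cluster label of each state

# Cluster-level transition matrix (rows sum to 1); assumed for illustration.
Q = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])

# Lift Q to the state-level transition matrix P: from any state in cluster i,
# move to cluster j with probability Q[i, j], then pick a state in j uniformly.
# All states in the same cluster share a row of P -- the "block" structure.
P = Q[labels][:, labels] / m

# Sample a single trajectory of the chain.
T = 20_000
x = np.zeros(T, dtype=int)
for t in range(1, T):
    x[t] = rng.choice(n, p=P[x[t - 1]])

# Empirical transition matrix from the observed transitions.
N = np.zeros((n, n))
np.add.at(N, (x[:-1], x[1:]), 1)
P_hat = N / np.maximum(N.sum(axis=1, keepdims=True), 1)

# Naive recovery (a stand-in for the clustering algorithm analysed in the
# thesis): states in the same block have near-identical rows of P, so group
# states whose empirical rows are within a small distance of each other.
recovered = np.full(n, -1)
c = 0
for i in range(n):
    if recovered[i] == -1:
        close = np.linalg.norm(P_hat - P_hat[i], axis=1) < 0.1
        recovered[close] = c
        c += 1
```

With this trajectory length the within-block row distances are an order of magnitude below the threshold while between-block distances sit well above it, so the grouping recovers the true clusters up to a relabeling; the research summarized above concerns the much harder question of when and how such recovery is possible in general.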

MSc thesis

If you are interested in reading his MSc thesis, you can find it in the TU Delft repository.



Looking for more?

If you are curious about related research on, e.g., clustering or Block Markov Chains (BMCs), have a look at my list of Scientific Publications.

Interested in collaborating?

If you are interested in doing your MSc thesis with me, have a look at the projects I have available, or simply contact me with your idea. If you are about to finish your MSc studies and would like to do a PhD, we may have a vacancy available here.

Jaron Sanders received M.Sc. degrees in Mathematics and Physics from the Eindhoven University of Technology, The Netherlands, in 2012, and a PhD degree in Mathematics in 2016. After obtaining his PhD, he worked as a post-doctoral researcher at the KTH Royal Institute of Technology in Stockholm, Sweden. He then worked as an assistant professor at the Delft University of Technology, and now works as an assistant professor at the Eindhoven University of Technology. His research interests are applied probability, queueing theory, stochastic optimization, stochastic networks, wireless networks, and interacting (particle) systems.