On March 29th, 2019, Pascal Lagerweij defended his MSc thesis Reinforcement Learning in Block Markov Chains. His committee consisted of prof.dr.ir. Piet Van Mieghem, dr. Matthijs Spaan, and myself (as daily supervisor). Pascal joined our research group Network Architectures and Services on May 22nd, 2018. His work builds on our earlier work on Clustering in Block Markov Chains, available here and here.
Reinforcement Learning in Block Markov Chains
Abstract
Nowadays, reinforcement learning algorithms on Markov decision processes (MDPs) face computational issues when the state space is large. To reduce the state space of an MDP, several state aggregation, or clustering, methodologies have been applied. Recently, a new clustering algorithm has been proposed that is able to cluster the states of a block Markov chain from a single trajectory. A block Markov chain is a Markov chain whose transition matrix has blocks that correspond to clusters. Our aim was to investigate the possible combination of state aggregation in reinforcement learning on MDPs with clustering of states on a block Markov chain. First, we investigated the clustering algorithm and its properties to see its performance for different parameters and trajectory lengths. We compared the statistical properties of a pure Markov chain and the mixed Markov chain generated by an MDP. Afterwards, we verified the performance of the clustering algorithm on this mixed Markov chain. We proposed the BMC-MDP model, which is able to model cluster-based MDPs, and C-PSRL, an algorithm for this newly introduced model that consists of a single clustering step. We compared its performance with a naïve approach and concluded that this combined approach of clustering and MDP solving on a reduced state space is viable and significantly reduces the computational complexity. This research opens up possibilities for more complex algorithms with, for example, multiple clustering steps. Moreover, if we can extend the clustering algorithm to cluster based on a state and action trajectory, this may result in improved clustering performance and thereby enhance the performance of this general approach of optimizing on a cluster-based MDP.
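To give a concrete feel for the model, here is a minimal Python sketch of a block Markov chain. The cluster sizes and transition probabilities below are made up for illustration (they are not the parameters used in the thesis): the state-level transition matrix is built from a small cluster-level transition matrix, and a single trajectory is sampled from it. Recovering the hidden cluster labels from such a trajectory alone is what the clustering algorithm discussed above aims to do.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not from the thesis): K clusters over n states.
K, n = 3, 12
cluster_of = np.repeat(np.arange(K), n // K)   # cluster label of each state
Q = np.array([[0.8, 0.1, 0.1],                 # cluster-level transition probabilities
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

# Build the state-level transition matrix P: from state x, pick the next
# cluster according to Q, then a uniformly random state within that cluster.
# This gives P its block structure.
P = np.zeros((n, n))
for x in range(n):
    for y in range(n):
        k, l = cluster_of[x], cluster_of[y]
        P[x, y] = Q[k, l] / np.sum(cluster_of == l)

assert np.allclose(P.sum(axis=1), 1.0)

# Sample a single trajectory; a BMC clustering algorithm tries to recover
# the (unobserved) labels in `cluster_of` from this trajectory alone.
T, x = 10_000, 0
trajectory = [x]
for _ in range(T):
    x = rng.choice(n, p=P[x])
    trajectory.append(x)
```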
MSc thesis
If you are interested in reading his MSc thesis, you can find it on the TU Delft repository. Alternatively, it is available here.
Looking for more?
If you are curious about related research on, e.g., clustering or Block Markov Chains (BMCs), have a look at my list of Scientific Publications.
Interested in collaborating?
If you are interested in doing your MSc thesis with me, have a look at the projects I have available, or simply contact me with your idea. If you are about to finish your MSc studies and would like to do a PhD, we may have a vacancy available here.