2MMS50 – Stochastic Decision Theory

This page was last updated for the Academic Year 2019-2020.

Aims / Summary

The goal of the course is to familiarize students with the mathematical concepts and computational techniques for stochastic decision and optimization problems, and to illustrate the application of these methods in various scenarios. The methodological framework of Markov decision processes and stochastic dynamic programming models plays a central role: students are expected to master the main problem formulations and be able to apply the main computational approaches in that domain to stylized problems.

You will learn about state-of-the-art approaches in:

  1. Markov Decision Theory
  2. Multi-Armed Bandits
  3. Reinforcement Learning
  4. Stochastic / Distributed / Robust Optimization


This course will be taught by prof.dr. Bert Zwart (TU/e, CWI) and dr. Jaron Sanders (TU/e, personal page).

Course material

We will be using the following material throughout the course:

If you want additional background, we recommend these books:

We also encourage you to examine these seminal papers, whose contributions will come up during the course:

  • Thompson, On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika 25(3-4):285-294, 1933
  • Robbins, Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society 58(5):527-535, 1952
  • Gittins, Bandit processes and dynamic allocation indices, Journal of the Royal Statistical Society, Series B, 1979
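To give a flavour of these contributions: Thompson's 1933 paper underlies what is now called Thompson sampling, a standard algorithm for the multi-armed bandit problems treated in this course. Below is a minimal, self-contained sketch for a Bernoulli bandit with Beta posteriors; the arm means and horizon are made up purely for illustration and are not course material.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling on a toy bandit.

    true_means: unknown success probability of each arm (illustrative values).
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k  # Beta posterior parameters: 1 + observed successes
    beta = [1] * k   # Beta posterior parameters: 1 + observed failures
    pulls = [0] * k
    for _ in range(horizon):
        # Draw one sample of each arm's mean from its current posterior...
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # ...and play the arm whose sample is largest.
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


pulls = thompson_sampling([0.3, 0.7], horizon=2000)
```

Because sampling from the posterior naturally balances exploration and exploitation, the better arm (here the second one) ends up being pulled far more often as the horizon grows.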

Exam / Grading

To pass this course, you must hand in four homework assignments. You will work in groups of two, so partner up and submit each assignment as a pair. Each assignment covers one of the topics of the course; we hand one out after every third lecture, and you then have two weeks to complete it.

We will grade each of the four assignments; each counts for 25% of your final grade.

Online lectures

All lectures and instructions will be taught digitally, using Canvas Conferences. We will predominantly use PowerPoint slides and may at times use Microsoft Whiteboard. Be on time, be digitally present, and participate. We strongly encourage you to enable your camera and have a working microphone for interactivity.

Canvas Conferences can be found in the menu to your left on our course page on canvas.tue.nl.

Course overview

Here is a week-by-week breakdown of the course:

Week | Topic | Professor | Activity | Dates
17 | Markov Decision Theory | Bert Zwart | two lectures | April 22, 24
18 | Markov Decision Theory (new assignment: MDT) | Bert Zwart | one lecture, one instruction | April 29, May 1
19 | Multi-Armed Bandits | Jaron Sanders | two lectures | May 4, 8
20 | Multi-Armed Bandits (deadline: MDT, May 14th; new assignment: MABs, May 15th) | Jaron Sanders | one lecture, one instruction | May 13, 15
21 | no class activities | | |
22 | Reinforcement Learning | Jaron Sanders | two lectures | May 27, 29
23 | Reinforcement Learning (deadline: MABs, June 4th; new assignment: RL) | Jaron Sanders | one lecture, one instruction | June 3, 5
24 | Stochastic / Distributed / Robust Optimization | Bert Zwart | two lectures | June 10, 12
25 | Stochastic / Distributed / Robust Optimization (deadline: RL, June 18th; new assignment: SDRO) | Bert Zwart | one lecture, one instruction | June 17, 19
26 | no class activities | | |
27 | deadline: SDRO, July 2nd | | |

Presentation slides

Here are my presentation slides:

Multi-Armed Bandits



Reinforcement Learning
