This page was last updated for the Academic Year 2019-2020.
Aims / Summary
The goal of the course is to familiarize students with the mathematical concepts and computational techniques for stochastic decision and optimization problems, and to illustrate the application of these methods in various scenarios. The methodological framework of Markov decision processes and stochastic dynamic programming will play a central role. Students are expected to master the main problem formulations and to be able to apply the main computational approaches in this domain to stylized problems.
You will learn about state-of-the-art approaches in:
- Markov Decision Theory
- Multi-Armed Bandits
- Reinforcement Learning
- Stochastic / Distributed / Robust Optimization
We will be using the following material throughout the course:
- Lecture notes Advanced Stochastic Operations Research: Stochastic Decision Theory, 2009
- Sébastien Bubeck and Nicolò Cesa-Bianchi's lecture notes Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012
If you want additional background, we recommend these books:
- Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction
- Dimitri Bertsekas and John Tsitsiklis, Neuro-Dynamic Programming, 1996
We also encourage you to examine these seminal papers, whose contributions will come up during the course:
- William Thompson, On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika 25(3-4):285-294, 1933
- Herbert Robbins, Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society 58(5):527-535, 1952
- John Gittins, Bandit Processes and Dynamic Allocation Indices, Journal of the Royal Statistical Society, Series B, 1979
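As a taste of the bandit topic: Thompson's 1933 paper is the origin of what is now called Thompson sampling. The following is a minimal illustrative sketch for Bernoulli bandits, not course material; the function name and parameters are our own choices:

```python
import random

def thompson_sampling(success_probs, horizon, seed=0):
    """Thompson sampling for Bernoulli bandits with Beta(1, 1) priors."""
    rng = random.Random(seed)
    n_arms = len(success_probs)
    alpha = [1] * n_arms  # 1 + observed successes per arm
    beta = [1] * n_arms   # 1 + observed failures per arm
    total_reward = 0
    for _ in range(horizon):
        # Draw one sample from each arm's Beta posterior; play the arm
        # whose sampled mean is largest (probability matching).
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < success_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    return total_reward

# Over 2000 rounds, play quickly concentrates on the best arm (p = 0.8),
# so the total reward approaches the oracle value of about 1600.
print(thompson_sampling([0.2, 0.5, 0.8], horizon=2000))
```

The algorithm balances exploration and exploitation automatically: arms with uncertain posteriors occasionally produce large samples and get tried, while a clearly inferior arm is sampled less and less often.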
Exam / Grading
To pass this course, you must hand in four homework assignments. We ask you to partner up and deliver your work in groups of two students. Each homework assignment covers one of the topics of the course, and we will hand one out after every third lecture. You will then have two weeks to complete each assignment.
We will grade each of the four assignments. Each will count for 25% towards your final grade.
All lectures and instructions will be taught digitally, using Canvas Conferences. We will predominantly use PowerPoint slides and, at times, Microsoft Whiteboard. Be on time, be digitally present, and participate. We strongly encourage you to enable your camera and have a working microphone for interactivity.
Canvas Conferences can be found in the menu to your left on our course page on canvas.tue.nl.
Here is a week-by-week breakdown of the course:
| Week | Topic | Lecturer | Format | Dates | Assignments |
|---|---|---|---|---|---|
| 17 | Markov Decision Theory | Bert Zwart | two lectures | April 22, 24 | |
| 18 | Markov Decision Theory | Bert Zwart | one lecture, one instruction | April 29, May 1 | New assignment: MDT |
| 19 | Multi-Armed Bandits | Jaron Sanders | two lectures | May 4, 8 | |
| 20 | Multi-Armed Bandits | Jaron Sanders | one lecture, one instruction | May 13, 15 | Deadline: MDT, May 14; new assignment: MABs, May 15 |
| 21 | no class activities | | | | |
| 22 | Reinforcement Learning | Jaron Sanders | two lectures | May 27, 29 | |
| 23 | Reinforcement Learning | Jaron Sanders | one lecture, one instruction | June 3, 5 | Deadline: MABs, June 4; new assignment: RL |
| 24 | Stochastic / Distributed / Robust Optimization | Bert Zwart | two lectures | June 10, 12 | |
| 25 | Stochastic / Distributed / Robust Optimization | Bert Zwart | one lecture, one instruction | June 17, 19 | Deadline: RL, June 18; new assignment: SDRO |
| 26 | no class activities | | | | |
| 27 | | | | | Deadline: SDRO, July 2 |
Here are my presentations’ slides: