This page was last updated for the Academic Year 2020-2021.
Aims / Summary
The goal of the course is to familiarize students with the mathematical concepts and computational techniques for stochastic decision and optimization problems, and to illustrate the application of these methods in various scenarios. The methodological framework of Markov decision processes and stochastic dynamic programming models will play a central role. Students are expected to learn the main problem formulations and to be able to apply the main computational approaches in that domain to stylized problems.
You will learn about state-of-the-art approaches in:
- Markov Decision Theory
- Multi-Armed Bandits
- Reinforcement Learning
- Stochastic / Distributed / Robust Optimization
Lecturers / Contact
This course will be taught by prof.dr. Bert Zwart (TU/e, CWI) and dr. Jaron Sanders (TU/e, personal). Mike van Santvoort is our instructor.
If you have a question, then you can contact us in the following ways:
- Ask questions during the live video lectures on Microsoft Teams.
- Post your question in the Discussions forum. This will benefit other students too.
- Send us an e-mail if it’s low priority.
Course material
We will be using the following material throughout the course:
- Lecture notes Advanced Stochastic Operations Research: Stochastic Decision Theory, 2009
- Sébastien Bubeck and Nicolò Cesa-Bianchi’s lecture notes Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012
- Gallego’s lecture notes on the newsvendor problem
- Tutorial on Stochastic Programming by Philpott and Shapiro
If you want additional background, we recommend these books:
- Richard Sutton and Andrew Barto, Reinforcement Learning: An introduction
- Dimitri Bertsekas and John Tsitsiklis, Neuro-Dynamic Programming, 1996
We encourage you to examine these seminal papers, whose contributions will come up during the course:
- William Thompson, On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika 25(3-4):285-294, 1933
- Robbins, Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society 58(5):527-535, 1952
- Gittins, Bandit Processes and Dynamic Allocation Indices, Journal of the Royal Statistical Society, 1979
Exam / Grading
To pass this course, you must hand in four homework assignments. We ask you to partner up and submit your work in groups of two students. Each homework assignment will cover one of the topics of the course, and we will hand one out after every third lecture. You will then have two weeks to complete each assignment.
We will grade each of the four assignments. Each will count for 25% towards your final grade.
Online lectures
All lectures and instructions will be taught digitally, using Microsoft Teams. We have created a team for 2MMS50. We will organize live sessions in the General channel.
We will mostly use PowerPoint slides and may at times use Microsoft Whiteboard. Please be on time, be digitally present, and participate. We strongly encourage you to enable your camera and have a working microphone for interactivity.
Course overview
Here is a week-by-week breakdown of the course:
Week | Topic | Professor | Activity | Date |
16 | Markov Decision Theory | Bert Zwart | two lectures | April 19, 22 |
17 | Markov Decision Theory | Bert Zwart | one lecture | April 29 |
18 | Markov Decision Theory (New assignment: MDT, May 3rd) | Bert Zwart | one instruction | May 3 |
18 | Multi-Armed Bandits | Jaron Sanders | mixed lecture, instruction | May 6 |
19 | Multi-Armed Bandits | Jaron Sanders | mixed lecture, instruction | May 10 |
20 | Multi-Armed Bandits (Deadline: MDT, May 19th; New assignment: MABs, May 20th) | Jaron Sanders | mixed lectures, instructions | May 17, 20 |
21 | Reinforcement Learning | Jaron Sanders | mixed lectures, instructions | May 24, 27 |
22 | Reinforcement Learning (Deadline: MABs, June 2nd; New assignment: RL, June 3rd) | Jaron Sanders | mixed lectures, instructions | May 31, June 3 |
23 | Stochastic / Distributed / Robust Optimization | Bert Zwart | two lectures | June 7, 10 |
24 | Stochastic / Distributed / Robust Optimization (Deadline: RL, June 16th; New assignment: SDRO, June 17th) | Bert Zwart | one lecture, one instruction | June 14, 17 |
25 | Deadline: SDRO, June 30th | | | |
Presentation slides
Here are the slides of my presentations from the academic year 2019-2020: