
Bellman equations, characterizing optimal policies

The previous class (RL1) introduced the model of Markov Decision Processes as a way to describe discrete-time, stochastic, dynamical systems. Our focus is on controlling such systems. For this, we want to characterize what makes a policy optimal and how to find such a policy. This class covers the model-based solution of MDPs, in particular via Dynamic Programming.

During class we will cover Sections 1 to 5 of the notebook. Sections 6 and 7 are extra content: they are important for a deeper understanding of the concepts involved, but will not be covered in class and will not be directly reused in future classes.

Notebook