Evaluating policies with samples

The previous classes introduced Markov decision processes (MDPs) and the Bellman equations (evaluation and optimality). These equations involve the MDP's model, that is, its transition and reward functions, and we saw how to solve them when that model is known. In this class, we investigate how to solve the evaluation equation using samples instead of the model.
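As a minimal sketch of the contrast (not taken from the notebook), assume a hypothetical 3-state MDP in which the policy pi is already folded into a transition matrix `P_pi` and a reward vector `r_pi`. The model-based route solves the evaluation equation V = r_pi + gamma * P_pi V directly as a linear system. The sample-based route shown here is TD(0), one common choice among several (Monte Carlo evaluation is another). It nudges V(s) towards r + gamma * V(s') using only observed transitions (s, r, s').

```python
import numpy as np

# Hypothetical example: a 3-state MDP with the policy pi already folded in,
# so P_pi[s, s'] = P(s' | s, pi(s)) and r_pi[s] = expected reward in s under pi.
gamma = 0.9
P_pi = np.array([[0.5, 0.5, 0.0],
                 [0.0, 0.5, 0.5],
                 [0.5, 0.0, 0.5]])
r_pi = np.array([1.0, 0.0, 2.0])
n_states = len(r_pi)

# Model-based evaluation: solve the linear system (I - gamma * P_pi) V = r_pi.
V_model = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

# Sample-based evaluation with TD(0): P_pi is only used to simulate the
# environment; the value update sees nothing but the sampled (s, r, s').
rng = np.random.default_rng(0)
V_td = np.zeros(n_states)
s = 0
for t in range(1, 50_001):
    r = r_pi[s]                               # observed reward (deterministic here)
    s_next = rng.choice(n_states, p=P_pi[s])  # observed next state
    alpha = 1.0 / (1.0 + t / 100.0)           # decreasing step size
    td_error = r + gamma * V_td[s_next] - V_td[s]
    V_td[s] += alpha * td_error
    s = s_next

print("model-based:", np.round(V_model, 3))
print("TD(0):      ", np.round(V_td, 3))
```

The two estimates should agree closely, with the TD(0) estimate carrying some sampling noise that shrinks as the step size decreases. The schedule 1 / (1 + t/100) is an arbitrary choice made for this illustration.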

In class, we will cover sections 1 to 6 of the notebook. Sections 7 to 10 are extra content: they are important for a deeper understanding of the concepts at stake, but they will not be covered in class and will not be directly reused in future classes.

Notebook