Reinforcement Learning
Reinforcement Learning module of the Machine Learning class at ISAE-Supaero.
Syllabus
This class is an introduction to Reinforcement Learning (RL) in 15 hours, spread over 5 sessions. It aims to provide both a solid theoretical foundation and a fast path to current Deep RL algorithms. It starts with the fundamental notions underlying RL: Markov Decision Processes, model-based resolution approaches including Dynamic Programming, and sample-based resolution of the Bellman equation. This leads to the identification of the three core challenges in RL: function approximation, the exploration/exploitation trade-off, and the search for optimality. These challenges put the later classes in perspective, since those classes introduce methods designed to tackle them, including Deep RL methods. By the end of the class, students should be able to understand the RL literature, implement key algorithms, and anticipate the difficulties of applying RL to various problems.
Class material
The class is split into a series of notebooks that serve as lecture material, textbook and exercise book.
Class schedule
| Session | Time | Date (dd/mm/yyyy) | Topics |
|---|---|---|---|
| Introduction and MDPs | afternoon | 11/01/2027 | RL intuitions, robotics, Markov Decision Processes |
| Bellman equations and value functions | afternoon | 18/01/2027 | Bellman equations, characterizing and evaluating policies |
| Deep Q-Networks | afternoon | 25/01/2027 | value function approximation, Deep Q-Networks |
| Deep Q-Networks (lab) | afternoon | 01/02/2027 | hands-on implementation of DQN |
| Actor-Critic methods | afternoon | 08/02/2027 | policy gradients and actor-critic algorithms |
Introduction to Reinforcement Learning
This class offers a "getting started" session on Reinforcement Learning. It lays out the main intuitions that will be formalized in subsequent lectures, and lists additional resources and software requirements.
Markov Decision Processes
The previous class provided the key intuitions about RL, which is about learning to control dynamic systems. This class introduces the model underlying all Reinforcement Learning theory and developments: Markov Decision Processes.
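As a rough illustration of what such a model can look like in code, the sketch below encodes a small two-state MDP as a transition table and samples trajectories from it. The states, actions, probabilities and rewards are invented for this example, and the `step` and `rollout` helpers are hypothetical names, not part of the class notebooks.

```python
import random

# Hypothetical two-state MDP, invented for illustration.
# P[state][action] is a list of (probability, next_state, reward) triples.
P = {
    "cool": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(0.5, "cool", 2.0), (0.5, "warm", 2.0)],
    },
    "warm": {
        "slow": [(0.5, "cool", 1.0), (0.5, "warm", 1.0)],
        "fast": [(1.0, "warm", -10.0)],
    },
}


def step(state, action, rng=random):
    """Sample (next_state, reward) from the transition model."""
    outcomes = P[state][action]
    weights = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


def rollout(policy, start, horizon, seed=0):
    """Follow `policy` (a function state -> action) for `horizon` steps."""
    rng = random.Random(seed)
    state, trajectory = start, []
    for _ in range(horizon):
        action = policy(state)
        next_state, reward = step(state, action, rng)
        trajectory.append((state, action, reward, next_state))
        state = next_state
    return trajectory
```

For instance, `rollout(lambda s: "fast", "cool", 10)` returns a list of ten `(state, action, reward, next_state)` tuples. The Markov property shows up in `step`: the next state and reward depend only on the current state and action.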
Bellman equations, characterizing optimal policies
The previous class introduced the model of Markov Decision Processes as a way to describe discrete-time, stochastic, dynamical systems. Our focus is on controlling such systems. To do so, we want to characterize what makes a policy optimal and how to find such a policy. This class covers the model-based resolution of MDPs, in particular via Dynamic Programming.
During class we will cover sections 1 to 5 of the notebook. Sections 6 and 7 are extra content that is important for a better understanding of the concepts at stake but will not be covered in class and will not be directly reused in future classes.
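One common Dynamic Programming scheme for the model-based resolution mentioned above is value iteration, which repeatedly applies the Bellman optimality backup until the value function stops changing. The following is a minimal sketch under stated assumptions: the two-state MDP, the discount factor `gamma = 0.9` and the helper names `backup` and `value_iteration` are all made up for illustration and are not taken from the notebooks.

```python
# Hypothetical two-state MDP, invented for illustration.
# P[state][action] is a list of (probability, next_state, reward) triples.
P = {
    "cool": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(0.5, "cool", 2.0), (0.5, "warm", 2.0)],
    },
    "warm": {
        "slow": [(0.5, "cool", 1.0), (0.5, "warm", 1.0)],
        "fast": [(1.0, "warm", -10.0)],
    },
}


def backup(P, V, s, a, gamma):
    """Expected one-step return of action a in state s, bootstrapping on V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=0.9, tol=1e-8):
    """Synchronous value iteration; returns (V, greedy policy)."""
    V = {s: 0.0 for s in P}
    while True:
        # Bellman optimality backup applied to every state at once.
        new_V = {s: max(backup(P, V, s, a, gamma) for a in P[s]) for s in P}
        delta = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if delta < tol:
            break
    # Greedy policy with respect to the (near) fixed point.
    policy = {s: max(P[s], key=lambda a: backup(P, V, s, a, gamma)) for s in P}
    return V, policy
```

On this toy MDP, solving the optimality equation by hand gives V(cool) = 15.5 and V(warm) = 14.5, with the greedy policy choosing `fast` in `cool` and `slow` in `warm`; `value_iteration(P)` should approach these values up to the tolerance.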
Evaluating policies with samples
The previous classes introduced MDPs and the Bellman equations (evaluation and optimality). These equations involve the MDP's model (transition and reward functions), and we saw how to solve them using that model. In this class, we investigate how to solve the evaluation equation from samples rather than from the model.
During class we will cover sections 1 to 6 of the notebook. Sections 7 to 10 are extra content that is important for a better understanding of the concepts at stake but will not be covered in class and will not be directly reused in future classes.
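One sample-based way to approach the evaluation equation is temporal-difference learning, TD(0), sketched below for a fixed policy. This is an illustrative sketch rather than the notebook's implementation: the toy MDP, the evaluated policy `POLICY`, the step size `alpha` and the function names are assumptions. The learner only sees sampled transitions through `sample_step`; the table `P` plays the role of a simulator.

```python
import random

# Hypothetical two-state MDP used only as a simulator (invented for illustration).
P = {
    "cool": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(0.5, "cool", 2.0), (0.5, "warm", 2.0)],
    },
    "warm": {
        "slow": [(0.5, "cool", 1.0), (0.5, "warm", 1.0)],
        "fast": [(1.0, "warm", -10.0)],
    },
}

# Fixed policy to evaluate (an arbitrary choice for this example).
POLICY = {"cool": "fast", "warm": "slow"}


def sample_step(state, action, rng):
    """Draw one (next_state, reward) sample from the simulator."""
    outcomes = P[state][action]
    weights = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


def td0_evaluate(policy, gamma=0.9, alpha=0.01, n_steps=100_000, seed=0):
    """Estimate the value function of `policy` from a single stream of samples."""
    rng = random.Random(seed)
    V = {s: 0.0 for s in P}
    state = "cool"
    for _ in range(n_steps):
        next_state, reward = sample_step(state, policy[state], rng)
        # TD error: sampled Bellman evaluation target minus current estimate.
        td_error = reward + gamma * V[next_state] - V[state]
        V[state] += alpha * td_error
        state = next_state
    return V
```

With a constant step size the estimates keep fluctuating around the policy's true values (15.5 for `cool` and 14.5 for `warm` on this toy problem) instead of converging exactly; a decreasing step size is the usual remedy.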
Solving the optimality equation with samples
The last class showed that we can learn a policy's value function using only interaction samples. In this class, we focus on solving the optimality equation and estimating optimal value functions and policies from interaction samples. We will cover temporal-difference algorithms such as Q-learning and SARSA.
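As a hedged sketch of the kind of algorithm covered here, the code below runs tabular Q-learning with an epsilon-greedy behavior policy on a toy simulator. The MDP, the hyperparameters (`alpha`, `epsilon`, `n_steps`) and the function names are assumptions made for this example only.

```python
import random

# Hypothetical two-state MDP used only as a simulator (invented for illustration).
P = {
    "cool": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(0.5, "cool", 2.0), (0.5, "warm", 2.0)],
    },
    "warm": {
        "slow": [(0.5, "cool", 1.0), (0.5, "warm", 1.0)],
        "fast": [(1.0, "warm", -10.0)],
    },
}


def sample_step(state, action, rng):
    """Draw one (next_state, reward) sample from the simulator."""
    outcomes = P[state][action]
    weights = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


def q_learning(gamma=0.9, alpha=0.01, epsilon=0.3, n_steps=200_000, seed=0):
    """Tabular Q-learning from a single stream of interaction samples."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in P for a in P[s]}
    state = "cool"
    for _ in range(n_steps):
        actions = list(P[state])
        # Epsilon-greedy: explore with probability epsilon, else exploit.
        if rng.random() < epsilon:
            action = rng.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        next_state, reward = sample_step(state, action, rng)
        # Q-learning target uses the best next action (off-policy).
        # SARSA would instead use the action actually taken in next_state.
        best_next = max(Q[(next_state, a)] for a in P[next_state])
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state
    return Q
```

Calling `q_learning()` returns a dictionary keyed by `(state, action)`. On this toy problem the estimates tend to settle near the optimal action values, for example around 15.5 for `("cool", "fast")` and 14.5 for `("warm", "slow")`, up to noise from the constant step size.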
Deep Q-Networks
In the previous classes we saw that the model-based value iteration process can be replaced by an approximate one. When the approximation step is carried out by stochastic approximation, we obtain the Q-learning algorithm. We saw it was straightforward to extend this to use experience replay memories and batch stochastic gradient descent. In this class, we combine the stochastic gradient descent approach with replay memories and represent Q as a neural network. This yields the Deep Q-Networks algorithm.
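The two ingredients named above, a replay memory and minibatch gradient descent on the TD error, can be sketched without any deep learning library. In the sketch below a linear function of the state features stands in for the neural network, purely to keep the example dependency-free; a real DQN would use a neural network with automatic differentiation. The frozen `target_weights` (a target network, a common DQN ingredient not described in the paragraph above), the class and function names, and the hyperparameters are all assumptions for illustration.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity buffer of transitions; oldest entries are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def q_values(weights, state):
    """Linear stand-in for the Q-network: one weight vector per action."""
    return [sum(w * x for w, x in zip(w_a, state)) for w_a in weights]


def dqn_update(weights, target_weights, batch, gamma=0.99, lr=0.01):
    """One minibatch gradient step on 0.5 * mean squared TD error.

    Targets are computed with the frozen target_weights and treated as
    constants (semi-gradient), as in DQN-style updates.
    """
    grads = [[0.0] * len(w_a) for w_a in weights]
    for state, action, reward, next_state, done in batch:
        target = reward
        if not done:
            target += gamma * max(q_values(target_weights, next_state))
        td_error = q_values(weights, state)[action] - target
        for i, x in enumerate(state):
            grads[action][i] += td_error * x / len(batch)
    return [
        [w - lr * g for w, g in zip(w_a, g_a)]
        for w_a, g_a in zip(weights, grads)
    ]
```

A training loop would alternate between pushing new transitions into the memory, calling `dqn_update` on a sampled batch, and periodically copying `weights` into `target_weights`. States here are plain lists of floats and actions are integer indices into `weights`.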
Additional resources
Great books available online:
Reinforcement Learning, an introduction
Algorithms for Reinforcement Learning
An introduction to Deep Reinforcement Learning
FAQ on installing Gym for Mac users
The AlphaGo movie