Solving the optimality equation with samples🔗
The last class showed we can learn a policy's value function using only interaction samples. In this class, we focus on solving the optimality equation and estimating optimal value functions and policies from interaction samples. We will cover temporal difference based algorithms such as Q-learning and SARSA.