Deep Q-Networks
In the previous classes we saw that the model-based value iteration process can be replaced by an approximate value iteration scheme. When the approximation is carried out by stochastic approximation, we obtain the Q-learning algorithm, and we saw that it extends straightforwardly to experience replay memories and batch stochastic gradient descent. In this class, we combine the stochastic gradient descent approach with replay memories and represent Q with a neural network. This yields the Deep Q-Networks (DQN) algorithm.
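To make the combination concrete, below is a minimal sketch of one DQN update step, assuming a PyTorch setting with hypothetical state dimension, action count, and network sizes; the buffer is assumed to hold (state, action, reward, next state, done) tuples collected by some behavior policy. It shows the three ingredients named above: a neural network representing Q, a replay memory sampled in batches, and a stochastic gradient descent step on the Q-learning target.

import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical problem sizes; adapt to the environment at hand.
STATE_DIM, N_ACTIONS, GAMMA = 4, 2, 0.99

# Q is represented by a small feed-forward neural network.
q_net = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, N_ACTIONS),
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

# Experience replay memory: a bounded buffer of (s, a, r, s', done) tuples.
replay = deque(maxlen=10_000)


def dqn_update(batch_size=32):
    """One batch stochastic gradient descent step on the Q-learning target."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    states, actions, rewards, next_states, dones = map(
        lambda x: torch.as_tensor(x, dtype=torch.float32), zip(*batch)
    )
    actions = actions.long()

    # Q(s, a) predicted by the network for the actions actually taken.
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Bootstrapped target r + gamma * max_a' Q(s', a'); no gradient flows through it.
    with torch.no_grad():
        target = rewards + GAMMA * q_net(next_states).max(dim=1).values * (1 - dones)

    loss = F.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

This sketch omits details of the full DQN algorithm, notably the separate, periodically updated target network used when computing the bootstrapped target and the exploration policy that fills the replay memory.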