An Introduction to Actor Critic Methods for Optimal Control

14.09.2021, 16:00 - 17:00

Sean Meyn, University of Florida (USA)

The goal of actor-critic methods is to estimate the best policy among a parameterized family for a controlled Markov chain. Through the magic of Markov chain theory, it is possible to obtain unbiased estimates of the gradient of the objective through the geometry of TD-learning. These algorithms were born from the dissertations of Van Roy and Konda in the 1990s, under the supervision of Tsitsiklis at MIT.

The lecture will consist of two parts. Part 1 is an introduction to the TD(1) algorithm, which is one component of the actor-critic method. The elegant theory comes with a significant warning: while the algorithm solves a projection problem, it is a Monte Carlo method that can suffer from massive variance. Part 2 introduces the actor-critic algorithm and the crucial role that TD(1) plays within it. It seems likely that this variance can be tamed, but doing so remains a research frontier.
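The projection property of TD(1) can be illustrated with a small sketch. It is a hypothetical example, not material from the talk. It assumes a discounted-cost setting with discount factor `beta`, a linear approximation h(x) ≈ psi(x)ᵀθ, an illustrative step size (n + 100)^-0.7 with averaging over the second half of the run, and a made-up 3-state chain. Under these assumptions, TD(1) should approach the L2(π)-projection of the true value function h onto the span of the features, which the sketch also computes exactly for comparison. The helper names `td1_linear` and `projection_target` are hypothetical.

```python
import numpy as np


def td1_linear(P, c, psi, beta, n_steps=200_000, seed=0):
    """TD(1) with linear features psi for h(x) = E[sum_k beta^k c(X_k) | X_0 = x].

    Returns the average of theta over the second half of the run.
    The step-size schedule is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    n_states, d = psi.shape
    cdf = np.cumsum(P, axis=1)                  # row-wise CDFs used to sample X_{n+1}
    u = rng.random(n_steps)
    theta = np.zeros(d)
    zeta = np.zeros(d)                          # eligibility trace
    theta_sum = np.zeros(d)
    x = 0
    for n in range(n_steps):
        zeta = beta * zeta + psi[x]             # lambda = 1: trace decays only by beta
        x_next = min(int(np.searchsorted(cdf[x], u[n], side="right")), n_states - 1)
        # temporal difference d_{n+1} = c(X_n) + beta * h_theta(X_{n+1}) - h_theta(X_n)
        td_err = c[x] + beta * (psi[x_next] @ theta) - psi[x] @ theta
        theta = theta + (n + 100) ** -0.7 * td_err * zeta
        if n >= n_steps // 2:
            theta_sum += theta
        x = x_next
    return theta_sum / (n_steps - n_steps // 2)


def projection_target(P, c, psi, beta):
    """Exact L2(pi) projection of h onto span(psi), the point TD(1) should approach."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])   # pi P = pi and sum(pi) = 1
    pi = np.linalg.lstsq(A, np.append(np.zeros(n), 1.0), rcond=None)[0]
    h = np.linalg.solve(np.eye(n) - beta * P, c)    # true discounted value function
    sigma = psi.T @ (pi[:, None] * psi)             # E_pi[psi psi^T]
    b = psi.T @ (pi * h)                            # E_pi[psi h]
    return np.linalg.solve(sigma, b)


# Hypothetical 3-state chain, cost vector, and features [1, x].
P = np.array([[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]])
c = np.array([1.0, 0.0, 2.0])
psi = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])

theta_td = td1_linear(P, c, psi, beta=0.5)
theta_star = projection_target(P, c, psi, beta=0.5)
print("TD(1) estimate:", theta_td, " exact projection:", theta_star)
```

The averaging over the second half is one simple way to damp the Monte Carlo variance mentioned above. With a discount factor closer to 1 the eligibility trace grows like 1/(1 - beta), and the estimate typically becomes much noisier for the same run length.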


Sean Meyn will join our SFB kick-off meeting online from the US. We will meet in House 27 to watch the talk together.
