1 February 2023, 14:15
– Room 2.09.2.22 and Zoom, public viewing in Room 2.09.0.17
Dr. Siegfried Beckus (UP)
Sean Meyn, University of Florida (USA)
The goal of actor-critic methods is to estimate the best policy among a parameterized family for a controlled Markov chain. Through the magic of Markov chain theory, it is possible to obtain unbiased estimates of the objective via the geometry of TD-learning. These algorithms were born from the dissertations of Van Roy and Konda in the 1990s, under the supervision of Tsitsiklis at MIT.
The lecture will consist of two parts. Part 1 is an introduction to the TD(1) algorithm, which is one component of the actor-critic method. The elegant theory is accompanied by a significant warning: while the algorithm solves a projection problem, it is a Monte Carlo method that can come with massive variance. Part 2 is an introduction to the actor-critic algorithm and the crucial role that the TD(1) algorithm plays in it. It seems likely that the variance can be tamed in these algorithms, but this remains a research frontier.
Sean Meyn will join our SFB Kick-off meeting online from the US. We will meet in house 27 to listen to the talk collectively.