Speaker: Venkat Anantharam
Title: A relative entropy characterization of the growth rate of reward in risk-sensitive control
Abstract: We study the infinite-horizon risk-sensitive control problem for discrete-time Markov decision processes with compact metric state and action spaces. We derive a variational formula for the optimal growth rate of the risk-sensitive reward. This parallels the usual variational formulation of the long-term average reward in the absence of risk sensitivity given by the ergodic control viewpoint, with an additional relative entropy penalty on occupation measures. It can also be viewed as an extension of the Donsker-Varadhan characterization of the Perron-Frobenius eigenvalue of a positive operator. The problem of determining the optimal growth rate of risk-sensitive reward is thereby recast as the maximization of a concave function over a convex set. (Joint work with Vivek Borkar, IIT Bombay.)
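For context, the classical Donsker-Varadhan characterization referenced in the abstract can be sketched as follows. This is the standard uncontrolled formula, not the talk's specific result; the notation (kernel p, potential V, occupation measure mu) is assumed here for illustration.

```latex
% Uncontrolled Markov chain (X_k) on a compact metric space S
% with transition kernel p(dy | x), bounded continuous V : S -> R.
% Donsker-Varadhan: the exponential growth rate of the
% multiplicative functional is a relative-entropy-penalized supremum
% over probability measures (occupation measures) on S:
\[
  \lim_{n \to \infty} \frac{1}{n}
  \log \mathbb{E}_x\!\left[ \exp\!\Big( \sum_{k=0}^{n-1} V(X_k) \Big) \right]
  \;=\;
  \sup_{\mu \in \mathcal{P}(S)} \left\{ \int_S V \, d\mu \;-\; I(\mu) \right\},
\]
% where the rate function I admits the relative entropy representation
\[
  I(\mu) \;=\; \inf_{q}\; \int_S
  D\big( q(\cdot \mid x) \,\big\|\, p(\cdot \mid x) \big) \, \mu(dx),
\]
% the infimum running over transition kernels q that leave mu invariant,
% and D denoting relative entropy (Kullback-Leibler divergence).
```

In the risk-sensitive control setting of the talk, one would expect V to be replaced by a reward depending on state and action and the supremum to range over controlled occupation measures, yielding the concave-maximization formulation the abstract describes.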