Risk-Averse Dynamic Programming for Markov Decision Processes
We introduce the concept of a Markov risk measure and use it to formulate risk-averse control problems for two Markov decision models: a finite horizon model and a discounted infinite horizon model. For both models we derive risk-averse dynamic programming equations and a value iteration method. For the infinite horizon problem we also develop a risk-averse policy iteration method and prove its convergence. Finally, we propose a special version of the Newton method for solving a nonsmooth equation that arises in the policy iteration method, and we prove its convergence.
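To make the idea of risk-averse value iteration for the discounted infinite horizon model concrete, the following is a minimal sketch under stated assumptions, not the method as specified in the paper. It assumes a finite state and control space and replaces the conditional expectation in the usual Bellman backup with a first-order mean-upper-semideviation mapping, E[v] + kappa * E[(v - E[v])_+] with kappa in [0, 1]. This risk mapping is chosen here only for illustration and is not named in the abstract. The function names, the array layout `P[u, x, y]` and `c[x, u]`, and the numerical data are all hypothetical.

```python
import numpy as np


def mean_semideviation(v, p, kappa):
    """Illustrative one-step risk mapping: E_p[v] + kappa * E_p[(v - E_p[v])_+]."""
    mean = p @ v
    return mean + kappa * (p @ np.maximum(v - mean, 0.0))


def risk_averse_backup(v, P, c, alpha, kappa):
    """Return q[x, u] = c[x, u] + alpha * risk of v under the distribution P[u, x]."""
    n_controls, n_states, _ = P.shape
    return np.array([[c[x, u] + alpha * mean_semideviation(v, P[u, x], kappa)
                      for u in range(n_controls)]
                     for x in range(n_states)])


def risk_averse_value_iteration(P, c, alpha, kappa, tol=1e-10, max_iter=10_000):
    """Iterate v <- min_u q[x, u] until the sup-norm change drops below tol."""
    v = np.zeros(P.shape[1])
    for _ in range(max_iter):
        v_new = risk_averse_backup(v, P, c, alpha, kappa).min(axis=1)
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate.
    policy = risk_averse_backup(v, P, c, alpha, kappa).argmin(axis=1)
    return v, policy


# Hypothetical data: 3 states, 2 controls; each row P[u, x, :] sums to one.
P = np.array([
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]],
    [[0.5, 0.0, 0.5], [0.3, 0.4, 0.3], [0.5, 0.0, 0.5]],
])
c = np.array([[1.0, 0.5], [2.0, 1.5], [4.0, 3.0]])

v_neutral, pi_neutral = risk_averse_value_iteration(P, c, alpha=0.9, kappa=0.0)
v_averse, pi_averse = risk_averse_value_iteration(P, c, alpha=0.9, kappa=0.5)
```

With `kappa=0.0` the mapping reduces to the expectation, so the sketch coincides with ordinary discounted value iteration. With `kappa > 0` the mapping dominates the expectation, so the resulting value function is no smaller in any state.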