Computing factored value functions for policies in structured MDPs (1999) by D. Koller and R. Parr
Many large Markov decision processes (MDPs) can be represented compactly using a structured representation such as a dynamic Bayesian network. Unfortunately, the compact representation does not help standard MDP algorithms, because the value function for the MDP does not retain the structure of the process description. We argue that in many such MDPs, structure is approximately retained. That is, the value functions are nearly additive: closely approximated by a linear function over factors associated with small subsets of problem features. Based on this idea, we present a convergent approximate value determination algorithm for structured MDPs. The algorithm maintains an additive value function, alternating dynamic programming steps with steps that project the result back into the restricted space of additive functions. We show that both the dynamic programming and the projection steps can be computed efficiently, despite the fact that the number of states is exponential in the number of state variables.
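The alternation the abstract describes — a dynamic programming backup followed by a projection back onto the space of additive (linear-in-features) value functions — can be sketched on a toy flat MDP. This is an illustration only, not the paper's factored algorithm: the paper's point is that both steps can be carried out on the DBN representation without enumerating states, whereas this sketch enumerates a hand-picked 4-state chain, and the basis, rewards, and least-squares projection here are illustrative assumptions.

```python
import numpy as np

# Sketch of approximate value determination for a fixed policy with an
# additive value function V ~= H @ w, where each column of H is a basis
# function. Alternates a Bellman backup with a projection onto span(H).
# Toy problem: everything below (chain, rewards, basis) is illustrative.

gamma = 0.9
# Deterministic 4-state cyclic chain under the fixed policy: i -> i+1 (mod 4).
P = np.roll(np.eye(4), 1, axis=1)
r = np.array([1.0, 0.0, 0.0, 2.0])               # reward vector
# Basis: a constant feature and a linear feature (hand-picked for the toy).
H = np.column_stack([np.ones(4), np.arange(4.0)])

w = np.zeros(2)
for _ in range(200):
    v = H @ w                                    # current additive estimate
    target = r + gamma * P @ v                   # dynamic programming step
    w, *_ = np.linalg.lstsq(H, target, rcond=None)  # project onto span(H)
```

At convergence, `w` is a fixed point of the projected backup: applying one more backup-and-project step leaves it unchanged. The paper's contribution is making both steps efficient when P and the basis are factored, so the state space never has to be enumerated as it is here.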
D. Koller and R. Parr (1999). "Computing factored value functions for policies in structured MDPs." Proc. Sixteenth International Joint Conference on Artificial Intelligence (IJCAI) (pp. 1332-1339).
@inproceedings{koller1999factored,
  author    = "D. Koller and R. Parr",
  title     = "Computing factored value functions for policies in
               structured {MDPs}",
  booktitle = "Proc. Sixteenth International Joint Conference on
               Artificial Intelligence (IJCAI)",
  pages     = "1332--1339",
  year      = "1999",
}