Only recently researchers have implemented with great success the main concept of RTDP or forward dynamic programming combined with approximation techniques to high dimensional problems [4 ] ( namely resource allocation problems). Nonetheless even if the empirical results are impressive, such RTDP algorithms, if applied to such large problems with vast action space cannot guarantee convergence of the value functions.
The aim of the presentation to emphasize the design parameters of a Real Time Approximate Dynamic Programming (RTADP) [5] procedure and to demonstrate how one can tune them via extensive simulations, in order to achieve near-optimal performance in stochastic shortest path applications. The design of experiments for a different class of applications should be considered similar.
The structure of this paper is as follows: First, we specify the computational obstacles of Dynamic Programming and then we delineate a Real-Time Approximate Dynamic Programming (RTADP) methodology that circumvents those. Specifically, we will focus on explaining the design parameters that dictate the end policy performance derived from the RTADP algorithm. Last, we conclude to a generic design of experiment scheme that will allow us to trace optimal values for these design parameters.
References: 1. A. Barto, S.J. Bradtke and S.P. Singh, “Learning to act using Real Time Dynamic Programming ”. Artificial Intelligence 72(1), pp. 81-138 , 1995.
2. T. Smith and R. Simmons “Focused Real-Time Dynamic Programming for MDPs: Squeezing More Out of a Heuristic” American Association for Artificial Intelligence 2006
3. T. McMahan, H. B. Likhachev, M. and G. J. Gordon, 2005. Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees. In Proc. of ICML.
4. W. B. Powell. Approximate Dynamic Programming for Operations Research: Solving the curses of dimensionality. www.castlelab.princeton.edu. (Email powell@princeton.edu for a link to a review copy), 2007.
5. N.E. Pratikakis, M. J. Realff, and J. H. Lee. Strategic capacity decisions in manufacturing using stochastic dynamic programming. Naval Research Logistics, Under Review.