Tuesday, November 6, 2007 - 8:30 AM
149a

Solving The Stochastic Shortest Path Problem: Exploring Via Extensive Simulations The Usage Of A Real Time Approximate Dynamic Programming Approach

Nikolaos Pratikakis, Chemical and Biomolecular Engineering, Georgia Institute of Technology, Ford Environmental Science & Technology Building 311 Ferst Drive, N.W., Atlanta, GA 30332, Jay H. Lee, School of Chemical and Biomolecular Engineering, Georgia Institute of Technology, 311 Ferst Dr. NW, Atlanta, GA 30332-0100, and Matthew J. Realff, Chemical and Biomolecular Engineering, Georgia Tech, 311 Ferst Drive, N.W., Atlanta, GA 30332-0100.

Real Time Dynamic Programming (RTDP) [1] was first demonstrated on shortest path optimal control problems. Since then, there have been numerous tries to enhance the convergence rate of the algorithm [2,3]. The most recent RTDP [2] algorithm maintains upper and lower bounds for the value functions of each state. Doing so, they manage to balance the trade off between exploring for new states and exploiting for a better value function estimation for the bounds. In fact, RTDP procedures are very effective for large stochastic shortest path problems. But the questions remains: Can we apply RTDP concepts to realistic stochastic problems?

Only recently researchers have implemented with great success the main concept of RTDP or forward dynamic programming combined with approximation techniques to high dimensional problems [4 ] ( namely resource allocation problems). Nonetheless even if the empirical results are impressive, such RTDP algorithms, if applied to such large problems with vast action space cannot guarantee convergence of the value functions.

The aim of the presentation to emphasize the design parameters of a Real Time Approximate Dynamic Programming (RTADP) [5] procedure and to demonstrate how one can tune them via extensive simulations, in order to achieve near-optimal performance in stochastic shortest path applications. The design of experiments for a different class of applications should be considered similar.

The structure of this paper is as follows: First, we specify the computational obstacles of Dynamic Programming and then we delineate a Real-Time Approximate Dynamic Programming (RTADP) methodology that circumvents those. Specifically, we will focus on explaining the design parameters that dictate the end policy performance derived from the RTADP algorithm. Last, we conclude to a generic design of experiment scheme that will allow us to trace optimal values for these design parameters.

References: 1. A. Barto, S.J. Bradtke and S.P. Singh, “Learning to act using Real Time Dynamic Programming ”. Artificial Intelligence 72(1), pp. 81-138 , 1995.

2. T. Smith and R. Simmons “Focused Real-Time Dynamic Programming for MDPs: Squeezing More Out of a Heuristic” American Association for Artificial Intelligence 2006

3. T. McMahan, H. B. Likhachev, M. and G. J. Gordon, 2005. Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees. In Proc. of ICML.

4. W. B. Powell. Approximate Dynamic Programming for Operations Research: Solving the curses of dimensionality. www.castlelab.princeton.edu. (Email powell@princeton.edu for a link to a review copy), 2007.

5. N.E. Pratikakis, M. J. Realff, and J. H. Lee. Strategic capacity decisions in manufacturing using stochastic dynamic programming. Naval Research Logistics, Under Review.