281217 An Augmented Lagrangian Approach for Parallel Solution of NLP Problems On Graphics Processing Units

Wednesday, October 31, 2012: 1:58 PM
325 (Convention Center)
Yankai Cao, Chemical Engineering, Texas A&M University, College Station, TX and Carl D. Laird, Department of Chemical Engineering, Texas A&M University, College Station, TX

Large-scale nonlinear programming (NLP) has proven to be an effective framework for optimal process design and operation. As problem sizes continue to grow, the capabilities of a single workstation can become insufficient. This, coupled with the emergence of new concurrent computing architectures, drives the need for effective parallel algorithms.

Graphics processing units (GPUs) are emerging as massively parallel systems that offer a large degree of parallelism at relatively low cost. However, compared with classic multiple-instruction-multiple-data (MIMD) architectures, the single-instruction-multiple-thread (SIMT) architecture of the modern GPU limits its direct application in many numerical settings. The architecture provides a large number of processing cores, but groups of these cores must execute the same instruction concurrently. This makes the architecture ill-suited to parallel direct factorization, the primary numerical workhorse of most nonlinear programming solvers.

In this paper, we propose an Augmented Lagrangian approach for the parallel solution of large-scale nonlinear programming problems on a GPU. The algorithm is iterative at three levels. At the first level, the original problem is replaced by a sequence of bound-constrained optimization problems using an Augmented Lagrangian method. Each of these bound-constrained problems is solved with a nonlinear interior-point method. Within the interior-point method, the barrier subproblems are solved using a variation of Newton's method, where the linear system is solved with a preconditioned conjugate gradient (PCG) method. If the PCG iteration detects negative curvature, a diagonal modification of the Hessian is made to ensure positive definiteness. The primary advantage of this algorithm is that it allows the use of the PCG method in place of direct factorization; the PCG method can be implemented efficiently in parallel on the GPU.
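The innermost level described above can be sketched serially with NumPy. This is a minimal illustration only: the Jacobi (diagonal) preconditioner, the shift size `delta`, and the restart-after-modification strategy are assumptions made for the example, not the authors' implementation, and a GPU version would replace these dense operations with parallel kernels.

```python
import numpy as np

def pcg_with_curvature_check(H, g, tol=1e-8, max_iter=200, delta=1e-4):
    """Approximately solve the Newton system H p = -g by preconditioned CG.

    If negative curvature (d^T H d <= 0) is detected along a search
    direction d, a diagonal modification delta*I is added to H and the
    solve is restarted, in the spirit of the convexification step in
    the abstract. The preconditioner and shift schedule are
    illustrative choices, not the method from the paper.
    """
    n = g.size
    while True:
        # Jacobi preconditioner: M^{-1} = 1 / |diag(H)| (illustrative)
        M_inv = 1.0 / np.maximum(np.abs(np.diag(H)), 1e-12)
        p = np.zeros(n)
        r = -g.copy()              # residual of H p = -g at p = 0
        z = M_inv * r
        d = z.copy()
        rz = r @ z
        negative_curvature = False
        for _ in range(max_iter):
            Hd = H @ d
            dHd = d @ Hd
            if dHd <= 0.0:         # negative curvature detected
                negative_curvature = True
                break
            alpha = rz / dHd
            p += alpha * d
            r -= alpha * Hd
            if np.linalg.norm(r) < tol:
                break
            z = M_inv * r
            rz_new = r @ z
            d = z + (rz_new / rz) * d
            rz = rz_new
        if not negative_curvature:
            return p, H            # H may carry accumulated shifts
        H = H + delta * np.eye(n)  # diagonal Hessian modification
        delta *= 10.0              # grow the shift and retry
```

Because every operation in the inner loop is a matrix-vector product, dot product, or vector update, each PCG iteration maps naturally onto the SIMT model, which is the advantage the abstract highlights over direct factorization.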

Finally, numerical experiments are reported on a set of problems from the COPS test collection. We compare the computational results of our serial and parallel implementations with the IPOPT nonlinear optimization package and show that a significant speedup (approximately an order of magnitude) is possible.


Extended Abstract: File Not Uploaded
Session: Advances In Optimization
Group/Topical: Computing and Systems Technology Division