Optimal Combinatorial Library Design from a Computational Complexity Perspective

Wei Xie and Nick Sahinidis. Chemical and Biomolecular Engineering, University of Illinois, 600 South Mathews Avenue, Urbana, IL 61801

Designing proteins with novel or improved functions via combinatorial libraries involves first mutating or recombining existing sequences and then exploring the resultant sequences. Comparing to the traditional sequence design approaches that require detailed knowledge of protein structure and function, combinatorial library design approaches are more straightforward to implement, and prove quite successful in practice [1]. An indispensable component for a successful library design approach is a powerful computational method that balances diversity and activity. Although quite a few computational studies [2, 3] have been conducted in the past, no detailed computational complexity analysis exists analyzing the inherent difficulty of the proposed models, i.e., how efficiently they can be solved. Such analysis would provide important guidelines for the development of models and efficient algorithms for this problem.

In this talk, we analyze combinatorial library design from a computational complexity perspective, and reach several interesting conclusions. First, we show that existing models for combinatorial library design vary substantially in difficulty. While some models are fairly easy in that they admit low-order polynomial-time algorithms, other models are NP-hard and may demand exponential time to solve. In the course of this analysis, we discover a key mathematical property for the design of efficient algorithms for the general combinatorial library design problem. By exploiting this property, we design a new and much faster algorithm than the existing popular RASPP algorithm [2] for the sequence-independent site-directed chimeragenesis (SISDC) protocol, thus demonstrating the importance of this new problem property.


[1] F. H. Arnold, Combinatorial and computational challenges for biocatalyst design, Nature, 409:253257, 2001.

[2] J. B. Endelman, J. J. Silberg, Z-G. Wang and F. H. Arnold, Site-directed protein recombination as a shortest path problem, Protein Engineering: Design & Selection, 17(7):589594, 2004

[3] M. C. Saraf, A. Gupta and C. D. Maranas, Design of combinatorial protein libraries of optimal size, Proteins: Structure, function, and bioinformatics, 60:769777, 2005.