## 284218: New Capabilities for Large-Scale Models in Computational Biology


*Casey S. Abbott and John D. Hedengren*

*Brigham Young University*

*Provo, Utah*

**Abstract**

*Introduction*

Advances in biomedical research have led to an increase in experimental data to be interpreted in the context of reaction pathways, molecular transport, and population dynamics. Kinetic modeling is a common way to interpret these data and is used in the pharmaceutical industry in developing clinical trials for new medications [1]. These models are based on first principles such as mole balances and kinetic reactions. Often in the development of a model there are parameters and initial conditions that are costly or impossible to measure directly through experimental procedures. These parameters are usually estimated through the use of optimization techniques. It is reasonable to believe that biological modeling's role will only increase as pharmaceutical companies such as Pfizer look to scale back and become more focused, spending less on R&D while expecting more results [2]. One company that sees biological modeling as a key to the future is Vertex Pharmaceuticals, which is “working ... to develop improved models that can be used to more rapidly identify and optimize lead molecules and drug candidates than currently used methods” [3].

Another indicator of the growing interest in biological models is the large repository of models publicly available in the Systems Biology Markup Language (SBML), which includes hundreds of contributions. Many models in this standard format for computational biology have detailed metabolic reaction pathways that describe biological systems, including cause-and-effect relationships in the human body. While simulations of these biological systems have been applied successfully for many years, aligning them with available measurements continues to be a challenge. Researchers report that the best available solution techniques still limit the reconciliation of models and measurements to small- and medium-sized problems. This limits the usefulness of the models because of the many assumptions and simplifications required for the optimizer to be able to perform parameter estimation.

This study investigates the ability of advanced process monitor (APM) software to estimate parameters of large-scale models. APM is proven optimization and control software that was developed in the petrochemical industry and utilizes an optimization technique known as the simultaneous approach. This approach shows promise in efficiently optimizing large models (thousands of variables and parameters) [4]. In this method, the model and optimization problem are solved simultaneously, as opposed to the traditional approach of solving the differential and algebraic equation (DAE) model sequentially. In the sequential approach, each iteration of the optimization requires a solution of the DAE model. Much of the recent development of the simultaneous approach has occurred in the petrochemical industry, where on-line process control applications require optimization of nonlinear models with many decision variables in the span of minutes. Given the success of APM in solving such models in the petrochemical industry, this study examines whether similar results can be replicated in computational biology.
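The simultaneous approach can be illustrated with a small sketch (not the APM implementation itself): the ODE is discretized, here with implicit Euler, and the discretized model equations become equality constraints of a single nonlinear program that also fits a parameter to data. The one-state model, grid, and solver choice are illustrative assumptions.

```python
# Sketch of the simultaneous approach on dx/dt = -k*x: states at all
# time points and the parameter k are decision variables of one NLP,
# with the implicit-Euler equations imposed as equality constraints.
import numpy as np
from scipy.optimize import minimize

t = np.linspace(0.0, 2.0, 11)          # time grid
h = t[1] - t[0]
k_true = 1.5
data = np.exp(-k_true * t)             # synthetic, noise-free measurements
n = t.size

def unpack(z):
    return z[:n], z[n]                 # states x_0..x_{n-1}, parameter k

def objective(z):
    x, _ = unpack(z)
    return np.sum((x - data) ** 2)     # least-squares fit to data

def model_residuals(z):
    # Implicit Euler: x_{i+1} - x_i + h*k*x_{i+1} = 0, plus x_0 = data[0]
    x, k = unpack(z)
    res = np.empty(n)
    res[0] = x[0] - data[0]
    res[1:] = x[1:] - x[:-1] + h * k * x[1:]
    return res

z0 = np.concatenate([np.ones(n), [0.5]])   # poor initial guess
sol = minimize(objective, z0, method="SLSQP",
               constraints={"type": "eq", "fun": model_residuals})
x_opt, k_opt = unpack(sol.x)           # k recovered up to discretization error
```

Unlike the sequential approach, no ODE integration happens inside the optimizer; model feasibility and the fit objective converge together.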

*Results*

The first step in testing APM's abilities is to show that it can accurately simulate small biological models. To do this, the results of APM simulations were compared to literature and MATLAB simulation values for two small models. The first was a basic model describing the concentration of HIV viruses over thirty days, with nine parameters, three variables, and three differential equations [5]. APM successfully replicated the results published in the literature for this model. The second model describes the dynamics of HIV infection of CD4+ T cells [6]. It is slightly larger, with nine parameters, four variables, and five DAEs. This model was obtained from the BioModels Database and was manually converted to a format usable by APM. APM also simulated this model accurately, matching values from the literature and MATLAB simulations.
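As a rough illustration of the kind of model in [5], the sketch below simulates the basic virus-dynamics equations (uninfected cells, infected cells, free virus) over thirty days. The parameter values and initial conditions are illustrative assumptions, not those of the cited study, and the simulation uses SciPy rather than APM.

```python
# Basic virus-dynamics model: uninfected cells x, infected cells y,
# free virus v.  All numeric values below are illustrative only.
import numpy as np
from scipy.integrate import solve_ivp

lam, d, beta, a, k, u = 1e5, 0.1, 2e-7, 0.5, 100.0, 5.0

def virus_dynamics(t, s):
    x, y, v = s
    dx = lam - d * x - beta * x * v    # cell production, death, infection
    dy = beta * x * v - a * y          # infected-cell gain and death
    dv = k * y - u * v                 # virion production and clearance
    return [dx, dy, dv]

sol = solve_ivp(virus_dynamics, (0.0, 30.0), [1e6, 0.0, 10.0],
                t_eval=np.linspace(0.0, 30.0, 301), rtol=1e-8)
x, y, v = sol.y                        # trajectories over thirty days
```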

Once it was shown that APM could simulate biological models accurately, the next step was to verify its parameter estimation capabilities. This was done using an HIV model similar to those mentioned above. To perform the parameter estimation, the objective function was set to minimize the absolute error between the model and synthetic data. Measurement noise of plus or minus 0.5 log order was added to the synthetic data to make it more realistic. All six parameters were estimated in order to verify that APM could find the correct parameter values. The estimation was started from several different starting points to ensure the accuracy of APM over the design space. Figure 1 shows the concentration of HIV viruses from the synthetic data and the model predictions using the estimated parameters. As seen in the figure, APM was able to find the parameters that allowed the model to fit the synthetic data. With this same model it was also shown that APM supports parallel processing, allowing multiple parameter estimations to run simultaneously.
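A minimal sketch of this estimation setup, with a one-parameter exponential decay model standing in for the full HIV model: synthetic data are perturbed by up to plus or minus 0.5 log order, and the parameter is recovered from several starting points by minimizing an absolute-error objective (computed here on a log scale). All names and values are illustrative assumptions.

```python
# Synthetic-data parameter estimation with +/- 0.5 log-order noise and
# multiple starting points; a single decay constant c is estimated.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
t = np.linspace(0.0, 30.0, 31)
c_true, v0 = 0.35, 1e6
clean = v0 * np.exp(-c_true * t)
# multiplicative noise of up to +/- 0.5 log order
data = clean * 10.0 ** rng.uniform(-0.5, 0.5, t.size)

def l1_error(p):
    model = v0 * np.exp(-p[0] * t)
    return np.sum(np.abs(np.log10(model) - np.log10(data)))

starts = [0.01, 0.1, 1.0, 3.0]         # several points in the design space
fits = [minimize(l1_error, [c0], method="Nelder-Mead") for c0 in starts]
best = min(fits, key=lambda f: f.fun)  # best.x[0] is close to c_true
```

The independent runs in `fits` are also the natural unit for the parallel processing mentioned above: each start can be optimized on a separate worker.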

Figure 1: Parameter estimation capabilities of APM to fit a model to synthetic data

Before APM's parameter estimation could be applied to large-scale biological models, it was necessary to create an automatic conversion from SBML to a format usable by APM. This not only eliminates human error in the conversion process but also allows for the quick evaluation of many publicly available models.
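The conversion step might be sketched as follows, using a hand-written SBML fragment and a simplified, APMonitor-style output; the real utility and the actual APM syntax are more involved, so treat every detail here as an assumption for illustration.

```python
# Hedged sketch of SBML-to-APM conversion: parse species and parameter
# definitions from an SBML document and emit text declarations.
import xml.etree.ElementTree as ET

SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="hiv">
    <listOfSpecies>
      <species id="T" initialConcentration="1000"/>
      <species id="V" initialConcentration="0.001"/>
    </listOfSpecies>
    <listOfParameters>
      <parameter id="beta" value="2.4e-5"/>
      <parameter id="c" value="2.4"/>
    </listOfParameters>
  </model>
</sbml>"""

NS = {"s": "http://www.sbml.org/sbml/level2"}
root = ET.fromstring(SBML)

lines = ["Parameters"]
for p in root.iterfind(".//s:parameter", NS):
    lines.append(f"  {p.get('id')} = {p.get('value')}")
lines.append("Variables")
for sp in root.iterfind(".//s:species", NS):
    lines.append(f"  {sp.get('id')} = {sp.get('initialConcentration')}")

apm_text = "\n".join(lines)            # simplified APM-style declarations
```

A full converter would also translate rate rules and reactions into DAE equations; only the declaration step is shown here.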

This conversion utility was used to automatically convert a model that describes the ErbB signaling pathways [7]. It is a large model with 225 parameters, 504 variables, and 1331 DAEs. In the original study, 75 initial conditions and rate constants were estimated out of the 229 identified by a sensitivity analysis. This was accomplished through simulated annealing and required, on average, 100 annealing runs and 24 hours on a 100-node cluster computer to obtain one good fit.

APM is currently able to simulate the ErbB model but does not match literature values, although it does reproduce the dynamics reported in the literature. The mismatch is believed to stem from limitations in the conversion utility, which does not yet properly handle piecewise functions. Even if the literature values are not replicated exactly, parameter estimation can still be performed to show the contribution of APM: the current parameter values are assumed to be correct, those values are then changed, and APM attempts to recover the assumed correct values through parameter estimation. Once again, measurement error will be applied to the values used to conduct the parameter estimation.

Instead of using simulated annealing to estimate the parameters, a multi-start approach will be used. To accomplish this, the parameter values will be randomly varied by up to plus or minus 2.5 log orders from the prior value and then optimized to minimize an objective function. To test the full capabilities of APM, all of the parameters will be estimated. A large number of these runs will be performed and the results compared to the base values. If the design space proves too flat, or if there are too many local optima, other optimization techniques such as simulated annealing or a genetic algorithm will be considered. Once this is complete, the results will be compared to those found in the paper. The time required to solve the parameter estimation will also be analyzed and compared to the traditional sequential approach. It is believed that APM will significantly decrease the time required to perform parameter estimation of large-scale biological models and allow for the use of large-scale models in the pharmaceutical industry.
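The multi-start generation described above can be sketched as follows; the parameter names and base values are hypothetical, and only the random perturbation of up to plus or minus 2.5 log orders mirrors the procedure in the text.

```python
# Generate multi-start initial guesses by perturbing assumed-correct
# parameter values by up to +/- 2.5 log orders (base values hypothetical).
import numpy as np

rng = np.random.default_rng(42)
base = np.array([1.0e-3, 5.0e2, 2.4e-5])   # assumed-correct values
n_starts = 8
starts = base * 10.0 ** rng.uniform(-2.5, 2.5, size=(n_starts, base.size))
# each row of `starts` seeds one independent parameter-estimation run
```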

**References**

1. Adiwijaya, B. S., Herrmann, E., Hare, B., Kieffer, T., Lin, C., Kwong, A. D., Garg, V., Randle, J. C. R., Sarrazin, C., Zeuzem, S., and Caron, P. R. (2010), “A multi-variant, viral dynamic model of genotype 1 HCV to assess the in vivo evolution of protease-inhibitor resistant variants,” *PLoS Comput. Biol.*, 6(4):e1000745.

2. Thomas, Katie (2012, May 1), “Pfizer Races to Reinvent Itself,” *New York Times*. Retrieved from http://www.nytimes.com/2012/05/02/business/pfizer-profit-declines-19-after-loss-of-lipitor-patent.html?_r=2

3. Vertex (2011, January 19), Retrieved from http://www.vrtx.com/a-network-of-minds/our-network.html

4. Biegler, L. T. (2007), “An overview of simultaneous strategies for dynamic optimization,” *Chemical Engineering and Processing: Process Intensification*, 46(11), pp. 1043-1053.

5. Nowak, M. and May, R. (2000), *Virus Dynamics: Mathematical Principles of Immunology and Virology*. Oxford, New York: Oxford University Press.

6. Perelson, A. S., Kirschner, D. E., and De Boer, R. (1993), “Dynamics of HIV infection of CD4+ T cells,” *Math. Biosci.*, March, pp. 81-125.

7. Chen, W. W., Schoeberl, B., Jasper, P. J., Niepel, M., Nielsen, U. B., Lauffenburger, D. A., and Sorger, P. K. (2009), “Input-output behavior of ErbB signaling pathways as revealed by a mass action model trained against dynamic data,” *Molecular Systems Biology*, 5, 239.


See more of this Group/Topical: Computing and Systems Technology Division