Data-Driven Network Reconstruction of Biological Systems: Comparison of Statistical and Optimization-Based Methods

Friday, October 21, 2011: 8:50 AM
101 D (Minneapolis Convention Center)
Behrang Asadi, Department of Bioengineering, University of California, San Diego, La Jolla, CA; Mano R. Maurya, Department of Bioengineering, University of California, San Diego, La Jolla, CA; Daniel M. Tartakovsky, Department of Mechanical and Aerospace Engineering, University of California, San Diego, La Jolla, CA; and Shankar Subramaniam, Department of Bioengineering, San Diego Supercomputer Center, Department of Chemistry & Biochemistry, University of California, San Diego, La Jolla, CA

Data-driven network reconstruction of biological systems is an essential step towards extracting information from large volumes of biological data. Several methods for reconstructing biological networks have been developed recently. However, to the best of our knowledge, no systematic and comprehensive study has compared these methods across different properties of the data sets: noise level, type of noise, degree of correlation/collinearity, data set size, and incompleteness. In this study, we compare three popular methods, principal component regression (PCR) [1], linear matrix inequalities (LMI) [2], and the Least Absolute Shrinkage and Selection Operator (LASSO) [3], on both real (experimental) and synthetic data sets. Each method represents a category of popular methods found in the literature. PCR is based on dimensionality reduction. LASSO minimizes the L2 norm of the residual vector subject to a parsimony constraint on the parameters (a bound on their L1 norm). LMI minimizes the L-infinity norm of the residual vector and can simultaneously incorporate a priori knowledge into the optimization problem. We use three types of metrics to compare performance: root-mean-squared error (RMSE) in prediction; average fractional error in the estimated coefficients; and semi-binary evaluation metrics, namely accuracy, sensitivity, specificity, and the geometric mean of sensitivity and specificity. This comparison lets us establish criteria for selecting an appropriate network reconstruction approach based on a priori properties of the experimental data. For example, while PCR is the fastest method, LASSO and LMI perform better in terms of accuracy, sensitivity and specificity. Both PCR and LASSO are better than LMI in terms of fractional error in the computed parameters. These trade-offs suggest that more than one aspect of each method needs to be taken into account when selecting a methodology for network reconstruction.
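To make the comparison concrete, the following is a minimal NumPy sketch of two of the three methods (PCR and LASSO) and of the evaluation metrics named above. It is an illustration under stated assumptions, not the implementation used in the study: the function names, the penalized (Lagrangian) form of LASSO solved by coordinate descent, the zero threshold `tol` used to decide whether a connection is present, and the exact definition of fractional error are all assumptions introduced here. The metrics are shown in plain binary form, since the abstract does not define the semi-binary variant. The LMI approach is omitted because it needs a semidefinite programming solver.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression on mean-centered X and y (assumed)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = n_components
    # Regress y on the first k principal-component scores, then map the
    # result back to coefficients on the original variables.
    gamma = (U[:, :k].T @ y) / s[:k]
    return Vt[:k].T @ gamma

def soft_threshold(z, t):
    """Shrink z towards zero by t; values within [-t, t] become zero."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_fit(X, y, lam, n_iter=500):
    """Minimize (1/(2n))*||y - X b||_2^2 + lam*||b||_1 by coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()          # residual y - X b (b starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]         # remove variable j from the fit
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]         # add the updated contribution back
    return b

def rmse(y_true, y_pred):
    """Root-mean-squared error in prediction."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def fractional_error(b_true, b_est):
    """Mean of |b_est - b_true| / |b_true| over nonzero true coefficients."""
    nz = b_true != 0
    return float(np.mean(np.abs(b_est[nz] - b_true[nz]) / np.abs(b_true[nz])))

def edge_metrics(b_true, b_est, tol=1e-8):
    """Binary presence/absence metrics; assumes both classes occur in b_true."""
    t = np.abs(b_true) > tol
    e = np.abs(b_est) > tol
    tp = int(np.sum(t & e))
    tn = int(np.sum(~t & ~e))
    fp = int(np.sum(~t & e))
    fn = int(np.sum(t & ~e))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sens,
        "specificity": spec,
        "gmean": float(np.sqrt(sens * spec)),
    }

# Hypothetical synthetic example: 5 candidate inputs, 3 true connections.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
X -= X.mean(axis=0)
b_true = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
y = X @ b_true + 0.01 * rng.normal(size=50)
y -= y.mean()

b_pcr = pcr_fit(X, y, n_components=3)
b_lasso = lasso_fit(X, y, lam=0.05)
print("RMSE (PCR, LASSO):", rmse(y, X @ b_pcr), rmse(y, X @ b_lasso))
print("LASSO edge metrics:", edge_metrics(b_true, b_lasso))
```

PCR coefficients are generally dense, so in practice a significance test or a larger `tol` would be needed before scoring them with `edge_metrics`; LASSO produces exact zeros and can be scored directly.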

References

1. Pradervand, S., M.R. Maurya, and S. Subramaniam, Identification of signaling components required for the prediction of cytokine release in RAW 264.7 macrophages. Genome Biology, 2006. 7(2): p. R11.

2. Cosentino, C., et al., Linear matrix inequalities approach to reconstruction of biological networks. IET Systems Biology, 2007. 1(3): p. 164-173.

3. Tibshirani, R., Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society Series B-Methodological, 1996. 58(1): p. 267-288.

Session: Control In Medicine and Biology
Group/Topical: Computing and Systems Technology Division