Data-driven network reconstruction of biological systems is an essential step towards extracting information from large volumes of biological data. Several methods have been developed recently to reconstruct biological networks. However, to the best of our knowledge, no systematic and comprehensive study has compared these methods across different properties of the data, namely their ability to handle noisy data, different types of noise, the level of correlation/collinearity, the size of the data set, and incomplete data sets. In this study, we compared three popular methods, principal component regression (PCR) [1], linear matrix inequalities (LMI) [2], and the Least Absolute Shrinkage and Selection Operator (LASSO) [3], on both real/experimental and synthetic data sets. Each of these methods represents a category of popular methods found in the literature. PCR is based on dimensionality reduction. In LASSO, the aim is to minimize the L2 norm of the residual vector while satisfying a parsimony constraint on the parameters. In LMI, the goal is to minimize the L-infinity norm of the residual vector, with the ability to simultaneously incorporate a priori knowledge into the optimization problem. We used three different metrics to compare the performance of the methods: root-mean-squared error (RMSE) in prediction, average fractional error in the values of the estimated coefficients, and semi-binary evaluation metrics (accuracy, sensitivity, specificity, and the geometric mean of sensitivity and specificity). This comparison enables us to establish criteria for selecting an appropriate approach for network reconstruction based on a priori properties of the experimental data. For example, while PCR is the fastest method, LASSO and LMI perform better in terms of accuracy, sensitivity, and specificity. Both PCR and LASSO are better than LMI in terms of fractional error in the values of the computed parameters.
These trade-offs suggest that more than one aspect of each method needs to be taken into account in selecting a methodology for network reconstruction.
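To make the kind of comparison described above concrete, the following is a minimal sketch in Python with NumPy, not the implementation used in this study. It fits PCR and LASSO to a small synthetic linear model and scores them with RMSE, an average fractional error, and sensitivity/specificity. Everything specific in it is an assumption made for illustration: the function names, the synthetic data, the penalty `alpha`, the threshold `tol`, the penalized (Lagrangian) form of the LASSO parsimony constraint, and the plain binary definition of the edge metrics (the study's semi-binary metrics may be defined differently). LMI is omitted because it requires a convex optimization solver.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """PCR: regress y on the leading principal components of X (via SVD)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = n_components
    # Truncated pseudo-inverse solution, mapped back to the original variables.
    return Vt[:k].T @ ((U[:, :k].T @ y) / s[:k])

def lasso_fit(X, y, alpha, n_iter=500):
    """LASSO by cyclic coordinate descent: min 0.5*||y - Xb||^2 + alpha*||b||_1."""
    b = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    r = y.copy()  # running residual y - X b
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r += X[:, j] * b[j]            # remove coordinate j from the fit
            rho = X[:, j] @ r
            # Soft-thresholding update for coordinate j.
            b[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]            # add the updated coordinate back
    return b

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def fractional_error(b_true, b_est):
    """Mean of |b_est - b_true| / |b_true| over the truly nonzero coefficients."""
    nz = b_true != 0
    return float(np.mean(np.abs(b_est[nz] - b_true[nz]) / np.abs(b_true[nz])))

def edge_metrics(b_true, b_est, tol=1e-3):
    """Sensitivity, specificity and their geometric mean for edge detection."""
    true_edge = b_true != 0
    est_edge = np.abs(b_est) > tol
    tp = np.sum(true_edge & est_edge)
    fn = np.sum(true_edge & ~est_edge)
    tn = np.sum(~true_edge & ~est_edge)
    fp = np.sum(~true_edge & est_edge)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return float(sens), float(spec), float(np.sqrt(sens * spec))

# Hypothetical synthetic data: 10 candidate inputs, 3 of which truly act on the output.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
b_true = np.zeros(10)
b_true[[0, 3, 7]] = [2.0, -1.5, 1.0]
y = X @ b_true + 0.1 * rng.standard_normal(200)

b_pcr = pcr_fit(X, y, n_components=10)   # all components kept: equals least squares
b_lasso = lasso_fit(X, y, alpha=10.0)

for name, b in [("PCR", b_pcr), ("LASSO", b_lasso)]:
    print(name, rmse(y, X @ b), fractional_error(b_true, b), edge_metrics(b_true, b))
```

In this sketch PCR keeps every component, so it returns a dense least-squares estimate with no built-in edge selection, while the LASSO penalty drives small coefficients exactly to zero. Repeating the comparison with more noise, collinear columns in `X`, or fewer samples would mimic the data-set properties examined in the study.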
References