473173 Inferring Gene Regulatory Networks from Single Cell Expression Data
In this work, we focused on the inference of gene regulatory networks (GRNs) from single cell expression data. More specifically, we considered time-stamped cross-sectional expression datasets, consistent with time series measurements taken using the Fluidigm Biomark^{©} platform. Recently, several algorithms have been published for such GRN inference based on Boolean networks (Chen et al. 2014; Moignard et al. 2015), stochastic modelling (Teles et al. 2013), gene co-expression/correlation (Kouno et al. 2013; Moignard et al. 2013; Pina et al. 2015), and nonlinear ordinary differential equation models (Ocone et al. 2015). However, the direct application of these algorithms to time-stamped cross-sectional datasets faces several challenges, for example the requirement of dense time course data and a computational complexity that scales exponentially with the size of the network.
Here, we developed a novel method for inferring the GRN structure, called Sparse Network Inference For Single cell data (SNIFS). SNIFS produces a directed graph model of the GRN by analyzing the time evolution of the distribution of single cell gene expression levels. Briefly, the algorithm begins by computing, for each gene, the change in the single cell expression distribution over time, measured as the Kolmogorov-Smirnov (KS) distance (Massey 1951) between the distributions at two subsequent time points. The GRN inference then involves solving a linear regression problem of the type y = Xα. More specifically, the KS distance of a gene at each time step (y) is modelled as a linear function of the KS distances of all other genes at the previous time step (X). SNIFS then uses elastic-net regularization (Zou and Hastie 2005) to find the optimal (sparse) solution α by solving the following penalized least squares optimization problem:
min_{α} ‖y − Xα‖_{2}^{2} + λ(m‖α‖_{1} + (1 − m)‖α‖_{2}) subject to α_{j} ≥ 0.
Note that setting m to 1 or to 0 turns the elastic net regularization into Lasso or Tikhonov (ridge regression) regularization, respectively. In the implementation of SNIFS, we used GLMNET (Friedman et al. 2010) to solve for the optimal α.
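The regression step described above can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions, not the SNIFS implementation (which relies on GLMNET): the function names are hypothetical, the ridge term is taken as the squared L2 norm as in the usual elastic net formulation, the solver is a plain coordinate descent, and the one-step lag between y and X is our reading of the description.

```python
from bisect import bisect_right


def ks_distance(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def ks_profile(cells_by_time):
    """cells_by_time[k][g] lists the expression of gene g across the cells
    sampled at time point k. Returns D with D[k][g] = KS distance of gene g
    between time points k and k + 1."""
    return [
        [ks_distance(u, v) for u, v in zip(now, nxt)]
        for now, nxt in zip(cells_by_time, cells_by_time[1:])
    ]


def nonneg_elastic_net(X, y, lam, m, n_sweeps=500):
    """Coordinate descent for
        min ||y - X a||^2 + lam * (m * ||a||_1 + (1 - m) * ||a||_2^2),  a >= 0.
    Assumption: squared ridge term (the usual elastic-net form)."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    resid = list(y)  # equals y - X @ coef while coef is all zeros
    for _ in range(n_sweeps):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            # Correlation of column j with the residual that excludes gene j.
            rho = sum(c * (r + c * coef[j]) for c, r in zip(col, resid))
            denom = sum(c * c for c in col) + lam * (1.0 - m)
            new = max(0.0, (2.0 * rho - lam * m) / (2.0 * denom)) if denom > 0 else 0.0
            step = new - coef[j]
            if step != 0.0:
                resid = [r - c * step for c, r in zip(col, resid)]
                coef[j] = new
    return coef


def regulators_of(D, target, lam, m):
    """Regress the KS distance of `target` at step k on the KS distances of
    all other genes at step k - 1 (assumed lag alignment)."""
    others = [g for g in range(len(D[0])) if g != target]
    X = [[D[k - 1][g] for g in others] for k in range(1, len(D))]
    y = [D[k][target] for k in range(1, len(D))]
    return dict(zip(others, nonneg_elastic_net(X, y, lam, m)))
```

Calling `regulators_of(ks_profile(cells_by_time), target=j, lam=..., m=...)` for every gene j would give one column of candidate edge weights per target gene; nonzero weights suggest directed edges into gene j. The values of `lam` and `m` are left to the user here, since the text does not state how λ was chosen.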
We evaluated the performance of SNIFS by inferring 10- and 20-gene random subnetworks of E. coli and yeast GRNs using in silico time-stamped cross-sectional single cell expression datasets. Given the structure of the GRN, we generated single cell expression data by simulating a stochastic differential equation (SDE) model (Pinna et al. 2010):
dx_{j} = V(β Π_{i}(1 + α_{ij}x_{i}/(x_{i} + 1)) − θx_{j})dt + σx_{j}dW(t)
where x_{j} represents the mRNA level of gene j, α_{ij} describes the regulation of the expression of gene j by gene i, β denotes the basal transcriptional rate, θ is the mRNA degradation rate constant, and σ and V are scaling parameters. The term dW(t) denotes the increment of a Wiener process, which accounts for the intrinsic stochastic dynamics of gene expression (Wilkinson 2009). We set α_{ij} to 1 for activation, to −1 for repression, and to 0 otherwise. For the main datasets in the case study, we set the parameters as follows: V = 30, β = 1, θ = 0.2, and σ = 0.1. In total, we generated single cell data for 8 equally spaced time points between t = 0.1 and t = 2.
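As an illustration of how such in silico data could be generated, the sketch below integrates the SDE with the Euler-Maruyama scheme using the parameter values quoted above. Several details are assumptions and not taken from the text: the integration scheme and step size `dt`, the start at t = 0 from a user-supplied initial state `x0`, the clamping of mRNA levels at zero, and the function names. Each cell is simulated independently for each time point, which mimics cross-sectional sampling in which a measured cell cannot be followed further.

```python
import math
import random


def simulate_cell(alpha, x0, t_end, rng, dt=1e-3,
                  V=30.0, beta=1.0, theta=0.2, sigma=0.1):
    """Euler-Maruyama integration of
        dx_j = V * (beta * prod_i(1 + alpha[i][j] * x_i / (x_i + 1)) - theta * x_j) dt
               + sigma * x_j dW
    from t = 0 to t_end. alpha[i][j] is +1, -1 or 0 (effect of gene i on gene j)."""
    n = len(x0)
    x = list(x0)
    for _ in range(round(t_end / dt)):
        nxt = []
        for j in range(n):
            prod = 1.0
            for i in range(n):
                prod *= 1.0 + alpha[i][j] * x[i] / (x[i] + 1.0)
            drift = V * (beta * prod - theta * x[j])
            dW = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment ~ N(0, dt)
            # Clamp at zero so mRNA levels stay non-negative (an assumption).
            nxt.append(max(0.0, x[j] + drift * dt + sigma * x[j] * dW))
        x = nxt
    return x


def cross_sectional_snapshots(alpha, x0, times, n_cells, seed=0):
    """Independent cells per time point, mimicking destructive sampling.
    Returns data[k][c] = expression vector of cell c at times[k]."""
    rng = random.Random(seed)
    return [[simulate_cell(alpha, x0, t, rng) for _ in range(n_cells)] for t in times]


# 8 equally spaced time points between t = 0.1 and t = 2, as in the text.
TIMES = [0.1 + k * (2.0 - 0.1) / 7 for k in range(8)]
```

For example, `cross_sectional_snapshots([[0, 1], [-1, 0]], [1.0, 1.0], TIMES, n_cells=100)` would simulate a two-gene network in which gene 0 activates gene 1 and gene 1 represses gene 0. With σ set to 0 and no regulation, the model reduces to dx/dt = V(β − θx), which relaxes to β/θ = 5 for the quoted parameters; this gives a quick sanity check of the integrator.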
We assessed the accuracy of the GRN predictions by computing the areas under the receiver operating characteristic (AUROC) and the precision-recall (AUPR) curves. We compared the GRNs predicted by SNIFS with those predicted from population-averaged expression data by TSNI (Time Series Network Inference) (Bansal et al. 2006) and by a tree-based ensemble regression method called GENIE3 (GEne Network Inference with Ensemble of trees) (Huynh-Thu et al. 2010). The averaged AUROC and AUPR values in Table 1 indicate that, for every value of m tested, SNIFS outperformed TSNI and GENIE3 for both network sizes. This result demonstrates the advantage of using the information contained in single cell distributional data for GRN inference, as done in SNIFS.
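The two accuracy measures can be computed from a ranked list of candidate edges as sketched below. This is a generic illustration, not the evaluation code used for the reported results: the function names are hypothetical, AUROC is obtained from pairwise comparisons of true and false edges (ties count one half), and AUPR is approximated by average precision, which is one common estimator of the area under the precision-recall curve. Both functions assume at least one true and one false edge.

```python
def auroc(scores, labels):
    """Probability that a randomly chosen true edge is scored above a
    randomly chosen false edge; ties contribute one half."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision values at the rank of each true edge, with
    edges ranked by decreasing score (used here as an AUPR estimate)."""
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    hits, total = 0, 0.0
    for rank, k in enumerate(order, start=1):
        if labels[k]:
            hits += 1
            total += hits / rank
    return total / hits


# Toy example: four candidate edges, two of which are in the true network.
scores = [0.9, 0.8, 0.3, 0.1]   # e.g. inferred edge weights
labels = [1, 0, 1, 0]           # 1 = edge present in the reference GRN
print(auroc(scores, labels))               # 0.75
print(average_precision(scores, labels))   # (1/1 + 2/3) / 2 = 0.8333...
```

In a GRN setting, `scores` would hold the inferred weight of every ordered gene pair (excluding self-loops) and `labels` would mark which of those pairs are edges in the reference network.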

Table 1. Evaluation of GRN Inference using TSNI, GENIE3, and SNIFS

          |            10-GENE NETWORK            |            20-GENE NETWORK
          |       AUROC       |       AUPR        |       AUROC       |       AUPR
m         | TSNI GENIE3 SNIFS | TSNI GENIE3 SNIFS | TSNI GENIE3 SNIFS | TSNI GENIE3 SNIFS
----------+-------------------+-------------------+-------------------+------------------
0 (Ridge) | 0.41  0.48   0.75 | 0.10  0.14   0.31 | 0.41  0.50   0.63 | 0.06  0.07   0.15
0.1       | 0.41  0.48   0.76 | 0.10  0.14   0.31 | 0.41  0.50   0.68 | 0.06  0.07   0.19
0.2       | 0.41  0.48   0.73 | 0.10  0.14   0.29 | 0.41  0.50   0.66 | 0.06  0.07   0.20
0.3       | 0.41  0.48   0.70 | 0.10  0.14   0.28 | 0.41  0.50   0.66 | 0.06  0.07   0.21
0.4       | 0.41  0.48   0.67 | 0.10  0.14   0.27 | 0.41  0.50   0.65 | 0.06  0.07   0.22
0.5       | 0.41  0.48   0.65 | 0.10  0.14   0.25 | 0.41  0.50   0.64 | 0.06  0.07   0.22
0.6       | 0.41  0.48   0.63 | 0.10  0.14   0.25 | 0.41  0.50   0.64 | 0.06  0.07   0.23
0.7       | 0.41  0.48   0.61 | 0.10  0.14   0.25 | 0.41  0.50   0.63 | 0.06  0.07   0.23
0.8       | 0.41  0.48   0.61 | 0.10  0.14   0.26 | 0.41  0.50   0.62 | 0.06  0.07   0.23
0.9       | 0.41  0.48   0.60 | 0.10  0.14   0.25 | 0.41  0.50   0.61 | 0.06  0.07   0.23
1 (Lasso) | 0.41  0.48   0.58 | 0.10  0.14   0.25 | 0.41  0.50   0.60 | 0.06  0.07   0.24

REFERENCES
Bansal, M., Gatta, G. Della and di Bernardo, D. (2006). Inference of gene regulatory networks and compound mode of action from time course gene expression profiles. Bioinformatics, 22(7), pp.815–822.
Chen, H. et al. (2014). Single-cell transcriptional analysis to uncover regulatory circuits driving cell fate decisions in early mouse development. Bioinformatics, 31(7), pp.1060–1066.
Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of statistical software, 33(1), pp.1–22.
Huynh-Thu, V.A. et al. (2010). Inferring regulatory networks from expression data using tree-based methods. PloS one, 5(9), p.e12776.
Kouno, T. et al. (2013). Temporal dynamics and transcriptional control using single-cell gene expression analysis. Genome biology, 14(10), p.R118.
Massey, F.J. (1951). The Kolmogorov-Smirnov Test for Goodness of Fit. Journal of the American Statistical Association, 46(253), pp.68–78.
Moignard, V. et al. (2013). Characterization of transcriptional networks in blood stem and progenitor cells using high-throughput single-cell gene expression analysis. Nature cell biology, 15(4), pp.363–72.
Moignard, V. et al. (2015). Decoding the regulatory network of early blood development from single-cell gene expression measurements. Nature Biotechnology, advance on(3).
Ocone, A. et al. (2015). Reconstructing gene regulatory dynamics from high-dimensional single-cell snapshot data. Bioinformatics, 31(12), pp.i89–i96.
Pina, C. et al. (2015). Single-Cell Network Analysis Identifies DDIT3 as a Nodal Lineage Regulator in Hematopoiesis. Cell reports, 11(10), pp.1503–10.
Pinna, A., Soranzo, N. and de la Fuente, A. (2010). From knockouts to networks: establishing direct cause-effect relationships through graph analysis. PloS one, 5(10), p.e12912.
Sandberg, R. (2013). Entering the era of single-cell transcriptomics in biology and medicine. Nature Methods, 11(1), pp.22–24.
Teles, J. et al. (2013). Transcriptional regulation of lineage commitment - a stochastic model of cell fate decisions. PLoS computational biology, 9(8), p.e1003197.
Wilkinson, D.J. (2009). Stochastic modelling for quantitative description of heterogeneous biological systems. Nature reviews. Genetics, 10(2), pp.122–33.
Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), pp.301–320.