Several methods have been devised to document the disparity between ChIP-chip and gene expression data (Gao et al 2004; Boulesteix and Strimmer 2005; Ruan and Zhang 2005). Using gene expression data under the assumption that transcription factor activity should correlate with target gene expression, Gao et al 2004 concluded that on average 42% of the binding targets identified by ChIP-chip data in Saccharomyces cerevisiae are not true regulatory targets. Following the same assumption, Boulesteix and Strimmer 2005 documented an environmental dependence in the false positive rate of ChIP-chip data (stress response: 27%, cell cycle: 68%). In addition, Ruan and Zhang 2005 used decision trees to investigate how well ChIP-chip data predict whether gene expression is up-regulated, down-regulated, or unchanged under stress and cell cycle conditions. While all of these approaches have had success detecting instances in which ChIP-chip and gene expression data agree, each makes key assumptions that may not be valid under all circumstances. For Gao et al 2004 and Boulesteix and Strimmer 2005, the assumed correlation between transcription factor activity and target gene expression may be valid for singly regulated genes, but may not hold for genes controlled by multiple regulators. For Ruan and Zhang 2005, the implicit assumption that transcriptional regulation is an on/off event relative to a basal state ignores more complicated regulation, such as a meaningful spectrum of induced and repressed states. We have recently developed a method that identifies instances in which ChIP-chip and gene expression data agree, allows for a lack of correlation between transcription factor activity and target gene expression when the gene is controlled by more than one regulator, and allows differential gene expression to be more than an on/off event relative to a basal state.
Our approach uses Gibbs sampling, Bayesian statistics, robust regression, and concepts from Network Component Analysis (Liao et al 2003) to identify those genes whose ChIP-chip binding data and gene expression data support one another. To demonstrate the utility of this concept, we have analyzed S. cerevisiae data from a variety of environmental conditions to provide a dynamic perspective of transcriptional regulation.
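The point about multiply regulated genes can be made concrete with a small synthetic example. The sketch below is not the method described above (it uses neither Gibbs sampling nor robust regression); it is a toy illustration, using ordinary least squares and made-up activity profiles, of how a gene driven by two regulators can show essentially no correlation with one of them while a joint linear model still explains its expression exactly. All names (`tf1`, `tf2`, `gene_single`, `gene_multi`, `pearson`, `fit_two_regulators`) and all values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def fit_two_regulators(tf_a, tf_b, expr):
    """Ordinary least squares for expr ~ w_a*tf_a + w_b*tf_b (no intercept),
    solved through the 2x2 normal equations."""
    saa = sum(a * a for a in tf_a)
    sbb = sum(b * b for b in tf_b)
    sab = sum(a * b for a, b in zip(tf_a, tf_b))
    say = sum(a * y for a, y in zip(tf_a, expr))
    sby = sum(b * y for b, y in zip(tf_b, expr))
    det = saa * sbb - sab * sab
    w_a = (say * sbb - sby * sab) / det
    w_b = (sby * saa - say * sab) / det
    return w_a, w_b


# Hypothetical transcription factor activities across 50 conditions.
n_conditions = 50
t = [2.0 * math.pi * i / n_conditions for i in range(n_conditions)]
tf1 = [math.sin(v) for v in t]
tf2 = [math.sin(v) + math.cos(v) for v in t]  # partly overlaps with tf1

# A singly regulated gene: expression simply follows tf1.
gene_single = [2.0 * a for a in tf1]

# A gene activated by tf1 and repressed by tf2 (assumed weights +1 and -1).
gene_multi = [a - b for a, b in zip(tf1, tf2)]

r_single = pearson(tf1, gene_single)
r_multi = pearson(tf1, gene_multi)
w1, w2 = fit_two_regulators(tf1, tf2, gene_multi)

print(f"single regulator: r(tf1, gene) = {r_single:.3f}")
print(f"two regulators:   r(tf1, gene) = {r_multi:.3f}")
print(f"joint fit weights: tf1 = {w1:.3f}, tf2 = {w2:.3f}")
```

In this construction a correlation-based filter would discard the tf1 binding event for the second gene, even though tf1 is a true regulator by design; the second regulator, tf2, remains correlated with the gene here, so the effect depends on how the regulators' activities overlap.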
Boulesteix, A.L., Strimmer, K. (2005). Predicting transcription factor activities from combined analysis of microarray and ChIP data: a partial least squares approach. Theoretical Biology and Medical Modelling, 2:23.
Gao, F., Foat, B., Bussemaker, H. (2004). Defining transcriptional networks through integrative modeling of mRNA expression and transcription factor binding data. BMC Bioinformatics, 5:31.
Liao, J.C., Boscolo, R., Yang, Y.L., Tran, L.M., Sabatti, C., Roychowdhury, V.P. (2003). Network component analysis: reconstruction of regulatory signals in biological systems. Proc Natl Acad Sci USA, 100(26):15522-7.
Ruan, J., Zhang, W. (2005). CAGER: classification analysis of gene expression regulation using multiple information sources. BMC Bioinformatics, 6(1):114.