A fast and robust statistical test based on Likelihood ratio with Bartlett correction to identify Granger causality between gene sets

Andre Fujita, Kaname Kojima, Alexandre G. Patriota, Joao R. Sato, Patricia Severino, Satoru Miyano

Summary: We propose a likelihood ratio test (LRT) with Bartlett correction to identify Granger causality between sets of time series gene expression data. The performance of the proposed test is compared with that of a previously published bootstrap-based approach. The LRT is shown to be substantially faster and to retain statistical power even under non-normal noise distributions. An R package named gGranger, containing implementations of both Granger causality tests, is also provided.

Contact: andrefujita AT riken DOT jp

Installation: In order to install gGranger, download the appropriate file below and type the following command at your terminal console: R CMD INSTALL <file name>

Original paper (Identification of Granger causality between gene sets)

Paper (preprint): grangerforgroups.pdf DOI:10.1142/S0219720010004860

Supplementary files

Simulation 1 (Multivariate model): This simulation illustrates a multivariate (standard) case where nine (predictor) genes Granger cause one (target) gene. The weights are at the edges of the network. Gene x_{9} does not Granger cause gene y_{t}. (Figure 1: figure1suppl.pdf)
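The idea of testing whether a predictor Granger-causes a target can be illustrated with a deliberately simplified sketch. The Python code below is not the gGranger implementation: it assumes a single predictor, a single target, one lag, Gaussian errors, and no Bartlett correction, and all function names and coefficients (0.3, 0.8) are hypothetical. It compares a restricted autoregression of y on its own past against a full model that also includes the past of x, and refers n*log(RSS_restricted/RSS_full) to a chi-square distribution with one degree of freedom.

```python
import math
import random


def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(rows, target):
    """Residual sum of squares of an ordinary least-squares fit of target on rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, target))


def granger_lrt(x, y):
    """LRT of 'x does not Granger-cause y' with one lag; returns (statistic, p-value)."""
    target = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]        # own past only
    full = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]    # plus past of x
    n = len(target)
    stat = n * math.log(rss(restricted, target) / rss(full, target))
    # One restriction, so use chi-square(1): P(X > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


# Hypothetical toy data: x drives y with a one-step delay, y never drives x.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [0.0]
for t in range(1, 200):
    y.append(0.3 * y[t - 1] + 0.8 * x[t - 1] + rng.gauss(0.0, 1.0))

stat_xy, p_xy = granger_lrt(x, y)  # tests x -> y
stat_yx, p_yx = granger_lrt(y, x)  # tests y -> x
print("x -> y: p =", p_xy)
print("y -> x: p =", p_yx)
```

In this toy setting the x -> y test should reject strongly, while the y -> x p-value behaves like a draw under the null hypothesis. The multivariate and gene-set versions described on this page replace the residual sums of squares with determinants of residual covariance matrices and apply the Bartlett correction, neither of which is attempted here.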

Simulation 2 (Module-module model): This simulation illustrates a module-module case where one set of genes Granger-causes another set of genes. The details of this simulation are explained in the original paper (simulation 1 of grangerforgroups.pdf). (Figure 2: figure2suppl.pdf)

Simulation 2 with different noise distribution

The following figures illustrate the p-value distributions for both the bootstrap and the likelihood ratio test (LRT) procedures under the null hypothesis (set I to set III; set II to set I; set III to set I; and set III to set III), together with the ROC curves for simulation 2. In the ROC plots, solid lines represent the LRT and dashed lines represent the bootstrap. The results show that both the LRT and the bootstrap procedures effectively control the rate of false positives (the p-value histograms are close to uniform distributions).

- Simulations with Gaussian noise (N(0,1)).

Time series length | Bootstrap test | Likelihood ratio test | ROC curve
75 | pvaluebootfp-75.pdf | pvaluelikefp-75.pdf | roc-75.pdf
100 | pvaluebootfp-100.pdf | pvaluelikefp-100.pdf | roc-100.pdf

- Simulations with uniform noise (U(-0.5,0.5)).

Time series length | Bootstrap test | Likelihood ratio test | ROC curve
200 | pvaluebootfp-uniform.pdf | pvaluelikefp-uniform.pdf | roc-uniform.pdf

- Simulations with exponential noise (Exp(1)-1, which has zero mean).

Time series length | Bootstrap test | Likelihood ratio test | ROC curve
75 | pvaluebootfp-75exp.pdf | pvaluelikefp-75exp.pdf | roc-75exp.pdf

- Simulations with gamma noise (Gamma(1,1)-1).

Time series length | Bootstrap test | Likelihood ratio test | ROC curve
100 | pvaluebootfp-gamma.pdf | pvaluelikefp-gamma.pdf | roc-gamma.pdf

- Simulations with half-normal noise (abs(N(0,1))-sqrt(1/Pi)).

Time series length | Bootstrap test | Likelihood ratio test | ROC curve
100 | pvaluebootfp-halfnormal.pdf | pvaluelikefp-halfnormal.pdf | roc-halfnormal.pdf

- Simulations with Student's t noise (d.f.=3).

Time series length | Bootstrap test | Likelihood ratio test | ROC curve
100 | pvaluebootfp-tstudentdf3.pdf | pvaluelikefp-tstudentdf3.pdf | roc-tstudentdf3.pdf

- Simulations with Student's t noise (d.f.=7).

Time series length | Bootstrap test | Likelihood ratio test | ROC curve
100 | pvaluebootfp-tstudentdf7.pdf | pvaluelikefp-tstudentdf7.pdf | roc-tstudentdf7.pdf

- Simulations with multivariate Student's t noise (d.f.=3).

Time series length | Bootstrap test | Likelihood ratio test | ROC curve
100 | pvaluebootfp-tstudentxxxdf3.pdf | pvaluelikefp-tstudentxxxdf3.pdf | roc-tstudentxxxdf3.pdf
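Several of the noise settings listed above can be reproduced with a few lines of code. The Python sketch below is only an illustration using the standard library, not the code used for the simulations; the function names and the dictionary keys are hypothetical. It covers the Gaussian, uniform, exponential, gamma and univariate Student's t cases, and leaves out the half-normal and multivariate t settings for brevity.

```python
import math
import random
import statistics


def student_t(rng, df):
    """Draw a Student's t variate as Z / sqrt(V / df), with V chi-square(df)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) = Gamma(shape=df/2, scale=2)
    return z / math.sqrt(v / df)


def make_noise(kind, n, rng):
    """Return n noise draws for one of the settings listed above (names are illustrative)."""
    draw = {
        "gaussian": lambda: rng.gauss(0.0, 1.0),             # N(0,1)
        "uniform": lambda: rng.uniform(-0.5, 0.5),           # U(-0.5,0.5)
        "exponential": lambda: rng.expovariate(1.0) - 1.0,   # Exp(1)-1
        "gamma": lambda: rng.gammavariate(1.0, 1.0) - 1.0,   # Gamma(1,1)-1
        "t3": lambda: student_t(rng, 3),                     # Student's t, d.f.=3
        "t7": lambda: student_t(rng, 7),                     # Student's t, d.f.=7
    }[kind]
    return [draw() for _ in range(n)]


rng = random.Random(42)
sample_means = {
    kind: statistics.fmean(make_noise(kind, 20000, rng))
    for kind in ("gaussian", "uniform", "exponential", "gamma", "t3", "t7")
}
for kind, mean in sample_means.items():
    print(kind, round(mean, 3))
```

Each of these generators has zero mean, so the printed sample means should all be close to zero; what differs between settings is the skewness and the tail weight of the noise.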

Verifying the control of type I error in actual biological data

To verify whether the LRT controls the rate of false positives in real biological data, we selected the same genes used in the original paper (grangerforgroups.pdf) and permuted the values of the time series to eliminate any existing Granger causality among them. The LRT was then applied to identify Granger causality between the gene sets. This experiment was repeated 10,000 times. As hela-random.pdf shows, the type I error is effectively controlled by the LRT, since all the p-value histograms are close to uniform distributions (under the null hypothesis).
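Why permuting the values of a time series removes temporal dependence can be seen in a small sketch. The Python code below uses a hypothetical AR(1) toy series in place of real expression data; the function names, the coefficient 0.9 and the series length are assumptions made for illustration. It shuffles the time points and compares the lag-1 autocorrelation before and after.

```python
import random


def lag1_autocorr(series):
    """Sample lag-1 autocorrelation of a series."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in series)
    return num / den


def permute_time_points(series, rng):
    """Return a copy of the series with its time points in random order."""
    shuffled = list(series)
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical toy series with strong temporal dependence (AR(1), coefficient 0.9).
rng = random.Random(1)
ar1 = [0.0]
for _ in range(499):
    ar1.append(0.9 * ar1[-1] + rng.gauss(0.0, 1.0))

permuted = permute_time_points(ar1, rng)
before = lag1_autocorr(ar1)
after = lag1_autocorr(permuted)
print("lag-1 autocorrelation before:", round(before, 3))
print("lag-1 autocorrelation after: ", round(after, 3))
```

The permuted series keeps exactly the same values, and hence the same marginal distribution, but its ordering carries no information about the past. Any test applied to such data is therefore operating under the null hypothesis, which is why the resulting p-values are expected to be approximately uniform.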

Application to biological data

The following figure (hela-network.pdf) illustrates the application of the LRT with Bartlett correction to the identification of Granger causality between sets of genes. The coefficients estimated by the method are shown at the edges. Solid lines represent statistically significant Granger causalities with p-value < 0.05, and dashed lines represent those with p-value < 0.10.

One out of every two men and one out of every three women will develop cancer during their lifetime (American Cancer Society, 2008). Proto-oncogenes constitute a group of genes that cause normal cells to become cancerous when they are mutated (Weinstein & Joe, 2006). Proto-oncogenes encode proteins that function to stimulate cell division, inhibit cell differentiation or halt cell death. All of these processes are important for normal human development. Oncogenes, the mutated versions of proto-oncogenes, typically lead to increased cell division, decreased cell differentiation and inhibition of cell death; taken together, these phenotypes define cancer cells. Three proto-oncogenes were chosen from a time series dataset (Whitfield et al., 2000): C-MYC, C-JUN and C-FOS. These genes have been shown to participate concomitantly in important processes such as tissue regeneration (Morello et al., 1990), stress response (Bukh et al., 1990) and carcinogenesis (Yuen et al., 2001). A more comprehensive knowledge of how these genes interact with each other and with other cellular systems should contribute to the understanding of such processes. As expected from published data, the figure (hela-network.pdf) shows that the expression of these genes is correlated in time. They also appear to be correlated in time with three groups of genes that play important roles in cancer: FGF1-FRAG1-FGF12B; TP53-P21-GADD45A; and IL1B-IL6-IL8.

Fibroblast growth factor (FGF) family members possess broad mitogenic and cell survival activities. Three genes from this family came up in our analysis: fibroblast growth factor 1 (FGF1), which has potent biological activities implicated in cancer development, and FGF12B and FRAG1, whose specific functions have not been determined and which have been poorly studied so far (Lorenzi et al., 1996). Their inclusion in the network presented here should help in the design of experiments aiming to better understand their contribution to this family.

FGF and p53 pathways may interact in the cell to determine cell fate (Bouleau et al., 2005). Deregulation of one of these pathways modifies the balance between cell proliferation and cell death and may lead to tumor progression. In our data, p53 was correlated with p21, a major transcriptional target of p53, and with GADD45A, a growth arrest and DNA-damage gene that acts in a p53-dependent manner (Ji et al., 2007). According to the network presented in this study, both sets of genes (FGF family members and p53-regulated genes) are associated in time with the three selected proto-oncogenes. It is worth noting that c-myc is a known inducer of wild-type p53; decreased c-myc expression may therefore lead to uncontrolled cell growth because of the lack of the p53 expression that normally induces apoptosis.

Finally, an association between the interleukins IL6, IL8 and IL1B and proto-oncogenes, often at the expression level, has been demonstrated for cellular processes such as cell cycle regulation, angiogenesis and proliferation (Nabata et al., 1990; Resnitzky & Kimchi, 1991; Krishnamoorthy et al., 2000; Shchors et al., 2006). Taken together, the results presented here suggest that known pathways and well-studied cellular processes may interact at levels not yet explored. Even though the interactions are probably not direct, the network shows how sets of genes that have been studied independently in the context of cancer might collaborate to produce a given phenotype.

References:

American Cancer Society. Cancer Facts and Figures 2008.

Bouleau S., Grimal H., Rincheval V., Godefroy N., Mignotte B., Vayssière J-L., Renaud F. FGF1 inhibits p53-dependent apoptosis and cell cycle arrest via an intracrine pathway. Oncogene. 24:7839–7849, 2005.

Bukh A., Martinez-Valdez H., Freedman S.J., Freedman M.H., Cohen A. The expression of c-fos, c-jun, and c-myc genes is regulated by heat shock in human lymphoid cells. The Journal of Immunology. 144:4835-4840, 1990.

Ji J., Liu R., Tong T., Song Y., Jin S., Wu M., Zhan Q. Gadd45a regulates β-catenin distribution and maintains cell–cell adhesion/contact. Oncogene. 26:6396-6405, 2007.

Krishnamoorthy B., Narayanan K., Miyamoto S., Balakrishnan A. Epithelial cells release proinflammatory cytokines and undergo c-Myc-induced apoptosis on exposure to filarial parasitic sheath protein-Bcl2 mediates rescue by activating c-H-Ras. In Vitro Cellular & Developmental Biology. Animal. 36:532-538, 2000.

Lorenzi M.V., Horii Y., Yamanaka R., Sakaguchi K., Miki T. FRAG1, a gene that potently activates fibroblast growth factor receptor by C-terminal fusion through chromosomal rearrangement. PNAS. 93: 8956-8961, 1996.

Morello D., Fitzgerald M.J., Babinet C., Fausto N. c-myc, c-fos, and c-jun regulation in the regenerating livers of normal and H-2K/c-myc transgenic mice. Mol Cell Biol. 10:3185-3193, 1990.

Nabata T., Morimoto S., Koh E., Shiraishi T., Ogihara T. Interleukin-6 stimulates c-myc expression and proliferation of cultured vascular smooth muscle cells. Biochem Int. 20:445-53, 1990.

Resnitzky D. and Kimchi A. Deregulated c-myc expression abrogates the interferon- and interleukin 6-mediated G0/G1 cell cycle arrest but not other inhibitory responses in M1 myeloblastic cells. Cell Growth & Differentiation. 2:33-41, 1991.

Shchors K., Shchors E., Rostker F., Lawlor E.R., Brown-Swigart L., Evan G.I. The Myc-dependent angiogenic switch in tumors is mediated by interleukin 1. Genes Dev. 20: 2527–2538, 2006.

Weinstein I.B. and Joe A.K. Mechanisms of disease: Oncogene addiction—a rationale for molecular targeting in cancer therapy. Nature Clinical Practice Oncology. 3:448–457, 2006.

Whitfield M.L., Sherlock G., Saldanha A.J., Murray J.I., Ball C.A., Alexander K.E., Matese J.C., Perou C.M., Hurt M.M., Brown P.O., Botstein D. Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol. Biol. Cell. 13:1977-2000, 2000.

Yuen M.F., Wu P.C., Lai V.C., Lau J.Y., Lai C.L. Expression of c-Myc, c-Fos, and c-jun in hepatocellular carcinoma. Cancer. 91:106-12, 2001.

ggranger.txt · Last modified: 2010/06/19 06:41 by mlabadm