This shows you the differences between two versions of the page.
ggranger [2010/06/05 00:58] mlabadm |
ggranger [2010/06/19 06:41] (current) mlabadm |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | =====Testing Granger causality between sets of genes===== | + | =====A fast and robust statistical test based on Likelihood ratio with Bartlett correction to identify Granger causality between gene sets===== |
- | Andre Fujita, Kaname Kojima, Alexandre G. Patriota, Joao R. Sato, Satoru Miyano | + | Andre Fujita, Kaname Kojima, Alexandre G. Patriota, Joao R. Sato, Patricia Severino, Satoru Miyano |
- | **Summary**: We propose a likelihood ratio test with a Bartlett correction in order to identify Granger causality between sets of time series gene expression data. The performance of the proposed test is compared to a bootstrap-based approach, for which the former demonstrated to be much faster and with higher statistical power. An R package named gGranger, which contains an implementation of the Granger causality identification model by using both, the bootstrap and the likelihood ratio tests, is also provided. | + | **Summary**: We propose a likelihood ratio test (LRT) with Bartlett correction to identify Granger causality between sets of time series gene expression data. The performance of the proposed test is compared to a previously published bootstrap-based approach. The LRT is shown to be significantly faster and to retain statistical power even under non-Normal distributions. An R package named gGranger, containing an implementation of both Granger causality identification tests, is also provided. |
**Contact**: andrefujita AT riken DOT jp | **Contact**: andrefujita AT riken DOT jp | ||
Line 15: | Line 15: | ||
**Original paper (Identification of Granger causality between gene sets)** | **Original paper (Identification of Granger causality between gene sets)** | ||
- | Paper (preprint): {{:grangerforgroups.pdf|}} | + | Paper (preprint): {{:grangerforgroups.pdf|}} DOI:10.1142/S0219720010004860 |
**Supplementary files** | **Supplementary files** | ||
Line 21: | Line 21: | ||
**Simulation 1 (Multivariate model):** This simulation illustrates a multivariate (standard) case where nine (predictor) genes Granger cause one (target) gene. The weights are at the edges of the network. Gene x_{9} does not Granger cause gene y_{t}. (Figure 1: {{:figure1suppl.pdf|}}) | **Simulation 1 (Multivariate model):** This simulation illustrates a multivariate (standard) case where nine (predictor) genes Granger cause one (target) gene. The weights are at the edges of the network. Gene x_{9} does not Granger cause gene y_{t}. (Figure 1: {{:figure1suppl.pdf|}}) | ||
- | **Simulation 2 (Pathway-pathway model):** This simulation illustrates a pathway-pathway case where a set of genes Granger cause another set of genes. The details about this simulation is explained in the original paper in page 9 simulation 1 of {{:grangerforgroups.pdf|}}. (Figure 2: {{:figure2suppl.pdf|}}) | + | **Simulation 2 (Module-module model):** This simulation illustrates a module-module case where one set of genes Granger causes another set of genes. The details of this simulation are explained in simulation 1 of the original paper {{:grangerforgroups.pdf|}}. (Figure 2: {{:figure2suppl.pdf|}}) |
- | **Simulation 2 with different noises** | + | **Simulation 2 with different noise distributions** |
- | The following figures illustrate the p-value distributions for both, the bootstrap and the Likelihood Ratio Test procedures under the null hypothesis (set I to set III; set II to set I; set III to set and set III to set III), and the ROC curves for simulation 2 of the manuscript. Full lines represents the LRT while dashed line is the bootstrap test in the ROC curves. | + | The following figures illustrate the p-value distributions for both the bootstrap and the likelihood ratio test (LRT) procedures under the null hypothesis (set I to set III; set II to set I; set III to set I and set III to set III), and the ROC curves for simulation 2. In the ROC curves, solid lines represent the LRT and dashed lines represent the bootstrap. The results show that both the LRT and the bootstrap procedures effectively control the rate of false positives (p-value histograms close to uniform distributions). |
- | - Simulations with Gaussian noises (N(0,1)). | + | - Simulations with Gaussian noise (N(0,1)). |
^Time series length|Bootstrap test|Likelihood ratio test|ROC curve^^ | ^Time series length|Bootstrap test|Likelihood ratio test|ROC curve^^ | ||
Line 64: | Line 64: | ||
^100|{{:pvaluebootfp-tstudentdf7.pdf|}}|{{:pvaluelikefp-tstudentdf7.pdf|}}|{{:roc-tstudentdf7.pdf|}}| | ^100|{{:pvaluebootfp-tstudentdf7.pdf|}}|{{:pvaluelikefp-tstudentdf7.pdf|}}|{{:roc-tstudentdf7.pdf|}}| | ||
+ | - Simulations with multivariate Student's t noise (d.f.=3). | ||
+ | |||
+ | ^Time series length|Bootstrap test|Likelihood ratio test|ROC curve^^ | ||
+ | ^100|{{:pvaluebootfp-tstudentxxxdf3.pdf|}}|{{:pvaluelikefp-tstudentxxxdf3.pdf|}}|{{:roc-tstudentxxxdf3.pdf|}}| | ||
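The uniformity criterion used to read the p-value histograms above can also be checked numerically. The following is a minimal sketch, not part of gGranger: the function names are hypothetical, and the p-values are random stand-ins for the output of a well-calibrated test rather than results from the simulations. It computes the empirical false positive rate at a chosen level and the Kolmogorov-Smirnov distance from the uniform distribution.

```python
import random


def false_positive_rate(p_values, alpha=0.05):
    """Fraction of null-hypothesis p-values at or below alpha."""
    return sum(p <= alpha for p in p_values) / len(p_values)


def ks_distance_from_uniform(p_values):
    """Kolmogorov-Smirnov distance between the empirical CDF and Uniform(0, 1)."""
    ordered = sorted(p_values)
    n = len(ordered)
    distance = 0.0
    for i, p in enumerate(ordered):
        # The empirical CDF jumps from i/n to (i+1)/n at p; the uniform CDF equals p.
        distance = max(distance, abs((i + 1) / n - p), abs(p - i / n))
    return distance


# Stand-in data (assumption): uniform random numbers play the role of
# p-values obtained under the null hypothesis from a calibrated test.
rng = random.Random(0)
null_p_values = [rng.random() for _ in range(10000)]

fpr = false_positive_rate(null_p_values, alpha=0.05)
ks = ks_distance_from_uniform(null_p_values)
print(f"false positive rate at 0.05: {fpr:.4f}")
print(f"KS distance from uniform:    {ks:.4f}")
```

A test that controls type I error should give a false positive rate close to the nominal level and a small KS distance; a histogram piled up near zero would show up as a rate well above the nominal level.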
**Verifying the control of type I error in actual biological data** | **Verifying the control of type I error in actual biological data** | ||
- | In order to verify if LRT can control the rate of false positives even in actual biological data, we selected the same genes used in ({{:grangerforgroups.pdf|}}) and permuted the values of the time series, consequently, eliminating eventually existing Granger causality among them. Then, the Granger causality between sets based on LRT were carried out in order to identify Granger causality. In {{:hela-random.pdf|}} one can observe that the type I error is effectively controlled by LRT since all the p-values' histograms are close to uniform distributions (under the null hypothesis). This experiment was done 10,000 times. | + | To verify that the LRT controls the rate of false positives in real biological data, we selected the same genes used in ({{:grangerforgroups.pdf|}}) and permuted the values of the time series to eliminate any existing Granger causality among them. The LRT was then applied to test for Granger causality between gene sets. In {{:hela-random.pdf|}} one can observe that the type I error is effectively controlled by the LRT, since all the p-value histograms are close to uniform distributions (as expected under the null hypothesis). This experiment was repeated 10,000 times. |
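The permutation experiment described above can be illustrated on a toy example. This is only a sketch under stated assumptions: it uses a simplified univariate, lag-1 likelihood ratio test between two single series, without the Bartlett correction and without gene sets, so it is not the test implemented in gGranger. The simulated series, the coefficients, the function name, the choice to shuffle each series independently, and the 500 repetitions (instead of 10,000) are all illustrative choices.

```python
import math
import random


def lrt_granger_p_value(x, y):
    """p-value of a simplified lag-1 likelihood ratio test for 'x Granger-causes y'."""
    target, own_lag, other_lag = y[1:], y[:-1], x[:-1]
    n = len(target)

    def centered(values):
        mean = sum(values) / n
        return [v - mean for v in values]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    t, o, z = centered(target), centered(own_lag), centered(other_lag)
    # Restricted model: y_t explained by its own past only.
    rss_restricted = dot(t, t) - dot(t, o) ** 2 / dot(o, o)
    # Full model: add x_{t-1}; solve the 2x2 normal equations by Cramer's rule.
    det = dot(o, o) * dot(z, z) - dot(o, z) ** 2
    b_own = (dot(t, o) * dot(z, z) - dot(t, z) * dot(o, z)) / det
    b_other = (dot(t, z) * dot(o, o) - dot(t, o) * dot(o, z)) / det
    rss_full = dot(t, t) - b_own * dot(t, o) - b_other * dot(t, z)
    statistic = max(n * math.log(rss_restricted / rss_full), 0.0)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(statistic / 2.0))


# Toy data (assumption): x drives y with a one-step delay.
rng = random.Random(1)
x, y = [0.0], [0.0]
for _ in range(99):
    x_prev = x[-1]
    x.append(0.5 * x_prev + rng.gauss(0, 1))
    y.append(0.8 * x_prev + rng.gauss(0, 1))

p_original = lrt_granger_p_value(x, y)

# Shuffling the time points destroys the temporal ordering, and with it
# any Granger causality, so the resulting p-values come from the null.
null_p_values = []
for _ in range(500):
    x_perm, y_perm = x[:], y[:]
    rng.shuffle(x_perm)
    rng.shuffle(y_perm)
    null_p_values.append(lrt_granger_p_value(x_perm, y_perm))

rejection_rate = sum(p < 0.05 for p in null_p_values) / len(null_p_values)
print(f"p-value on the original series: {p_original:.3g}")
print(f"rejection rate after permutation (alpha = 0.05): {rejection_rate:.3f}")
```

If the test is well calibrated, the rejection rate on the permuted series stays near the nominal 0.05, while the p-value on the original series remains small.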
+ | |||
+ | **Application to biological data** | ||
+ | |||
+ | The following figure ({{:hela-network.pdf|}}) illustrates the application of the LRT with Bartlett correction to the identification of Granger causality between sets of genes. The coefficients estimated by the method are shown at the edges. Solid lines represent statistically significant Granger causalities with p-value < 0.05 and dashed lines represent those with p-value < 0.10. | ||
+ | |||
+ | One out of every two men and one out of every three women will develop cancer during their lifetime (American Cancer Society, 2008). Proto-oncogenes constitute a group of genes that cause normal cells to become cancerous when they are mutated (Weinstein & Joe, 2006). Proto-oncogenes encode proteins that function to stimulate cell division, inhibit cell differentiation or halt cell death. All of these processes are important for normal human development. Oncogenes, the mutated versions of proto-oncogenes, typically lead to increased cell division, decreased cell differentiation, and inhibition of cell death; taken together, these phenotypes define cancer cells. | ||
+ | Three proto-oncogenes were chosen in a time series dataset (Whitfield et al., 2000): C-MYC, C-JUN and C-FOS. These genes have been shown to concomitantly participate in important processes such as tissue regeneration (Morello et al., 1990), stress response (Bukh et al., 1990) and carcinogenesis (Yuen et al., 2001). A more comprehensive knowledge of how these genes interact with each other and with other cellular systems should contribute to the understanding of such processes. As expected from published data, Figure ({{:hela-network.pdf|}}) shows that the expression of these genes is correlated in time. They also seem to be correlated in time with three groups of genes that play important roles in cancer: FGF1-FRAG1-FGF12B; TP53-P21-GADD45A and IL1B-IL6-IL8. | ||
+ | Fibroblast growth factor (FGF) family members possess broad mitogenic and cell survival activities. Three genes from this family came up in our analysis: fibroblast growth factor 1 (FGF1), which has potent biological activities implicated in cancer development, and FGF12B and FRAG1, which do not have any specific function determined and have been poorly studied so far (Lorenzi et al., 1996). Their insertion in the system presented here should help in the design of experiments aiming to better understand their contribution to this family. | ||
+ | |||
+ | FGF and p53 pathways may interact in the cell to determine cell fate (Bouleau et al., 2005). Deregulation of one of these pathways modifies the balance between cell proliferation and cell death and may lead to tumor progression. Our data correlated p53 to p21, a major transcriptional target for p53, and to GADD45A, a growth arrest and DNA-damage gene that acts in a p53-dependent manner (Ji et al., 2007). According to the network presented in this study, both sets of genes (FGF family members and p53-regulated genes) are associated in time with the three selected proto-oncogenes. Notably, c-myc is a known inducer of wild-type p53; decreased c-myc expression may therefore lead to uncontrolled cell growth because of the lack of p53 expression, which normally induces apoptosis. | ||
+ | |||
+ | Finally, an association between the interleukins IL6, IL8 and IL1B and proto-oncogenes, often at the expression level, has been demonstrated for cellular processes such as cell cycle regulation, angiogenesis and proliferation (Nabata et al., 1990; Resnitzky and Kimchi, 1991; Krishnamoorthy et al., 2000; Shchors et al., 2006). | ||
+ | Taken together, the results presented here show that the interaction between known pathways and well-studied cellular processes might work at levels not yet explored. Even though interactions are probably not direct, the network shows how sets of genes which have been independently studied in the context of cancer might collaborate for a given phenotype. | ||
+ | |||
+ | **References:** | ||
+ | |||
+ | American Cancer Society. Cancer Facts and Figures 2008. | ||
+ | |||
+ | Bouleau S., Grimal H., Rincheval V., Godefroy N., Mignotte B., Vayssière J-L., Renaud F. FGF1 inhibits p53-dependent apoptosis and cell cycle arrest via an intracrine pathway. Oncogene. 24:7839–7849, 2005. | ||
+ | |||
+ | Bukh A., Martinez-Valdez H., Freedman S.J., Freedman M.H., Cohen A. The expression of c-fos, c-jun, and c-myc genes is regulated by heat shock in human lymphoid cells. The Journal of Immunology. 144:4835-4840, 1990. | ||
+ | |||
+ | Ji J., Liu R., Tong T., Song Y., Jin S., Wu M., Zhan Q. Gadd45a regulates β-catenin distribution and maintains cell–cell adhesion/contact. Oncogene. 26:6396-6405, 2007. | ||
+ | |||
+ | Krishnamoorthy B., Narayanan K., Miyamoto S., Balakrishnan A. Epithelial cells release proinflammatory cytokines and undergo c-Myc-induced apoptosis on exposure to filarial parasitic sheath protein-Bcl2 mediates rescue by activating c-H-Ras. In Vitro Cellular & Developmental Biology. Animal. 36:532-538, 2000. | ||
+ | |||
+ | Lorenzi M.V., Horii Y., Yamanaka R., Sakaguchi K., Miki T. FRAG1, a gene that potently activates fibroblast growth factor receptor by C-terminal fusion through chromosomal rearrangement. PNAS. 93: 8956-8961, 1996. | ||
+ | |||
+ | Morello D., Fitzgerald M.J., Babinet C., Fausto N. c-myc, c-fos, and c-jun regulation in the regenerating livers of normal and H-2K/c-myc transgenic mice. Mol Cell Biol. 10:3185-3193, 1990. | ||
+ | |||
+ | Nabata T., Morimoto S., Koh E., Shiraishi T., Ogihara T. Interleukin-6 stimulates c-myc expression and proliferation of cultured vascular smooth muscle cells. Biochem Int. 20:445-53, 1990. | ||
+ | |||
+ | Resnitzky D. and Kimchi A. Deregulated c-myc expression abrogates the interferon- and interleukin 6-mediated G0/G1 cell cycle arrest but not other inhibitory responses in M1 myeloblastic cells. Cell Growth & Differentiation. 2:33-41, 1991. | ||
+ | |||
+ | Shchors K., Shchors E., Rostker F., Lawlor E.R., Brown-Swigart L., Evan G.I. The Myc-dependent angiogenic switch in tumors is mediated by interleukin 1. Genes Dev. 20: 2527–2538, 2006. | ||
+ | |||
+ | Weinstein I.B. and Joe A.K. Mechanisms of disease: Oncogene addiction—a rationale for molecular targeting in cancer therapy. Nature Clinical Practice Oncology. 3:448–457, 2006. | ||
+ | |||
+ | Whitfield M.L., Sherlock G., Saldanha A.J., Murray J.I., Ball C.A., Alexander K.E., Matese J.C., Perou C.M., Hurt M.M., Brown P.O., Botstein D. Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol. Biol. Cell. 13:1977-2000, 2000. | ||
+ | |||
+ | Yuen M.F., Wu P.C., Lai V.C., Lau J.Y., Lai C.L. Expression of c-Myc, c-Fos, and c-jun in hepatocellular carcinoma. Cancer. 91:106-12, 2001. |