

A fast and robust statistical test based on Likelihood ratio with Bartlett correction to identify Granger causality between gene sets

Andre Fujita, Kaname Kojima, Alexandre G. Patriota, Joao R. Sato, Patricia Severino, Satoru Miyano

Summary: We propose a likelihood ratio test (LRT) with Bartlett correction to identify Granger causality between sets of time series gene expression data. The performance of the proposed test is compared with a previously published bootstrap-based approach. The LRT is shown to be significantly faster and to remain statistically powerful even under non-Normal distributions. An R package named gGranger, containing implementations of both Granger causality tests, is also provided.
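
For orientation, the generic textbook form of a likelihood ratio statistic and its Bartlett correction can be written as below. This is only a sketch of the general idea, not the specific correction derived in the paper: here \ell is the log-likelihood, \hat\theta_0 and \hat\theta_1 are the estimates under the restricted model (no Granger causality) and the full model, q is the number of restricted parameters, n is the sample size, and c is a model-dependent constant. All of these symbols are notation assumed for this illustration.

```latex
\Lambda = 2\,\bigl(\ell(\hat\theta_1) - \ell(\hat\theta_0)\bigr)
\;\xrightarrow{d}\; \chi^2_q ,
\qquad
\mathrm{E}[\Lambda] = q\,\Bigl(1 + \frac{c}{n}\Bigr) + O(n^{-2}),
\qquad
\Lambda^{*} = \frac{\Lambda}{1 + c/n}.
```

The corrected statistic \Lambda^{*} has an expectation closer to q, so its distribution is usually better approximated by \chi^2_q when n is small.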

Contact: andrefujita AT riken DOT jp

Installation: To install gGranger, download the appropriate file below and run the following command in a terminal: R CMD INSTALL <file name>

Original paper (Identification of Granger causality between gene sets)

Paper (preprint): grangerforgroups.pdf DOI:10.1142/S0219720010004860

Supplementary files

Simulation 1 (Multivariate model): This simulation illustrates a multivariate (standard) case where nine (predictor) genes Granger cause one (target) gene. The weights are shown on the edges of the network. Gene x_{9} does not Granger cause gene y_{t}. (Figure 1: figure1suppl.pdf)
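
A setup of this kind could be simulated roughly as follows. This is a minimal sketch, not the authors' code: it assumes a lag-1 linear model with independent N(0,1) predictors and noise, and the weights in WEIGHTS are hypothetical placeholders (the actual values are on the edges of figure1suppl.pdf). The function names simulate_multivariate and lagged_covariance are also introduced here for illustration only.

```python
import random

# Hypothetical edge weights for x_1..x_9 (the real ones are in figure1suppl.pdf).
# The last weight is zero because x_9 does not Granger cause y.
WEIGHTS = [0.5, -0.4, 0.3, 0.6, -0.5, 0.4, -0.3, 0.5, 0.0]


def simulate_multivariate(n_time, weights, seed=0):
    """Simulate y_t = sum_i weights[i] * x_{i,t-1} + e_t with N(0,1) noise."""
    rng = random.Random(seed)
    k = len(weights)
    # Predictor genes: independent Gaussian white noise, one row per time point.
    x = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n_time)]
    # The first target value has no past to depend on, so it is pure noise.
    y = [rng.gauss(0.0, 1.0)]
    for t in range(1, n_time):
        signal = sum(w * x[t - 1][i] for i, w in enumerate(weights))
        y.append(signal + rng.gauss(0.0, 1.0))
    return x, y


def lagged_covariance(x, y, gene):
    """Sample covariance between x_{gene,t-1} and y_t."""
    past = [row[gene] for row in x[:-1]]
    future = y[1:]
    mean_p = sum(past) / len(past)
    mean_f = sum(future) / len(future)
    total = sum((p - mean_p) * (f - mean_f) for p, f in zip(past, future))
    return total / len(past)


x, y = simulate_multivariate(100, WEIGHTS)
```

With a long enough series, lagged_covariance(x, y, 0) should be clearly positive, while lagged_covariance(x, y, 8) should be near zero, reflecting that the past of x_9 carries no information about y.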

Simulation 2 (Module-module model): This simulation illustrates a module-module case where a set of genes Granger causes another set of genes. The details of this simulation are explained in the original paper (simulation 1 of grangerforgroups.pdf). (Figure 2: figure2suppl.pdf)

Simulation 2 with different noise distribution

The following figures show the p-value distributions under the null hypothesis (set I to set III; set II to set I; set III to set I; and set III to set III) for both the bootstrap and the likelihood ratio test (LRT) procedures, together with the ROC curves for simulation 2. In the ROC plots, solid lines represent the LRT and dashed lines represent the bootstrap. The results show that both the LRT and the bootstrap procedures effectively control the false positive rate: the p-value histograms are close to uniform distributions.
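
The two diagnostics used here, uniformity of null p-values and ROC curves, can be computed from lists of p-values along the following lines. This is an illustrative sketch only: the function names are hypothetical, and the Kolmogorov-Smirnov distance is one possible way to quantify how close a histogram is to uniform, not necessarily the check used by the authors.

```python
def rejection_rate(p_values, alpha=0.05):
    """Fraction of p-values below the significance level alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


def ks_distance_to_uniform(p_values):
    """Kolmogorov-Smirnov distance between the empirical CDF and U(0,1)."""
    ps = sorted(p_values)
    n = len(ps)
    return max(max((i + 1) / n - p, p - i / n) for i, p in enumerate(ps))


def roc_points(null_p, alt_p, thresholds):
    """(false positive rate, true positive rate) pairs, one per threshold.

    null_p: p-values from tests where no Granger causality exists.
    alt_p:  p-values from tests where Granger causality exists.
    """
    return [(rejection_rate(null_p, a), rejection_rate(alt_p, a))
            for a in thresholds]


# Toy inputs: evenly spread null p-values and alternative p-values skewed to 0.
null_p = [(i + 0.5) / 100 for i in range(100)]
alt_p = [p ** 3 for p in null_p]
curve = roc_points(null_p, alt_p, [0.01, 0.05, 0.10, 0.50])
```

If the false positive rate is controlled, rejection_rate on the null p-values stays close to alpha for every threshold, which is the same statement as the histogram being close to uniform.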

- Simulations with Gaussian noise (N(0,1)).

Time series length | Bootstrap test | Likelihood ratio test | ROC curve
75 | pvaluebootfp-75.pdf | pvaluelikefp-75.pdf | roc-75.pdf
100 | pvaluebootfp-100.pdf | pvaluelikefp-100.pdf | roc-100.pdf

- Simulations with uniform noise (U(-0.5,0.5)).

Time series length | Bootstrap test | Likelihood ratio test | ROC curve
200 | pvaluebootfp-uniform.pdf | pvaluelikefp-uniform.pdf | roc-uniform.pdf

- Simulations with exponential noise (Exp(1)-1, which has mean zero).

Time series length | Bootstrap test | Likelihood ratio test | ROC curve
75 | pvaluebootfp-75exp.pdf | pvaluelikefp-75exp.pdf | roc-75exp.pdf

- Simulations with Gamma noise (Gamma(1,1)-1).

Time series length | Bootstrap test | Likelihood ratio test | ROC curve
100 | pvaluebootfp-gamma.pdf | pvaluelikefp-gamma.pdf | roc-gamma.pdf

- Simulations with half-normal noise (abs(N(0,1))-sqrt(1/Pi)).

Time series length | Bootstrap test | Likelihood ratio test | ROC curve
100 | pvaluebootfp-halfnormal.pdf | pvaluelikefp-halfnormal.pdf | roc-halfnormal.pdf

- Simulations with Student's t noise (d.f.=3).

Time series length | Bootstrap test | Likelihood ratio test | ROC curve
100 | pvaluebootfp-tstudentdf3.pdf | pvaluelikefp-tstudentdf3.pdf | roc-tstudentdf3.pdf

- Simulations with Student's t noise (d.f.=7).

Time series length | Bootstrap test | Likelihood ratio test | ROC curve
100 | pvaluebootfp-tstudentdf7.pdf | pvaluelikefp-tstudentdf7.pdf | roc-tstudentdf7.pdf

- Simulations with multivariate Student's t noise (d.f.=3).

Time series length | Bootstrap test | Likelihood ratio test | ROC curve
100 | pvaluebootfp-tstudentxxxdf3.pdf | pvaluelikefp-tstudentxxxdf3.pdf | roc-tstudentxxxdf3.pdf
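
The noise distributions listed above could be sampled as in the following sketch, which uses only the Python standard library. The shift constants are copied as written in the list; the names NOISE, student_t and multivariate_t are hypothetical, and the multivariate t draw assumes an identity scale matrix, which the text does not specify.

```python
import math
import random

rng = random.Random(0)


def student_t(df):
    """One Student's t draw: N(0,1) divided by sqrt(chi-square(df) / df)."""
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) = Gamma(df/2, scale 2)
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / df)


def multivariate_t(dim, df):
    """One multivariate t draw with identity scale matrix (an assumption).

    All components share the same chi-square divisor, so they are
    uncorrelated but not independent.
    """
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    scale = math.sqrt(chi2 / df)
    return [rng.gauss(0.0, 1.0) / scale for _ in range(dim)]


# One sampler per univariate noise type, with the shifts as listed above.
NOISE = {
    "gaussian": lambda: rng.gauss(0.0, 1.0),
    "uniform": lambda: rng.uniform(-0.5, 0.5),
    "exponential": lambda: rng.expovariate(1.0) - 1.0,
    "gamma": lambda: rng.gammavariate(1.0, 1.0) - 1.0,
    "half_normal": lambda: abs(rng.gauss(0.0, 1.0)) - math.sqrt(1.0 / math.pi),
    "t_df3": lambda: student_t(3),
    "t_df7": lambda: student_t(7),
}

sample = {name: [draw() for _ in range(1000)] for name, draw in NOISE.items()}
```

Any of these samplers could replace the Gaussian noise term in a simulation to probe how a test behaves when the Normality assumption is violated.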

Verifying the control of type I error in actual biological data

To verify that the LRT controls the false positive rate in actual biological data, we selected the same genes used in grangerforgroups.pdf and permuted the values of the time series, thereby eliminating any Granger causality that might exist among them. The LRT-based test for Granger causality between sets was then applied to the permuted data. This experiment was repeated 10,000 times. As hela-random.pdf shows, the LRT effectively controls the type I error: all the p-value histograms are close to uniform distributions, as expected under the null hypothesis.
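
The effect of the permutation step can be illustrated with a toy series. The sketch below assumes that each gene's time points are shuffled independently, which is one reading of the description above; the AR(1) coefficient of 0.9 and the function names are hypothetical and chosen only to make the loss of temporal structure visible.

```python
import random


def permute_series(series, rng):
    """Return a copy of one gene's time series with its time points shuffled."""
    shuffled = list(series)
    rng.shuffle(shuffled)
    return shuffled


def lag1_autocorrelation(series):
    """Sample correlation between consecutive values of the series."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in series)
    return num / den


rng = random.Random(0)
# Toy AR(1) series with strong temporal dependence (coefficient is hypothetical).
series = [0.0]
for _ in range(1999):
    series.append(0.9 * series[-1] + rng.gauss(0.0, 1.0))

before = lag1_autocorrelation(series)
after = lag1_autocorrelation(permute_series(series, rng))
```

Here before is large while after is close to zero: the shuffled series keeps the same values but no longer carries information from one time point to the next, so any test applied to it operates under the null hypothesis.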

Application to actual biological data

The following figure (hela-network.pdf) illustrates the application of Granger causality for sets of genes using the LRT with Bartlett correction. The coefficients estimated by the method are shown on the edges. Solid lines represent statistically significant Granger causality with p-value < 0.05, and dashed lines represent p-value < 0.10.

ggranger.1276890351.txt.gz · Last modified: 2010/06/19 04:45 by mlabadm