A fast and robust statistical test based on likelihood ratio with Bartlett correction to identify Granger causality between gene sets

Andre Fujita, Kaname Kojima, Alexandre G. Patriota, Joao R. Sato, Patricia Severino, Satoru Miyano

Summary: We propose a likelihood ratio test (LRT) with Bartlett correction to identify Granger causality between sets of time series gene expression data. The performance of the proposed test is compared to a previously published bootstrap-based approach. The LRT is shown to be significantly faster and statistically powerful even under non-Normal distributions. An R package named gGranger, containing an implementation of both Granger causality identification tests, is also provided.

Contact: andrefujita AT riken DOT jp

Installation: To install gGranger, download the appropriate file below and run the following command in a terminal: R CMD INSTALL <file name>

R packages
for Windows	ggranger_win_1.0.0.tar.gz
for Linux	ggranger_linux_1.0.0.tar.gz

Original paper (Identification of Granger causality between gene sets)

Paper (preprint): grangerforgroups.pdf DOI:10.1142/S0219720010004860

Supplementary files

Simulation 1 (Multivariate model): This simulation illustrates a multivariate (standard) case where nine (predictor) genes Granger cause one (target) gene. The weights are shown on the edges of the network. Gene x_{9} does not Granger cause gene y_{t}. (Figure 1: figure1suppl.pdf)
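
As a rough illustration of this kind of setup, the following Python sketch simulates nine predictor series and one target series driven by the predictors' values at the previous time point. The weights, the lag of one time point, the white-noise predictors and the name `simulate_multivariate` are all assumptions made for illustration; the actual weights are those shown in figure1suppl.pdf and are not reproduced here. The last weight is set to zero so that x_9 has no influence on the target.

```python
import random


def simulate_multivariate(n_steps, weights, seed=0):
    """Simulate white-noise predictors and a target driven by their lag-1 values.

    Returns (x, y), where x[t][i] is predictor i at time t and y[t] is the target.
    """
    rng = random.Random(seed)
    k = len(weights)
    x = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n_steps)]
    y = [rng.gauss(0.0, 1.0)]  # no lagged predictors exist at t = 0
    for t in range(1, n_steps):
        drive = sum(w * x[t - 1][i] for i, w in enumerate(weights))
        y.append(drive + rng.gauss(0.0, 1.0))
    return x, y


# Hypothetical weights (NOT the values of figure1suppl.pdf).
# The ninth weight is zero, so x_9 does not Granger cause y.
weights = [0.5, -0.4, 0.3, 0.6, -0.5, 0.4, -0.3, 0.5, 0.0]
x, y = simulate_multivariate(100, weights)
```

Fixing the seed makes the simulated series reproducible, which helps when comparing two tests on identical data.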

Simulation 2 (Module-module model): This simulation illustrates a module-module case where one set of genes Granger causes another set of genes. The details of this simulation are explained in the original paper (page 9, simulation 1 of grangerforgroups.pdf). (Figure 2: figure2suppl.pdf)

Simulation 2 with different noises

The following figures illustrate the p-value distributions for both the bootstrap and the likelihood ratio test (LRT) procedures under the null hypothesis (set I to set III; set II to set I; set III to set and set III to set III), and the ROC curves for simulation 2 of the manuscript. In the ROC curves, solid lines represent the LRT and dashed lines represent the bootstrap test. The results show that both the LRT and the bootstrap procedures effectively control the rate of false positives (the p-value histograms are close to uniform distributions).
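
For readers who want to reproduce this kind of summary from their own p-values, here is a minimal Python sketch. It computes the rejection rate at a threshold, the points of a ROC curve (false positive rate against true positive rate), and the largest gap between the empirical distribution of the p-values and the uniform distribution. The function names and the toy p-values are illustrative assumptions; they are not output of gGranger or results from the paper.

```python
def rejection_rate(p_values, alpha):
    """Fraction of tests rejected at significance level alpha."""
    return sum(p <= alpha for p in p_values) / len(p_values)


def roc_points(null_p, alt_p, thresholds):
    """(false positive rate, true positive rate) at each threshold."""
    return [(rejection_rate(null_p, a), rejection_rate(alt_p, a)) for a in thresholds]


def ks_distance_to_uniform(p_values):
    """Largest gap between the empirical CDF of the p-values and the U(0,1) CDF."""
    ps = sorted(p_values)
    n = len(ps)
    return max(max((i + 1) / n - p, p - i / n) for i, p in enumerate(ps))


# Made-up p-values, for illustration only.
null_p = [0.08, 0.21, 0.33, 0.47, 0.52, 0.64, 0.79, 0.93]  # no causality present
alt_p = [0.001, 0.004, 0.02, 0.03, 0.07, 0.30]             # causality present
curve = roc_points(null_p, alt_p, [0.01, 0.05, 0.10])
gap = ks_distance_to_uniform(null_p)
```

When the false positive rate is controlled, the rejection rate of the null p-values at level alpha stays close to alpha and the gap to the uniform distribution is small.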

- Simulations with Gaussian noise (N(0,1)).

Time series length	Bootstrap test	Likelihood ratio test	ROC curve
75	pvaluebootfp-75.pdf	pvaluelikefp-75.pdf	roc-75.pdf
100	pvaluebootfp-100.pdf	pvaluelikefp-100.pdf	roc-100.pdf

- Simulations with Uniform noise (U(-0.5,0.5)).

Time series length	Bootstrap test	Likelihood ratio test	ROC curve
200	pvaluebootfp-uniform.pdf	pvaluelikefp-uniform.pdf	roc-uniform.pdf

- Simulations with Exponential noise (Exp(1)-1, i.e. with zero mean).

Time series length	Bootstrap test	Likelihood ratio test	ROC curve
75	pvaluebootfp-75exp.pdf	pvaluelikefp-75exp.pdf	roc-75exp.pdf

- Simulations with Gamma noise (Gamma(1,1)-1).

Time series length	Bootstrap test	Likelihood ratio test	ROC curve
100	pvaluebootfp-gamma.pdf	pvaluelikefp-gamma.pdf	roc-gamma.pdf

- Simulations with half-normal noise (abs(N(0,1))-sqrt(1/Pi)).

Time series length	Bootstrap test	Likelihood ratio test	ROC curve
100	pvaluebootfp-halfnormal.pdf	pvaluelikefp-halfnormal.pdf	roc-halfnormal.pdf

- Simulations with Student's t noise (d.f.=3).

Time series length	Bootstrap test	Likelihood ratio test	ROC curve
100	pvaluebootfp-tstudentdf3.pdf	pvaluelikefp-tstudentdf3.pdf	roc-tstudentdf3.pdf

- Simulations with Student's t noise (d.f.=7).

Time series length	Bootstrap test	Likelihood ratio test	ROC curve
100	pvaluebootfp-tstudentdf7.pdf	pvaluelikefp-tstudentdf7.pdf	roc-tstudentdf7.pdf

- Simulations with multivariate Student's t noise (d.f.=3).

Time series length	Bootstrap test	Likelihood ratio test	ROC curve
100	pvaluebootfp-tstudentxxxdf3.pdf	pvaluelikefp-tstudentxxxdf3.pdf	roc-tstudentxxxdf3.pdf
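
The noise distributions listed above can be sampled with the Python standard library alone, as in the sketch below. The function names are illustrative, and the identity scale matrix used for the multivariate t case is an assumption, since the page does not state it. The offsets are copied verbatim from the list above.

```python
import math
import random


def student_t(rng, df):
    """Student's t draw: a standard normal divided by sqrt(chi-square(df) / df)."""
    chi_square = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) = Gamma(df/2, scale 2)
    return rng.gauss(0.0, 1.0) / math.sqrt(chi_square / df)


def make_noise_samplers(rng):
    """Map each noise name to a zero-argument function returning one draw."""
    return {
        "gaussian": lambda: rng.gauss(0.0, 1.0),
        "uniform": lambda: rng.uniform(-0.5, 0.5),
        "exponential": lambda: rng.expovariate(1.0) - 1.0,
        "gamma": lambda: rng.gammavariate(1.0, 1.0) - 1.0,
        # Offset as written on this page. Since E|N(0,1)| = sqrt(2/pi),
        # this offset leaves a positive mean of about 0.23.
        "half_normal": lambda: abs(rng.gauss(0.0, 1.0)) - math.sqrt(1.0 / math.pi),
        "student_t3": lambda: student_t(rng, 3),
        "student_t7": lambda: student_t(rng, 7),
    }


def multivariate_t(rng, dim, df):
    """Multivariate t draw; one chi-square variable is shared by all coordinates.

    Assumes an identity scale matrix (not specified in the source).
    """
    scale = math.sqrt(rng.gammavariate(df / 2.0, 2.0) / df)
    return [rng.gauss(0.0, 1.0) / scale for _ in range(dim)]


rng = random.Random(1)
samplers = make_noise_samplers(rng)
gamma_noise = [samplers["gamma"]() for _ in range(100)]  # one series of length 100
joint_noise = multivariate_t(rng, 3, 3)                  # one 3-dimensional draw
```

Sharing one chi-square draw across coordinates is what makes the multivariate t different from independent univariate t draws: the coordinates are uncorrelated but not independent.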

Verifying the control of type I error in actual biological data

To verify whether the LRT controls the rate of false positives in actual biological data, we selected the same genes used in (grangerforgroups.pdf) and permuted the values of the time series, thereby eliminating any Granger causality that might exist among them. The LRT for Granger causality between sets was then applied to the permuted data. In hela-random.pdf one can observe that the type I error is effectively controlled by the LRT, since all the p-value histograms are close to uniform distributions (as expected under the null hypothesis). This experiment was repeated 10,000 times.
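
The permutation check described above can be sketched in Python on a deliberately simplified problem. The sketch uses a univariate likelihood ratio test at lag 1 with a chi-square(1) reference distribution and no Bartlett correction, so it is not the set-based test implemented in gGranger. All function names, the simulated series and the number of repetitions are assumptions made for illustration.

```python
import math
import random


def center(values):
    """Subtract the sample mean so that the regressions need no intercept."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def rss(target, regressors):
    """Residual sum of squares of a least-squares fit with one or two regressors."""
    if len(regressors) == 1:
        a = regressors[0]
        beta = dot(a, target) / dot(a, a)
        return sum((t - beta * u) ** 2 for t, u in zip(target, a))
    a, b = regressors
    saa, sbb, sab = dot(a, a), dot(b, b), dot(a, b)
    sat, sbt = dot(a, target), dot(b, target)
    det = saa * sbb - sab * sab  # solve the 2x2 normal equations directly
    beta_a = (sat * sbb - sbt * sab) / det
    beta_b = (sbt * saa - sat * sab) / det
    return sum((t - beta_a * u - beta_b * w) ** 2 for t, u, w in zip(target, a, b))


def lrt_p_value(x, y):
    """P-value of the lag-1 LRT for 'x Granger causes y' (chi-square, 1 d.f.)."""
    x, y = center(x), center(y)
    target, y_lag, x_lag = y[1:], y[:-1], x[:-1]
    restricted = rss(target, [y_lag])    # y explained by its own past only
    full = rss(target, [y_lag, x_lag])   # y explained by its past and the past of x
    stat = len(target) * math.log(restricted / full)
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))  # chi-square(1) upper tail


def permuted_null_p_values(x, y, n_rep, rng):
    """Shuffle each series independently to destroy temporal structure, then test."""
    p_values = []
    for _ in range(n_rep):
        xs, ys = x[:], y[:]
        rng.shuffle(xs)
        rng.shuffle(ys)
        p_values.append(lrt_p_value(xs, ys))
    return p_values


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [rng.gauss(0.0, 1.0)] + [0.8 * x[t - 1] + rng.gauss(0.0, 1.0) for t in range(1, 200)]
p_original = lrt_p_value(x, y)
null_p = permuted_null_p_values(x, y, 500, rng)
false_positive_rate = sum(p <= 0.05 for p in null_p) / len(null_p)
```

If the test controls the type I error, `false_positive_rate` should stay near the nominal 0.05 level, while `p_original` should be small because the unpermuted series were simulated with a real dependence of y on the past of x.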

Application to actual biological data

The following figure (hela-network.pdf) illustrates the application of Granger causality for sets of genes using the LRT with Bartlett correction. The coefficients estimated by the method are shown on the edges. Solid lines represent statistically significant Granger causalities with p-value < 0.05, and dashed lines represent those with p-value < 0.10.

ggranger.1276737168.txt.gz · Last modified: 2010/06/17 10:12 by mlabadm