The Laboratory of DNA Information Analysis (Laboratory Head: Prof. Satoru Miyano) at the Human Genome Center, Institute of Medical Science, University of Tokyo, today announced that it has completed the world's largest optimal Bayesian network structure search (30 nodes) using the HGC supercomputer system.
A Bayesian network is a graphical model commonly used to infer and model transcriptome-level dependencies between genes from gene expression data (*1). However, finding the Bayesian network whose structure is optimal for the observed data is computationally hard: with the dynamic programming approaches used to date, the required time and memory grow exponentially with the number of genes (nodes) in the network. The previous world record for the number of nodes in an optimal Bayesian network structure search was 29 (*2), obtained with a discrete model, for which network scores can be computed very quickly. The bottleneck of the optimal search is memory space as well as CPU speed. The previous record was computed using approx. 100 GB of hard disk space and 8 CPU cores (4 computational nodes with dual-core Intel 3 GHz Xeon processors).
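To give a rough sense of why one additional node matters, the sketch below counts the table entries that a subset-based dynamic programming search would need. It assumes a formulation in the style of the cited work (*2, *3), with one entry per subset of nodes and one local score per pair of node and candidate parent set; the function name `dp_table_sizes` is hypothetical, and the actual memory layout of either record computation is not described here.

```python
def dp_table_sizes(n):
    """Count table entries for a subset-based DP over n nodes (illustrative only)."""
    # One entry per subset of the n nodes (best sub-network on that subset).
    subsets = 2 ** n
    # One local score per (node, candidate parent set) pair:
    # each node can draw its parents from any subset of the other n - 1 nodes.
    local_scores = n * 2 ** (n - 1)
    return subsets, local_scores


for n in (29, 30):
    subsets, local_scores = dp_table_sizes(n)
    print(f"n = {n}: {subsets:,} subsets, {local_scores:,} local scores")
```

Under these assumptions, going from 29 to 30 nodes doubles the number of subsets and slightly more than doubles the number of local scores to evaluate.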
Miyano Laboratory developed a new algorithm for the optimal Bayesian network structure search that runs efficiently on distributed-memory supercomputers. The algorithm is based on the dynamic programming algorithm that Miyano Laboratory was the first in the world to propose (*3). The newly implemented program was run on 256 CPU cores (32 computational nodes, each with two quad-core Intel 3 GHz Xeon processors) of the Human Genome Center's supercomputer on simulated numerical data with 30 nodes and 50 samples, and it completed the record-setting computation in 86 hours using approx. 255.5 GB of memory (approx. 1 GB per CPU core). A continuous nonparametric regression model was used for the Bayesian network; its scores are approx. 100 times slower to compute than those of discrete models. A paper describing this result is in preparation.
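The general idea of a dynamic programming search over node subsets can be illustrated with a small single-machine sketch. This is not the laboratory's distributed-memory implementation; it is a minimal version under stated assumptions: scores are maximized, node sets are encoded as bitmasks, and `optimal_network` and `toy_score` are hypothetical names. The toy score is invented purely for illustration and stands in for a real score computed from data.

```python
def optimal_network(n, local_score):
    """Exact structure search by DP over subsets; returns (score, parents)."""
    full = (1 << n) - 1
    # best_parents[v][c] = (score, parent_mask): best parent set for node v
    # among all subsets of the candidate set c (c never contains v).
    best_parents = [dict() for _ in range(n)]
    for v in range(n):
        for c in range(full + 1):
            if c & (1 << v):
                continue
            best = (local_score(v, c), c)
            m = c
            while m:  # compare with the best found after dropping one candidate
                low = m & -m
                cand = best_parents[v][c ^ low]
                if cand[0] > best[0]:
                    best = cand
                m ^= low
            best_parents[v][c] = best
    # best_net[w] = (score, sink): best network on subset w, where `sink`
    # is a node in w that has no children inside w.
    best_net = [None] * (full + 1)
    best_net[0] = (0, -1)
    for w in range(1, full + 1):
        best = None
        for v in range(n):
            if w & (1 << v):
                rest = w ^ (1 << v)
                total = best_net[rest][0] + best_parents[v][rest][0]
                if best is None or total > best[0]:
                    best = (total, v)
        best_net[w] = best
    # Backtrack: peel off sinks to recover each node's chosen parent set.
    parents, w = {}, full
    while w:
        v = best_net[w][1]
        rest = w ^ (1 << v)
        pmask = best_parents[v][rest][1]
        parents[v] = [u for u in range(n) if pmask >> u & 1]
        w = rest
    return best_net[full][0], parents


# Hypothetical toy score that rewards the chain 0 -> 1 -> 2 and
# penalizes each extra parent; a real score would be computed from data.
TRUE_PARENTS = {0: 0b000, 1: 0b001, 2: 0b010}

def toy_score(v, parent_mask):
    reward = 10 if parent_mask == TRUE_PARENTS[v] else 0
    return reward - bin(parent_mask).count("1")


score, parents = optimal_network(3, toy_score)
print(score, parents)
```

Because every table entry for a subset depends only on entries for smaller subsets, the result is guaranteed to be acyclic and globally optimal for the given score. The tables grow with the number of subsets, which is why this sketch is only practical for a small number of nodes.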
(*1) Tamada et al. (2009). Unraveling dynamic activities of autocrine pathways that control drug-response transcriptome networks. Pacific Symposium on Biocomputing (PSB2009) 14, 251-263.
(*2) Silander and Myllymäki (2006). A Simple Approach for Finding the Globally Optimal Bayesian Network Structure. In Proc. 22nd Conference on Uncertainty in Artificial Intelligence (UAI 2006), 445-452.
(*3) Ott et al. (2004). Finding optimal models for small gene networks. Pacific Symposium on Biocomputing (PSB2004) 9, 557-567.