A parallel approximation hitting set algorithm for gene expression analysis

D. P. Ruchkys; S. W. Song

D. P. Ruchkys USP
S. W. Song USP

Resumo

With the recent DNA-microarray technology, it is possible to measure the expression levels of thousands of genes simultaneously in the same experiment. A genetic network is a model that describes how the expression level of each gene is affected by the expression levels of other genes in the network. Given the results of an experiment with n genes and m measures over time (m << n), we consider the problem of finding a subset of genes (k genes, where k << n) that explain the expression level of a given target gene under study. We consider the coarse-grained multicomputer (CGM) model, with p processors. In this paper we first present a sequential approximation algorithm of O(m/sup 4/n) time and O(m/sup 2/n) space. The main result is a new parallel approximation algorithm that determines the k genes in O(m/sup 4/n/p) local computing time plus O(k) communication rounds, and with space requirement of O(m/sup 2/n/p). The p factor in the parallel time and space complexities indicates a good parallelization. We also show preliminary promising experimental results on a Beowulf machine. To our knowledge there are no CGM algorithms for the problem considered in this paper.

Palavras-chave: Approximation algorithms, Gene expression, Algorithm design and analysis, Genetics, User-generated content, Time measurement, Space technology, DNA computing, Monitoring, Computer networks