Heuristics for Cycle Packing of Adjacency Graphs for Genomes with Repeated Genes
Resumo
The Adjacency Graph is a structure used to model genomes in several rearrangement distance problems. In particular, most studies use properties of a maximum cycle packing of this graph to develop bounds and algorithms for rearrangement distance problems, such as the reversal distance and the Double Cut and Join (DCJ) distance. When each genome has no repeated genes, there exists only one cycle packing for the graph. However, when each genome may have repeated genes, the problem of finding a maximum cycle packing for the adjacency graph (Adjacency Graph Packing) is NP-hard. In this work, we developed a greedy random heuristic and a genetic algorithm heuristic for the Adjacency Graph Packing problem for genomes with repeated genes. We present experimental results and compare these heuristics with the SOAR framework. Furthermore, we show how the solutions from our algorithms can improve the estimation for the reversal distance when compared to the SOAR framework. Lastly, we applied our genetic algorithm heuristic in real genomic data to validate its practical use.
Referências
Bergeron, A., Mixtacki, J., Stoye, J.: A unifying view of genome rearrangements. In: Bücher, P., Moret, B.M.E. (eds.) WABI 2006. LNCS, vol. 4175, pp. 163–173. Springer, Heidelberg (2006). https://doi.org/10.1007/11851561_16
Chen, X., et al.: Assignment of orthologous genes via genome rearrangement. IEEE/ACM Trans. Comput. Biol. Bioinf. 2(4), 302–315 (2005)
Christie, D.A.: Genome Rearrangement Problems. Ph.D. thesis, Department of Computing Science, University of Glasgow (1998)
De Vienne, D.M., Giraud, T., Martin, O.C.: A congruence index for testing topological similarity between trees. Bioinformatics 23(23), 3119–3124 (2007)
Garczarek, L., et al.: Cyanorak v2. 1: a scalable information system dedicated to the visualization and expert curation of marine and brackish picocyanobacteria genomes. Nucleic Acids Res. 1 (2020)
Hannenhalli, S., Pevzner, P.A.: Transforming cabbage into turnip: polynomial algorithm for sorting signed permutations by reversals. J. ACM 46(1), 1–27 (1999)
Kahn, C., Raphael, B.: Analysis of segmental duplications via duplication distance. Bioinformatics 24(16), i133–i138 (2008)
Makarenkov, V., Leclerc, B.: Circular orders of tree metrics, and their uses for the reconstruction and fitting of phylogenetic trees. In: DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pp. 183–208. American Mathematical Society (1997)
Mitchell, M.: Introduction to Genetic Algorithms. Springer, Berlin Heidelberg, Cambridge, MA, USA (2008)
Pinheiro, P.O., Alexandrino, A.O., Oliveira, A.R., de Souza, C.C., Dias, Z.: Heuristics for breakpoint graph decomposition with applications in genome rearrangement problems. In: BSB 2020. LNCS, vol. 12558, pp. 129–140. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-65775-8_12
Radcliffe, A.J., Scott, A.D., Wilmer, E.L.: Reversals and transpositions over finite alphabets. SIAM J. Discrete Math. 19(1), 224–244 (2005)
Shao, M., Lin, Y., Moret, B.M.: An exact algorithm to compute the double-cut-and-join distance for genomes with duplicate genes. J. Comput. Biol. 22(5), 425–435 (2015)
Wang, L.G., et al.: Treeio: an R package for phylogenetic tree input and output with richly annotated and associated data. Mol. Biol. Evol. 37(2), 599–603 (2020)
Willing, E., Stoye, J., Braga, M.D.: Computing the inversion-indel distance. IEEE/ACM Trans. Comput. Biol. Bioinf. (2020)