The problem of the longest common subsequence without repetitions
Abstract
We study the following problem: given two sequences X and Y over a finite alphabet, find a longest common subsequence of X and Y without repeated symbols. This NP-hard problem appears in the context of genome rearrangement when gene duplication is considered. We describe an algorithm that solves the problem optimally based on the branch-and-cut technique. The algorithm takes advantage of good heuristics and of a simple but effective technique that eliminates variables during the algorithm. Finally, we describe an algorithm based on dynamic programming that takes linear time and space when the size of the alphabet is constant.
References
Fernandes, C., Ferreira, C., Tjandraatmadja, C., and Wakabayashi, Y. (2008). A polyhedral investigation of the LCS problem and a repetition-free variant. In Proc. of the 8th Latin American Symposium on Theoretical Informatics (LATIN’08), volume 4957 of Lecture Notes in Computer Science, pages 329–338. Springer.
Ferreira, C. and Tjandraatmadja, C. (2010). A branch-and-cut approach to the repetitionfree longest common subsequence problem. Electronic Notes in Discrete Mathematics, 36:527–534. ISCO 2010 - International Symposium on Combinatorial Optimization.
Hunt, J. and Szymanski, T. (1977). A fast algorithm for computing longest common subsequences. Communications of the ACM, 20(5):350–353.
Sankoff, D. and Kruskal, J. (1983). Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley.
Tjandraatmadja, C. (2010). O problema da subsequência comum máxima. Dissertação de mestrado.
