The problem of the longest common subsequence without repetitions

  • Carlos E. Ferreira USP
  • Christian Tjandraatmadja USP

Abstract


We study the following problem: given two sequences X and Y over a finite alphabet, find a longest common subsequence of X and Y without repeated symbols. This NP-hard problem appears in the context of genome rearrangement when gene duplication is considered. We describe an algorithm that solves the problem optimally based on the branch-and-cut technique. The algorithm takes advantage of good heuristics and of a simple but effective technique that eliminates variables during the algorithm. Finally, we describe an algorithm based on dynamic programming that takes linear time and space when the size of the alphabet is constant.

Keywords: Repetition-free longest common subsequence, longest common subsequence, integer programming, branch-and-cut, heuristics

References

Adi, S., Braga, M., Fernandes, C., Ferreira, C., Martinez, F., Sagot, M.-F., Stefanes, M., Tjandraatmadja, C., and Wakabayashi, Y. (2010). Repetition-free longest common subsequence. Discrete Applied Mathematics, 158(12):1315–1324. Traces from LAGOS’07 IV Latin American Algorithms, Graphs, and Optimization Symposium Puerto Varas - 2007.

Fernandes, C., Ferreira, C., Tjandraatmadja, C., and Wakabayashi, Y. (2008). A polyhedral investigation of the LCS problem and a repetition-free variant. In Proc. of the 8th Latin American Symposium on Theoretical Informatics (LATIN’08), volume 4957 of Lecture Notes in Computer Science, pages 329–338. Springer.

Ferreira, C. and Tjandraatmadja, C. (2010). A branch-and-cut approach to the repetitionfree longest common subsequence problem. Electronic Notes in Discrete Mathematics, 36:527–534. ISCO 2010 - International Symposium on Combinatorial Optimization.

Hunt, J. and Szymanski, T. (1977). A fast algorithm for computing longest common subsequences. Communications of the ACM, 20(5):350–353.

Sankoff, D. and Kruskal, J. (1983). Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley.

Tjandraatmadja, C. (2010). O problema da subsequência comum máxima. Dissertação de mestrado.
Published
2011-07-19
FERREIRA, Carlos E.; TJANDRAATMADJA, Christian. The problem of the longest common subsequence without repetitions. In: THESIS AND DISSERTATION CONTEST (CTD), 24. , 2011, Natal/RN. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2011 . p. 28-33. ISSN 2763-8820.