Engineering Augmented Suffix Sorting Algorithms
Abstract
Strings are prevalent in Computer Science and algorithms for their efficient processing are fundamental in various applications. The results introduced in this work contribute with theoretical improvements and practical advances in building full-text indexes. Our first contribution is an in-place algorithm that computes the Burrows-Wheeler transform and the longest common prefix (LCP) array. Our second contribution is the construction of the suffix array augmented with the LCP array in optimal time and space for strings from constant size alphabets. Our third contribution is a set of algorithms to construct full-text indexes for string collections in optimal theoretical bounds. This work is an extended abstract of the Ph.D. thesis of the first author.
References
Bingmann, T., Fischer, J., and Osipov, V. (2016). Inducing suffix and LCP arrays in external memory. ACM J. Experiment. Algorithmics, 21(2):2.3:1–2.3:27.
Burrows, M. and Wheeler, D. J. (1994). A block-sorting loss-less data compression algorithm. Technical report, Digital SRC Research Report.
Crochemore, M., Grossi, R., Kärkkäinen, J., and Landau, G. M. (2015). Computing the Burrows-Wheeler transform in place and in small space. J. Discret. Algorithms, 32:44–52.
Fischer, J. (2010). Wee LCP. Inf. Process. Lett., 110(8-9):317–320.
Fischer, J. (2011). Inducing the LCP-Array. In Proc. Workshop on Algorithms and Data Structures (WADS), pages 374–385.
Gog, S. and Ohlebusch, E. (2013). Compressed suffix trees: Efficient computation and storage of LCP-values. ACM J. Experiment. Algorithmics.
Gusfield, D. (1997). Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, New York, NY, USA.
Kärkkäinen, J. (2016). Suffix array construction. In Encyclopedia of Algorithms, pages 2141–2144. Springer.
Kärkkäinen, J., Kempa, D., Puglisi, S. J., and Zhukova, B. (2017). Engineering external memory induced suffix sorting. In Proc. Workshop on Algorithm Engineering and Experimentation (ALENEX), pages 98–108.
Kärkkäinen, J., Manzini, G., and Puglisi, S. J. (2009). Permuted longest-common-prefix array. In Proc. An. Symp. on Combinatorial Pattern Matching (CPM), pages 181–192.
Kasai, T., Lee, G., Arimura, H., Arikawa, S., and Park, K. (2001). Linear-time longest-common-prefix computation in suffix arrays and its applications. In Proc. An. Symp. on Combinatorial Pattern Matching (CPM), pages 181–192.
Louza, F. A. (2017). Engineering augmented suffix sorting algorithms. PhD thesis, University of Campinas, Brazil.
Louza, F. A., Gagie, T., and Telles, G. P. (2017a). Burrows-Wheeler transform and LCP array construction in constant space. J. Discret. Algorithms, 42:14–22.
Louza, F. A., Gog, S., and Telles, G. P. (2016). Induced suffix sorting for string collections. In Proc. IEEE Data Compression Conference (DCC), pages 43–52.
Louza, F. A., Gog, S., and Telles, G. P. (2017b). Inducing enhanced suffix arrays for string collections. Theor. Comput. Sci., 678:22–39.
Louza, F. A., Gog, S., and Telles, G. P. (2017c). Optimal suffix sorting and LCP array construction for constant alphabets. Inf. Process. Lett., 118:30–34.
Louza, F. A. and Telles, G. P. (2015). Computing the BWT and the LCP array in constant space. In Proc. International Workshop on Combinatorial Algorithms (IWOCA), pages 312–320.
Manber, U. and Myers, E. W. (1993). Suffix arrays: A new method for on-line string searches. SIAM J. Comput., 22(5):935–948.
Manzini, G. (2004). Two space saving tricks for linear time LCP array computation. In Proc. Scandinavian Workshop on Algorithm Theory (SWAT), pages 372–383.
Navarro, G. (2016). Compact Data Structures – A practical approach. Cambridge University Press.
Nong, G. (2013). Practical linear-time O(1)-workspace suffix sorting for constant alphabets. ACM Trans. Inform. Syst., 31(3):1–15.
Nong, G., Zhang, S., and Chan, W. H. (2011). Two efficient algorithms for linear time suffix array construction. IEEE Trans. Comput., 60(10):1471–1484.
Ohlebusch, E. (2013). Bioinformatics Algorithms: Sequence Analysis, Genome Rearrangements, and Phylogenetic Reconstruction. Oldenbusch Verlag.
Okanohara, D. and Sadakane, K. (2009). A linear-time Burrows-Wheeler transform using induced sorting. In Proc. International Symposium on String Processing and Information Retrieval (SPIRE), pages 90–101.
Puglisi, S. J., Smyth, W. F., and Turpin, A. H. (2007). A taxonomy of suffix array construction algorithms. ACM Comp. Surv., 39(2):1–31.
Sadakane, K. (2002). Succinct representations of LCP information and improvements in the compressed suffix arrays. In Proc. ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 225–232.
