Результаты исследований: Публикации в книгах, отчётах, сборниках, трудах конференций › статья в сборнике материалов конференции › научная › Рецензирование
Minimum common string partition : Exact algorithms. / Cygan, Marek; Kulikov, Alexander S.; Mihajlin, Ivan; Nikolaev, Maksim; Reznikov, Grigory.
29th Annual European Symposium on Algorithms, ESA 2021. ред. / Petra Mutzel; Rasmus Pagh; Grzegorz Herman. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 2021. 35 (Leibniz International Proceedings in Informatics, LIPIcs; Том 204).Результаты исследований: Публикации в книгах, отчётах, сборниках, трудах конференций › статья в сборнике материалов конференции › научная › Рецензирование
}
TY - GEN
T1 - Minimum common string partition
T2 - 29th Annual European Symposium on Algorithms, ESA 2021
AU - Cygan, Marek
AU - Kulikov, Alexander S.
AU - Mihajlin, Ivan
AU - Nikolaev, Maksim
AU - Reznikov, Grigory
N1 - Funding Information: Funding Marek Cygan: The work of Marek Cygan is part of a project TOTAL that has received funding from the European Research Council (ERC) under the European Union Horizon 2020 research and innovation programme (grant agreement No 677651). Alexander S. Kulikov: The research presented in Section 6 is supported by Russian Science Foundation (18-71-10042). Maksim Nikolaev: The research presented in Section 6 is supported by Russian Science Foundation (18-71-10042). The research presented in Section 3 is supported by Ministry of Science and Education grant 075-15-2019-1620. Publisher Copyright: © Marek Cygan, Alexander S. Kulikov, Ivan Mihajlin, Maksim Nikolaev, and Grigory Reznikov; licensed under Creative Commons License CC-BY 4.0
PY - 2021/9/1
Y1 - 2021/9/1
N2 - In the minimum common string partition problem (MCSP), one gets two strings and is asked to find the minimum number of cuts in the first string such that the second string can be obtained by rearranging the resulting pieces. It is a difficult algorithmic problem having applications in computational biology, text processing, and data compression. MCSP has been studied extensively from various algorithmic angles: there are many papers studying approximation, heuristic, and parameterized algorithms. At the same time, almost nothing is known about its exact complexity. In this paper, we present new results in this direction. We improve the known 2n upper bound (where n is the length of input strings) to ϕn where ϕ ≈ 1.618... is the golden ratio. The algorithm uses Fibonacci numbers to encode subsets as monomials of a certain implicit polynomial and extracts one of its coefficients using the fast Fourier transform. Then, we show that the case of constant size alphabet can be solved in subexponential time 2O(n log log n/log n) by a hybrid strategy: enumerate all long pieces and use dynamic programming over histograms of short pieces. Finally, we prove almost matching lower bounds assuming the Exponential Time Hypothesis.
AB - In the minimum common string partition problem (MCSP), one gets two strings and is asked to find the minimum number of cuts in the first string such that the second string can be obtained by rearranging the resulting pieces. It is a difficult algorithmic problem having applications in computational biology, text processing, and data compression. MCSP has been studied extensively from various algorithmic angles: there are many papers studying approximation, heuristic, and parameterized algorithms. At the same time, almost nothing is known about its exact complexity. In this paper, we present new results in this direction. We improve the known 2n upper bound (where n is the length of input strings) to ϕn where ϕ ≈ 1.618... is the golden ratio. The algorithm uses Fibonacci numbers to encode subsets as monomials of a certain implicit polynomial and extracts one of its coefficients using the fast Fourier transform. Then, we show that the case of constant size alphabet can be solved in subexponential time 2O(n log log n/log n) by a hybrid strategy: enumerate all long pieces and use dynamic programming over histograms of short pieces. Finally, we prove almost matching lower bounds assuming the Exponential Time Hypothesis.
KW - Exact algorithms
KW - Lower bounds
KW - Similarity measure
KW - String distance
KW - Upper bounds
UR - http://www.scopus.com/inward/record.url?scp=85115061979&partnerID=8YFLogxK
U2 - 10.4230/LIPIcs.ESA.2021.35
DO - 10.4230/LIPIcs.ESA.2021.35
M3 - Conference contribution
AN - SCOPUS:85115061979
T3 - Leibniz International Proceedings in Informatics, LIPIcs
BT - 29th Annual European Symposium on Algorithms, ESA 2021
A2 - Mutzel, Petra
A2 - Pagh, Rasmus
A2 - Herman, Grzegorz
PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
Y2 - 6 September 2021 through 8 September 2021
ER -
ID: 97553114