Standard

Minimum common string partition : Exact algorithms. / Cygan, Marek; Kulikov, Alexander S.; Mihajlin, Ivan; Nikolaev, Maksim; Reznikov, Grigory.

29th Annual European Symposium on Algorithms, ESA 2021. ed. / Petra Mutzel; Rasmus Pagh; Grzegorz Herman. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 2021. 35 (Leibniz International Proceedings in Informatics, LIPIcs; Vol. 204).

Research output: Chapter in Book/Report/Conference proceedingConference contributionResearchpeer-review

Harvard

Cygan, M, Kulikov, AS, Mihajlin, I, Nikolaev, M & Reznikov, G 2021, Minimum common string partition: Exact algorithms. in P Mutzel, R Pagh & G Herman (eds), 29th Annual European Symposium on Algorithms, ESA 2021., 35, Leibniz International Proceedings in Informatics, LIPIcs, vol. 204, Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 29th Annual European Symposium on Algorithms, ESA 2021, Vitual, Lisbon, Portugal, 6/09/21. https://doi.org/10.4230/LIPIcs.ESA.2021.35

APA

Cygan, M., Kulikov, A. S., Mihajlin, I., Nikolaev, M., & Reznikov, G. (2021). Minimum common string partition: Exact algorithms. In P. Mutzel, R. Pagh, & G. Herman (Eds.), 29th Annual European Symposium on Algorithms, ESA 2021 [35] (Leibniz International Proceedings in Informatics, LIPIcs; Vol. 204). Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. https://doi.org/10.4230/LIPIcs.ESA.2021.35

Vancouver

Cygan M, Kulikov AS, Mihajlin I, Nikolaev M, Reznikov G. Minimum common string partition: Exact algorithms. In Mutzel P, Pagh R, Herman G, editors, 29th Annual European Symposium on Algorithms, ESA 2021. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. 2021. 35. (Leibniz International Proceedings in Informatics, LIPIcs). https://doi.org/10.4230/LIPIcs.ESA.2021.35

Author

Cygan, Marek ; Kulikov, Alexander S. ; Mihajlin, Ivan ; Nikolaev, Maksim ; Reznikov, Grigory. / Minimum common string partition : Exact algorithms. 29th Annual European Symposium on Algorithms, ESA 2021. editor / Petra Mutzel ; Rasmus Pagh ; Grzegorz Herman. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 2021. (Leibniz International Proceedings in Informatics, LIPIcs).

BibTeX

@inproceedings{83469937f7c04fd3993f2eb3e555dfd9,
title = "Minimum common string partition: Exact algorithms",
abstract = "In the minimum common string partition problem (MCSP), one gets two strings and is asked to find the minimum number of cuts in the first string such that the second string can be obtained by rearranging the resulting pieces. It is a difficult algorithmic problem having applications in computational biology, text processing, and data compression. MCSP has been studied extensively from various algorithmic angles: there are many papers studying approximation, heuristic, and parameterized algorithms. At the same time, almost nothing is known about its exact complexity. In this paper, we present new results in this direction. We improve the known 2n upper bound (where n is the length of input strings) to ϕn where ϕ ≈ 1.618... is the golden ratio. The algorithm uses Fibonacci numbers to encode subsets as monomials of a certain implicit polynomial and extracts one of its coefficients using the fast Fourier transform. Then, we show that the case of constant size alphabet can be solved in subexponential time 2O(n log log n/log n) by a hybrid strategy: enumerate all long pieces and use dynamic programming over histograms of short pieces. Finally, we prove almost matching lower bounds assuming the Exponential Time Hypothesis.",
keywords = "Exact algorithms, Lower bounds, Similarity measure, String distance, Upper bounds",
author = "Marek Cygan and Kulikov, {Alexander S.} and Ivan Mihajlin and Maksim Nikolaev and Grigory Reznikov",
note = "Funding Information: Funding Marek Cygan: The work of Marek Cygan is part of a project TOTAL that has received funding from the European Research Council (ERC) under the European Union Horizon 2020 research and innovation programme (grant agreement No 677651). Alexander S. Kulikov: The research presented in Section 6 is supported by Russian Science Foundation (18-71-10042). Maksim Nikolaev: The research presented in Section 6 is supported by Russian Science Foundation (18-71-10042). The research presented in Section 3 is supported by Ministry of Science and Education grant 075-15-2019-1620. Publisher Copyright: {\textcopyright} Marek Cygan, Alexander S. Kulikov, Ivan Mihajlin, Maksim Nikolaev, and Grigory Reznikov; licensed under Creative Commons License CC-BY 4.0; 29th Annual European Symposium on Algorithms, ESA 2021 ; Conference date: 06-09-2021 Through 08-09-2021",
year = "2021",
month = sep,
day = "1",
doi = "10.4230/LIPIcs.ESA.2021.35",
language = "English",
series = "Leibniz International Proceedings in Informatics, LIPIcs",
publisher = "Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing",
editor = "Petra Mutzel and Rasmus Pagh and Grzegorz Herman",
booktitle = "29th Annual European Symposium on Algorithms, ESA 2021",
address = "Germany",

}

RIS

TY - GEN

T1 - Minimum common string partition

T2 - 29th Annual European Symposium on Algorithms, ESA 2021

AU - Cygan, Marek

AU - Kulikov, Alexander S.

AU - Mihajlin, Ivan

AU - Nikolaev, Maksim

AU - Reznikov, Grigory

N1 - Funding Information: Funding Marek Cygan: The work of Marek Cygan is part of a project TOTAL that has received funding from the European Research Council (ERC) under the European Union Horizon 2020 research and innovation programme (grant agreement No 677651). Alexander S. Kulikov: The research presented in Section 6 is supported by Russian Science Foundation (18-71-10042). Maksim Nikolaev: The research presented in Section 6 is supported by Russian Science Foundation (18-71-10042). The research presented in Section 3 is supported by Ministry of Science and Education grant 075-15-2019-1620. Publisher Copyright: © Marek Cygan, Alexander S. Kulikov, Ivan Mihajlin, Maksim Nikolaev, and Grigory Reznikov; licensed under Creative Commons License CC-BY 4.0

PY - 2021/9/1

Y1 - 2021/9/1

N2 - In the minimum common string partition problem (MCSP), one gets two strings and is asked to find the minimum number of cuts in the first string such that the second string can be obtained by rearranging the resulting pieces. It is a difficult algorithmic problem having applications in computational biology, text processing, and data compression. MCSP has been studied extensively from various algorithmic angles: there are many papers studying approximation, heuristic, and parameterized algorithms. At the same time, almost nothing is known about its exact complexity. In this paper, we present new results in this direction. We improve the known 2n upper bound (where n is the length of input strings) to ϕn where ϕ ≈ 1.618... is the golden ratio. The algorithm uses Fibonacci numbers to encode subsets as monomials of a certain implicit polynomial and extracts one of its coefficients using the fast Fourier transform. Then, we show that the case of constant size alphabet can be solved in subexponential time 2O(n log log n/log n) by a hybrid strategy: enumerate all long pieces and use dynamic programming over histograms of short pieces. Finally, we prove almost matching lower bounds assuming the Exponential Time Hypothesis.

AB - In the minimum common string partition problem (MCSP), one gets two strings and is asked to find the minimum number of cuts in the first string such that the second string can be obtained by rearranging the resulting pieces. It is a difficult algorithmic problem having applications in computational biology, text processing, and data compression. MCSP has been studied extensively from various algorithmic angles: there are many papers studying approximation, heuristic, and parameterized algorithms. At the same time, almost nothing is known about its exact complexity. In this paper, we present new results in this direction. We improve the known 2n upper bound (where n is the length of input strings) to ϕn where ϕ ≈ 1.618... is the golden ratio. The algorithm uses Fibonacci numbers to encode subsets as monomials of a certain implicit polynomial and extracts one of its coefficients using the fast Fourier transform. Then, we show that the case of constant size alphabet can be solved in subexponential time 2O(n log log n/log n) by a hybrid strategy: enumerate all long pieces and use dynamic programming over histograms of short pieces. Finally, we prove almost matching lower bounds assuming the Exponential Time Hypothesis.

KW - Exact algorithms

KW - Lower bounds

KW - Similarity measure

KW - String distance

KW - Upper bounds

UR - http://www.scopus.com/inward/record.url?scp=85115061979&partnerID=8YFLogxK

U2 - 10.4230/LIPIcs.ESA.2021.35

DO - 10.4230/LIPIcs.ESA.2021.35

M3 - Conference contribution

AN - SCOPUS:85115061979

T3 - Leibniz International Proceedings in Informatics, LIPIcs

BT - 29th Annual European Symposium on Algorithms, ESA 2021

A2 - Mutzel, Petra

A2 - Pagh, Rasmus

A2 - Herman, Grzegorz

PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing

Y2 - 6 September 2021 through 8 September 2021

ER -

ID: 97553114