Research output: Contribution to journal › Article › peer-review
SPAligner: Alignment of long diverged molecular sequences to assembly graphs. / Dvorkina, Tatiana; Antipov, Dmitry; Korobeynikov, Anton; Nurk, Sergey.
In: BMC Bioinformatics, Vol. 21, No. Suppl 12, 306, 24.07.2020.
TY - JOUR
T1 - SPAligner: Alignment of long diverged molecular sequences to assembly graphs
T2 - 3rd International Conference on Bioinformatics - From Algorithms to Applications (BiATA)
AU - Dvorkina, Tatiana
AU - Antipov, Dmitry
AU - Korobeynikov, Anton
AU - Nurk, Sergey
N1 - Funding Information: Publication of this supplement is funded by the Russian Science Foundation (grant 19-14-00172). Research was carried out in part by computational resources provided by Resource Center “Computer Center of SPbU”. The authors are grateful to Saint Petersburg State University for the overall support of this work (project id: 51555639). Funders had no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript. Publisher Copyright: © 2020 The Author(s). Copyright: Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2020/7/24
Y1 - 2020/7/24
AB - Background: Graph-based representation of genome assemblies has recently been used in different contexts, from improved reconstruction of plasmid sequences and refined analysis of metagenomic data to read error correction and reference-free haplotype reconstruction. While many of these applications rely heavily on the alignment of long nucleotide sequences to assembly graphs, the first general-purpose software tools for finding such alignments have been released only recently, and their deficiencies and limitations are yet to be discovered. Moreover, existing tools cannot align amino acid sequences, which could prove useful in various contexts, in particular the analysis of metagenomic sequencing data. Results: In this work we present SPAligner (Saint-Petersburg Aligner), a novel tool for aligning long diverged nucleotide and amino acid sequences to assembly graphs. We demonstrate that SPAligner is an efficient solution for mapping third-generation sequencing reads onto assembly graphs of various complexity and also show how it can facilitate the identification of known genes in complex metagenomic datasets. Conclusions: Our work will help accelerate the development of graph-based approaches to the problem of aligning sequences to genome assemblies. SPAligner is implemented as part of the SPAdes tools library and is available on GitHub.
KW - Assembly graph
KW - Graph alignment
KW - Molecular sequences alignment
KW - Genetic Variation
KW - Sequence Alignment
KW - Algorithms
KW - Base Sequence
KW - Humans
KW - Software
KW - Statistics as Topic
KW - Haplotypes/genetics
KW - beta-Lactamases/chemistry
UR - http://www.scopus.com/inward/record.url?scp=85088520108&partnerID=8YFLogxK
U2 - 10.1186/s12859-020-03590-7
DO - 10.1186/s12859-020-03590-7
M3 - Article
C2 - 32703258
VL - 21
JO - BMC Bioinformatics
JF - BMC Bioinformatics
SN - 1471-2105
IS - Suppl 12
M1 - 306
Y2 - 20 June 2019 through 22 June 2019
ER -
ID: 49272157