Research output: Contribution to journal › Article › Peer-reviewed
SpLitteR: diploid genome assembly using TELL-Seq linked-reads and assembly graphs. / Tolstoganov, I.; Chen, Z.; Pevzner, P.; Korobeynikov, A.
In: PeerJ, Vol. 12, No. 9, 27.09.2024.
TY - JOUR
T1 - SpLitteR: diploid genome assembly using TELL-Seq linked-reads and assembly graphs
AU - Tolstoganov, I.
AU - Chen, Z.
AU - Pevzner, P.
AU - Korobeynikov, A.
N1 - Export Date: 21 October 2024 Correspondence address: Korobeynikov, A.; Department of Statistical Modelling, Russian Federation; email: anton@korobeynikov.info
PY - 2024/9/27
Y1 - 2024/9/27
N2 - Background: Recent advances in long-read sequencing technologies enabled accurate and contiguous de novo assemblies of large genomes and metagenomes. However, even long and accurate high-fidelity (HiFi) reads do not resolve repeats that are longer than the read lengths. This limitation negatively affects the contiguity of diploid genome assemblies since two haplomes share many long identical regions. To generate the telomere-to-telomere assemblies of diploid genomes, biologists now construct their HiFi-based phased assemblies and use additional experimental technologies to transform them into more contiguous diploid assemblies. The barcoded linked-reads, generated using an inexpensive TELL-Seq technology, provide an attractive way to bridge unresolved repeats in phased assemblies of diploid genomes. Results: We developed the SpLitteR tool for diploid genome assembly using linked-reads and assembly graphs and benchmarked it against state-of-the-art linked-read scaffolders ARKS and SLR-superscaffolder using the human HG002 genome and sheep gut microbiome datasets. The benchmark showed that SpLitteR scaffolding results in a 1.5-fold increase in NGA50 compared to the baseline LJA assembly and other scaffolders while introducing no additional misassemblies on the human dataset. Conclusion: We developed the SpLitteR tool for assembly graph phasing and scaffolding using barcoded linked-reads. We benchmarked SpLitteR on assembly graphs produced by various long-read assemblers and have demonstrated that TELL-Seq reads facilitate phasing and scaffolding in these graphs. This benchmarking demonstrates that SpLitteR improves upon the state-of-the-art linked-read scaffolders in the accuracy and contiguity metrics. SpLitteR is implemented in C++ as part of the freely available SPAdes package and is available at https://github.com/ablab/spades/releases/tag/splitter-preprint. Copyright 2024 Tolstoganov et al.
KW - Assembly graph
KW - Repeat resolution
KW - TELL-Seq
KW - article
KW - benchmarking
KW - diploidy
KW - genome
KW - human
KW - metagenome
KW - microbiome
KW - nonhuman
KW - sheep
KW - telomere
KW - Sheep/genetics
KW - Humans
KW - Sequence Analysis, DNA/methods
KW - High-Throughput Nucleotide Sequencing/methods
KW - Diploidy
KW - Animals
KW - Genome/genetics
KW - Genome, Human/genetics
KW - Gastrointestinal Microbiome/genetics
KW - Software
U2 - 10.7717/peerj.18050
DO - 10.7717/peerj.18050
M3 - Article
C2 - 39351368
VL - 12
JO - PeerJ
JF - PeerJ
SN - 2167-8359
IS - 9
ER -
ID: 126221872