Standard

SpLitteR: diploid genome assembly using TELL-Seq linked-reads and assembly graphs. / Tolstoganov, I.; Chen, Z.; Pevzner, P.; Korobeynikov, A.

In: PeerJ, Vol. 12, No. 9, 27.09.2024.

Research output: Publications in periodicals › Article › Peer-reviewed

BibTeX

@article{55625ebb2a3d4063a908334e753a5b99,
title = "SpLitteR: diploid genome assembly using TELL-Seq linked-reads and assembly graphs",
abstract = "Background: Recent advances in long-read sequencing technologies enabled accurate and contiguous de novo assemblies of large genomes and metagenomes. However, even long and accurate high-fidelity (HiFi) reads do not resolve repeats that are longer than the read lengths. This limitation negatively affects the contiguity of diploid genome assemblies since two haplomes share many long identical regions. To generate the telomere-to-telomere assemblies of diploid genomes, biologists now construct their HiFi-based phased assemblies and use additional experimental technologies to transform them into more contiguous diploid assemblies. The barcoded linked-reads, generated using an inexpensive TELL-Seq technology, provide an attractive way to bridge unresolved repeats in phased assemblies of diploid genomes. Results: We developed the SpLitteR tool for diploid genome assembly using linked-reads and assembly graphs and benchmarked it against state-of-the-art linked-read scaffolders ARKS and SLR-superscaffolder using human HG002 genome and sheep gut microbiome datasets. The benchmark showed that SpLitteR scaffolding results in 1.5-fold increase in NGA50 compared to the baseline LJA assembly and other scaffolders while introducing no additional misassemblies on the human dataset. Conclusion: We developed the SpLitteR tool for assembly graph phasing and scaffolding using barcoded linked-reads. We benchmarked SpLitteR on assembly graphs produced by various long-read assemblers and have demonstrated that TELL-Seq reads facilitate phasing and scaffolding in these graphs. This benchmarking demonstrates that SpLitteR improves upon the state-of-the-art linked-read scaffolders in the accuracy and contiguity metrics. SpLitteR is implemented in C++ as a part of the freely available SPAdes package and is available at https://github.com/ablab/spades/releases/tag/splitter-preprint. Copyright 2024 Tolstoganov et al.",
keywords = "Assembly graph, Repeat resolution, Tell-seq, article, benchmarking, diploidy, genome, human, metagenome, microbiome, nonhuman, sheep, telomere, Sheep/genetics, Humans, Sequence Analysis, DNA/methods, High-Throughput Nucleotide Sequencing/methods, Diploidy, Animals, Genome/genetics, Genome, Human/genetics, Gastrointestinal Microbiome/genetics, Software",
author = "I. Tolstoganov and Z. Chen and P. Pevzner and A. Korobeynikov",
note = "Export Date: 21 October 2024. Correspondence address: Korobeynikov, A.; Department of Statistical Modelling, Russian Federation; email: anton@korobeynikov.info",
year = "2024",
month = sep,
day = "27",
doi = "10.7717/peerj.18050",
language = "English",
volume = "12",
journal = "PeerJ",
issn = "2167-8359",
publisher = "PeerJ",
number = "9",
}

RIS

TY - JOUR

T1 - SpLitteR: diploid genome assembly using TELL-Seq linked-reads and assembly graphs

AU - Tolstoganov, I.

AU - Chen, Z.

AU - Pevzner, P.

AU - Korobeynikov, A.

N1 - Export Date: 21 October 2024. Correspondence address: Korobeynikov, A.; Department of Statistical Modelling, Russian Federation; email: anton@korobeynikov.info

PY - 2024/9/27

Y1 - 2024/9/27

AB - Background: Recent advances in long-read sequencing technologies enabled accurate and contiguous de novo assemblies of large genomes and metagenomes. However, even long and accurate high-fidelity (HiFi) reads do not resolve repeats that are longer than the read lengths. This limitation negatively affects the contiguity of diploid genome assemblies since two haplomes share many long identical regions. To generate the telomere-to-telomere assemblies of diploid genomes, biologists now construct their HiFi-based phased assemblies and use additional experimental technologies to transform them into more contiguous diploid assemblies. The barcoded linked-reads, generated using an inexpensive TELL-Seq technology, provide an attractive way to bridge unresolved repeats in phased assemblies of diploid genomes. Results: We developed the SpLitteR tool for diploid genome assembly using linked-reads and assembly graphs and benchmarked it against state-of-the-art linked-read scaffolders ARKS and SLR-superscaffolder using human HG002 genome and sheep gut microbiome datasets. The benchmark showed that SpLitteR scaffolding results in 1.5-fold increase in NGA50 compared to the baseline LJA assembly and other scaffolders while introducing no additional misassemblies on the human dataset. Conclusion: We developed the SpLitteR tool for assembly graph phasing and scaffolding using barcoded linked-reads. We benchmarked SpLitteR on assembly graphs produced by various long-read assemblers and have demonstrated that TELL-Seq reads facilitate phasing and scaffolding in these graphs. This benchmarking demonstrates that SpLitteR improves upon the state-of-the-art linked-read scaffolders in the accuracy and contiguity metrics. SpLitteR is implemented in C++ as a part of the freely available SPAdes package and is available at https://github.com/ablab/spades/releases/tag/splitter-preprint. Copyright 2024 Tolstoganov et al.

KW - Assembly graph

KW - Repeat resolution

KW - Tell-seq

KW - article

KW - benchmarking

KW - diploidy

KW - genome

KW - human

KW - metagenome

KW - microbiome

KW - nonhuman

KW - sheep

KW - telomere

KW - Sheep/genetics

KW - Humans

KW - Sequence Analysis, DNA/methods

KW - High-Throughput Nucleotide Sequencing/methods

KW - Diploidy

KW - Animals

KW - Genome/genetics

KW - Genome, Human/genetics

KW - Gastrointestinal Microbiome/genetics

KW - Software

U2 - 10.7717/peerj.18050

DO - 10.7717/peerj.18050

M3 - Article

C2 - 39351368

VL - 12

JO - PeerJ

JF - PeerJ

SN - 2167-8359

IS - 9

ER -

ID: 126221872