Versatile genome assembly evaluation with QUAST-LG

Alla Mikheenko, Andrey Prjibelski, Vladislav Saveliev, Dmitry Antipov, Alexey Gurevich

Research output

35 Citations (Scopus)

Abstract

Motivation: The emergence of high-throughput sequencing technologies revolutionized genomics in the early 2000s. The next revolution came with the era of long-read sequencing. These technological advances, along with novel computational approaches, became the next step towards automatic pipelines capable of assembling nearly complete mammalian-size genomes. Results: In this manuscript, we demonstrate the performance of state-of-the-art genome assembly software on six eukaryotic datasets sequenced using different technologies. To evaluate the results, we developed QUAST-LG, a tool that compares large genomic de novo assemblies against reference sequences and computes relevant quality metrics. Since genomes generally cannot be reconstructed completely due to complex repeat patterns and low coverage regions, we introduce the concept of an upper bound assembly for a given genome and set of reads, and compute theoretical limits on assembly correctness and completeness. Using QUAST-LG, we show how close the assemblies are to the theoretical optimum, and how far this optimum is from the finished reference.
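
The abstract does not list which quality metrics are computed. As an illustration only, contiguity statistics such as N50 and NG50 are commonly reported when evaluating assemblies. The sketch below is a minimal version of those two definitions, not the QUAST-LG implementation; the function names `n50` and `ng50` and the sample contig lengths are hypothetical.

```python
def n50(lengths):
    """Largest length L such that contigs of length >= L
    cover at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def ng50(lengths, genome_size):
    """Same idea as N50, but the threshold is half of the
    reference genome size rather than half of the assembly."""
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= genome_size:
            return length
    return 0  # assembly covers less than half of the genome


# Hypothetical contig lengths (in bp), for illustration only.
contigs = [80, 70, 50, 40, 30, 20]
print(n50(contigs))
print(ng50(contigs, genome_size=400))
```

Because NG50 is normalized by the reference length, it stays comparable across assemblies of different total size, which is one reason reference-based evaluation can be more informative than assembly-only statistics.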

Original language: English
Pages (from-to): i142-i150
Journal: Bioinformatics
Volume: 34
Issue number: 13
DOIs: https://doi.org/10.1093/bioinformatics/bty266
Publication status: Published - 1 Jul 2018
Externally published: Yes
Event: Intelligent Systems for Molecular Biology - Chicago
Duration: 6 Jul 2018 to 10 Jul 2018
Conference number: 2018
https://www.iscb.org/ismb2018

Fingerprint

Genome
Genes
Evaluation
Technology
Genome Size
Sequencing
Genomics
Software
High Throughput
Completeness
Correctness
Coverage
Pipelines
Throughput
Upper bound
Metric
Evaluate
Demonstrate
Datasets

Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Mikheenko, A., Prjibelski, A., Saveliev, V., Antipov, D., & Gurevich, A. (2018). Versatile genome assembly evaluation with QUAST-LG. Bioinformatics, 34(13), i142-i150. https://doi.org/10.1093/bioinformatics/bty266
Mikheenko, Alla ; Prjibelski, Andrey ; Saveliev, Vladislav ; Antipov, Dmitry ; Gurevich, Alexey. / Versatile genome assembly evaluation with QUAST-LG. In: Bioinformatics. 2018 ; Vol. 34, No. 13. pp. i142-i150.
@article{a68621b4acd746c58133d418842905fc,
title = "Versatile genome assembly evaluation with QUAST-LG",
abstract = "Motivation: The emergence of high-throughput sequencing technologies revolutionized genomics in the early 2000s. The next revolution came with the era of long-read sequencing. These technological advances, along with novel computational approaches, became the next step towards automatic pipelines capable of assembling nearly complete mammalian-size genomes. Results: In this manuscript, we demonstrate the performance of state-of-the-art genome assembly software on six eukaryotic datasets sequenced using different technologies. To evaluate the results, we developed QUAST-LG, a tool that compares large genomic de novo assemblies against reference sequences and computes relevant quality metrics. Since genomes generally cannot be reconstructed completely due to complex repeat patterns and low coverage regions, we introduce the concept of an upper bound assembly for a given genome and set of reads, and compute theoretical limits on assembly correctness and completeness. Using QUAST-LG, we show how close the assemblies are to the theoretical optimum, and how far this optimum is from the finished reference.",
author = "Alla Mikheenko and Andrey Prjibelski and Vladislav Saveliev and Dmitry Antipov and Alexey Gurevich",
year = "2018",
month = "7",
day = "1",
doi = "10.1093/bioinformatics/bty266",
language = "English",
volume = "34",
pages = "i142--i150",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "13",

}

Mikheenko, A, Prjibelski, A, Saveliev, V, Antipov, D & Gurevich, A 2018, 'Versatile genome assembly evaluation with QUAST-LG', Bioinformatics, vol. 34, no. 13, pp. i142-i150. https://doi.org/10.1093/bioinformatics/bty266

Versatile genome assembly evaluation with QUAST-LG. / Mikheenko, Alla; Prjibelski, Andrey; Saveliev, Vladislav; Antipov, Dmitry; Gurevich, Alexey.

In: Bioinformatics, Vol. 34, No. 13, 01.07.2018, p. i142-i150.

TY - JOUR
T1 - Versatile genome assembly evaluation with QUAST-LG
AU - Mikheenko, Alla
AU - Prjibelski, Andrey
AU - Saveliev, Vladislav
AU - Antipov, Dmitry
AU - Gurevich, Alexey
PY - 2018/7/1
Y1 - 2018/7/1
AB - Motivation: The emergence of high-throughput sequencing technologies revolutionized genomics in the early 2000s. The next revolution came with the era of long-read sequencing. These technological advances, along with novel computational approaches, became the next step towards automatic pipelines capable of assembling nearly complete mammalian-size genomes. Results: In this manuscript, we demonstrate the performance of state-of-the-art genome assembly software on six eukaryotic datasets sequenced using different technologies. To evaluate the results, we developed QUAST-LG, a tool that compares large genomic de novo assemblies against reference sequences and computes relevant quality metrics. Since genomes generally cannot be reconstructed completely due to complex repeat patterns and low coverage regions, we introduce the concept of an upper bound assembly for a given genome and set of reads, and compute theoretical limits on assembly correctness and completeness. Using QUAST-LG, we show how close the assemblies are to the theoretical optimum, and how far this optimum is from the finished reference.
UR - http://www.scopus.com/inward/record.url?scp=85050791638&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/bty266
DO - 10.1093/bioinformatics/bty266
M3 - Article
AN - SCOPUS:85050791638
VL - 34
SP - i142-i150
JO - Bioinformatics
JF - Bioinformatics
SN - 1367-4803
IS - 13
ER -

Mikheenko A, Prjibelski A, Saveliev V, Antipov D, Gurevich A. Versatile genome assembly evaluation with QUAST-LG. Bioinformatics. 2018 Jul 1;34(13):i142-i150. https://doi.org/10.1093/bioinformatics/bty266