Versatile genome assembly evaluation with QUAST-LG

Research output: Contribution to journal › Article › peer-review

Center of Algorithmic Biotechnology


Motivation: The emergence of high-throughput sequencing technologies revolutionized genomics in the early 2000s. The next revolution came with the era of long-read sequencing. These technological advances, along with novel computational approaches, became the next step toward automatic pipelines capable of assembling nearly complete mammalian-size genomes.

Results: In this manuscript, we demonstrate the performance of state-of-the-art genome assembly software on six eukaryotic datasets sequenced using different technologies. To evaluate the results, we developed QUAST-LG, a tool that compares large de novo genome assemblies against reference sequences and computes relevant quality metrics. Since genomes generally cannot be reconstructed completely due to complex repeat patterns and low-coverage regions, we introduce the concept of an upper bound assembly for a given genome and set of reads, and compute theoretical limits on assembly correctness and completeness. Using QUAST-LG, we show how close the assemblies are to this theoretical optimum and how far the optimum is from the finished reference.

Original language	English
Pages (from-to)	i142-i150
Number of pages	9
Journal	Bioinformatics
Volume	34
Issue number	13
DOIs	https://doi.org/10.1093/bioinformatics/bty266
State	Published - 1 Jul 2018
Event	Intelligent Systems for Molecular Biology, Chicago, United States
Duration	6 Jul 2018 → 10 Jul 2018
Conference number	2018
Conference website	https://www.iscb.org/ismb2018

Scopus subject areas

Computational Mathematics
Molecular Biology
Biochemistry
Statistics and Probability
Computer Science Applications
Computational Theory and Mathematics

Research areas

DE-NOVO, METAGENOME ASSEMBLIES, SHORT READS, ALGORITHMS, BENCHMARK, SOFTWARE, GAUGE, SCALE, TOOL

ID: 32867678