rnaSPAdes: A de novo transcriptome assembler and its application to RNA-Seq data

Elena Bushmanova, Dmitry Antipov, Alla Lapidus, Andrey D. Prjibelski

Research output

Abstract

Background: The ability to generate large RNA-sequencing datasets has led to the development of various reference-based and de novo transcriptome assemblers, each with its own strengths and limitations. While reference-based tools are widely used in transcriptomic studies, their application is limited to organisms with finished and well-annotated genomes. De novo transcriptome reconstruction from short reads remains a challenging open problem, complicated by varying expression levels across genes, alternative splicing, and paralogous genes.

Results: Herein we describe the novel transcriptome assembler rnaSPAdes, which has been developed on top of the SPAdes genome assembler and explores computational parallels between the assembly of transcriptomes and of single-cell genomes. We also present quality assessment reports for rnaSPAdes assemblies, compare rnaSPAdes with modern transcriptome assembly tools using several evaluation approaches on various RNA-sequencing datasets, and briefly highlight the strong and weak points of the different assemblers.

Conclusions: Based on this comparison of assembly methods, we conclude that no single assembler leads on all quality metrics across all datasets. However, rnaSPAdes typically outperforms the other assemblers in the number of assembled genes and isoforms, while also achieving higher accuracy statistics on average than its closest competitors.

Original language: English
Article number: giz100
Journal: GigaScience
Volume: 8
Issue number: 9
DOI: 10.1093/gigascience/giz100
Publication status: Published, 18 Sep 2019

Scopus subject areas

  • Computer Science Applications
  • Health Informatics

Cite this

Bushmanova, Elena ; Antipov, Dmitry ; Lapidus, Alla ; Prjibelski, Andrey D. / RnaSPAdes: A de novo transcriptome assembler and its application to RNA-Seq data. In: GigaScience. 2019 ; Vol. 8, No. 9.
@article{62048f59382f405aa965df883102fbba,
title = "{rnaSPAdes}: A de novo transcriptome assembler and its application to {RNA-Seq} data",
abstract = "Background: The possibility of generating large RNA-sequencing datasets has led to development of various reference-based and de novo transcriptome assemblers with their own strengths and limitations. While reference-based tools are widely used in various transcriptomic studies, their application is limited to the organisms with finished and well-annotated genomes. De novo transcriptome reconstruction from short reads remains an open challenging problem, which is complicated by the varying expression levels across different genes, alternative splicing, and paralogous genes. Results: Herein we describe the novel transcriptome assembler rnaSPAdes, which has been developed on top of the SPAdes genome assembler and explores computational parallels between assembly of transcriptomes and single-cell genomes. We also present quality assessment reports for rnaSPAdes assemblies, compare it with modern transcriptome assembly tools using several evaluation approaches on various RNA-sequencing datasets, and briefly highlight strong and weak points of different assemblers. Conclusions: Based on the performed comparison between different assembly methods, we infer that it is not possible to detect the absolute leader according to all quality metrics and all used datasets. However, rnaSPAdes typically outperforms other assemblers by such important property as the number of assembled genes and isoforms, and at the same time has higher accuracy statistics on average comparing to the closest competitors.",
keywords = "de novo assembly, RNA-Seq, transcriptome assembly",
author = "Elena Bushmanova and Dmitry Antipov and Alla Lapidus and Prjibelski, {Andrey D.}",
year = "2019",
month = "9",
day = "18",
doi = "10.1093/gigascience/giz100",
language = "English",
volume = "8",
journal = "GigaScience",
issn = "2047-217X",
publisher = "Oxford University Press",
number = "9",

}

RnaSPAdes: A de novo transcriptome assembler and its application to RNA-Seq data. / Bushmanova, Elena; Antipov, Dmitry; Lapidus, Alla; Prjibelski, Andrey D.

In: GigaScience, Vol. 8, No. 9, giz100, 18.09.2019.

TY  - JOUR
T1  - RnaSPAdes: A de novo transcriptome assembler and its application to RNA-Seq data
AU  - Bushmanova, Elena
AU  - Antipov, Dmitry
AU  - Lapidus, Alla
AU  - Prjibelski, Andrey D.
PY  - 2019/9/18
Y1  - 2019/9/18
N2  - Background: The possibility of generating large RNA-sequencing datasets has led to development of various reference-based and de novo transcriptome assemblers with their own strengths and limitations. While reference-based tools are widely used in various transcriptomic studies, their application is limited to the organisms with finished and well-annotated genomes. De novo transcriptome reconstruction from short reads remains an open challenging problem, which is complicated by the varying expression levels across different genes, alternative splicing, and paralogous genes. Results: Herein we describe the novel transcriptome assembler rnaSPAdes, which has been developed on top of the SPAdes genome assembler and explores computational parallels between assembly of transcriptomes and single-cell genomes. We also present quality assessment reports for rnaSPAdes assemblies, compare it with modern transcriptome assembly tools using several evaluation approaches on various RNA-sequencing datasets, and briefly highlight strong and weak points of different assemblers. Conclusions: Based on the performed comparison between different assembly methods, we infer that it is not possible to detect the absolute leader according to all quality metrics and all used datasets. However, rnaSPAdes typically outperforms other assemblers by such important property as the number of assembled genes and isoforms, and at the same time has higher accuracy statistics on average comparing to the closest competitors.
AB  - Background: The possibility of generating large RNA-sequencing datasets has led to development of various reference-based and de novo transcriptome assemblers with their own strengths and limitations. While reference-based tools are widely used in various transcriptomic studies, their application is limited to the organisms with finished and well-annotated genomes. De novo transcriptome reconstruction from short reads remains an open challenging problem, which is complicated by the varying expression levels across different genes, alternative splicing, and paralogous genes. Results: Herein we describe the novel transcriptome assembler rnaSPAdes, which has been developed on top of the SPAdes genome assembler and explores computational parallels between assembly of transcriptomes and single-cell genomes. We also present quality assessment reports for rnaSPAdes assemblies, compare it with modern transcriptome assembly tools using several evaluation approaches on various RNA-sequencing datasets, and briefly highlight strong and weak points of different assemblers. Conclusions: Based on the performed comparison between different assembly methods, we infer that it is not possible to detect the absolute leader according to all quality metrics and all used datasets. However, rnaSPAdes typically outperforms other assemblers by such important property as the number of assembled genes and isoforms, and at the same time has higher accuracy statistics on average comparing to the closest competitors.
KW  - de novo assembly
KW  - RNA-Seq
KW  - transcriptome assembly
UR  - http://www.scopus.com/inward/record.url?scp=85071896681&partnerID=8YFLogxK
UR  - https://www.biorxiv.org/content/early/2018/09/18/420208
UR  - http://www.mendeley.com/research/rnaspades-novo-transcriptome-assembler-application-rnaseq-data
U2  - 10.1093/gigascience/giz100
DO  - 10.1093/gigascience/giz100
M3  - Article
C2  - 31494669
VL  - 8
JO  - GigaScience
JF  - GigaScience
SN  - 2047-217X
IS  - 9
M1  - giz100
ER  -
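The RIS record above is plain text with one tagged field per line, so it can be loaded without a dedicated library. What follows is a minimal Python sketch, assuming the usual RIS conventions: every field line starts with a two-character tag followed by a hyphen, "TY" opens a record, and "ER" closes it. The names parse_ris and RIS_LINE are hypothetical, chosen for illustration and not taken from any existing package. Multi-line field values are not handled.

```python
import re
from collections import defaultdict

# One tagged RIS line: a two-character tag, whitespace, a hyphen, then the value.
# The RIS convention is "TAG  - value" (two spaces); a single space is tolerated
# here because some copies of an export collapse the whitespace.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-\s?(.*)$")


def parse_ris(text: str) -> list[dict[str, list[str]]]:
    """Split RIS text into records, mapping each tag to its values in file order.

    Hypothetical helper for illustration; it handles only single-line values.
    """
    records: list[dict[str, list[str]]] = []
    current = None
    for raw in text.splitlines():
        match = RIS_LINE.match(raw.strip())
        if match is None:
            continue  # skip blank lines and untagged text
        tag, value = match.group(1), match.group(2).strip()
        if tag == "TY":
            current = defaultdict(list)  # "TY" opens a new record
        if current is None:
            continue  # ignore tags outside a TY ... ER pair
        if tag == "ER":
            records.append(dict(current))  # "ER" closes the current record
            current = None
            continue
        current[tag].append(value)  # repeatable tags such as AU, KW, UR accumulate
    return records


if __name__ == "__main__":
    sample = """TY  - JOUR
AU  - Bushmanova, Elena
AU  - Antipov, Dmitry
DO  - 10.1093/gigascience/giz100
ER  -
"""
    record = parse_ris(sample)[0]
    print(record["AU"])  # ['Bushmanova, Elena', 'Antipov, Dmitry']
    print(record["DO"][0])  # 10.1093/gigascience/giz100
```

The sketch returns a list of values for each tag so that repeatable fields such as AU, KW, and UR keep every entry in order. Callers that expect a single value, such as DO or VL, can take the first element.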