Extending rnaSPAdes functionality for hybrid transcriptome assembly

Standard

Extending rnaSPAdes functionality for hybrid transcriptome assembly. / Prjibelski, Andrey D.; Puglia, Giuseppe D.; Antipov, Dmitry; Bushmanova, Elena; Giordano, Daniela; Mikheenko, Alla; Vitale, Domenico; Lapidus, Alla.

In: BMC Bioinformatics, Vol. 21, No. Suppl 12, 302, 24.07.2020, p. 302.

Research output: Contribution to journal › Article › peer-review

BibTeX

@article{4730fc31507844aebd4ea6d2f239ac9b,

title = "Extending rnaSPAdes functionality for hybrid transcriptome assembly",

abstract = "Background: De novo RNA-Seq assembly is a powerful method for analysing transcriptomes when the reference genome is not available or poorly annotated. However, due to the short length of Illumina reads it is usually impossible to reconstruct complete sequences of complex genes and alternative isoforms. Recently emerged possibility to generate long RNA reads, such as PacBio and Oxford Nanopores, may dramatically improve the assembly quality, and thus the consecutive analysis. While reference-based tools for analysing long RNA reads were recently developed, there is no established pipeline for de novo assembly of such data. Results: In this work we present a novel method that allows to perform high-quality de novo transcriptome assemblies by combining accuracy and reliability of short reads with exon structure information carried out from long error-prone reads. The algorithm is designed by incorporating existing hybridSPAdes approach into rnaSPAdes pipeline and adapting it for transcriptomic data. Conclusion: To evaluate the benefit of using long RNA reads we selected several datasets containing both Illumina and Iso-seq or Oxford Nanopore Technologies (ONT) reads. Using an existing quality assessment software, we show that hybrid assemblies performed with rnaSPAdes contain more full-length genes and alternative isoforms comparing to the case when only short-read data is used. ",

keywords = "transcriptomics, transcriptome assembly, RNA-Seq, Oxford nanopores, Iso-seq, Hybrid assembly, De novo assembly, Algorithms, Databases, Genetic, Humans, MCF-7 Cells, Nanopores, Reproducibility of Results, Transcriptome/genetics, QUALITY ASSESSMENT",

author = "Prjibelski, {Andrey D.} and Puglia, {Giuseppe D.} and Dmitry Antipov and Elena Bushmanova and Daniela Giordano and Alla Mikheenko and Domenico Vitale and Alla Lapidus",

note = "Funding Information: Publication of this supplement is funded by Russian Science Foundation (grant number 19-14-00172).",

year = "2020",

month = jul,

day = "24",

doi = "10.1186/s12859-020-03614-2",

language = "English",

volume = "21",

pages = "302",

journal = "BMC Bioinformatics",

issn = "1471-2105",

publisher = "BioMed Central Ltd.",

number = "Suppl 12",

}

RIS

TY - JOUR

T1 - Extending rnaSPAdes functionality for hybrid transcriptome assembly

AU - Prjibelski, Andrey D.

AU - Puglia, Giuseppe D.

AU - Antipov, Dmitry

AU - Bushmanova, Elena

AU - Giordano, Daniela

AU - Mikheenko, Alla

AU - Vitale, Domenico

AU - Lapidus, Alla

N1 - Funding Information: Publication of this supplement is funded by Russian Science Foundation (grant number 19-14-00172).

PY - 2020/7/24

Y1 - 2020/7/24

AB - Background: De novo RNA-Seq assembly is a powerful method for analysing transcriptomes when the reference genome is not available or poorly annotated. However, due to the short length of Illumina reads it is usually impossible to reconstruct complete sequences of complex genes and alternative isoforms. Recently emerged possibility to generate long RNA reads, such as PacBio and Oxford Nanopores, may dramatically improve the assembly quality, and thus the consecutive analysis. While reference-based tools for analysing long RNA reads were recently developed, there is no established pipeline for de novo assembly of such data. Results: In this work we present a novel method that allows to perform high-quality de novo transcriptome assemblies by combining accuracy and reliability of short reads with exon structure information carried out from long error-prone reads. The algorithm is designed by incorporating existing hybridSPAdes approach into rnaSPAdes pipeline and adapting it for transcriptomic data. Conclusion: To evaluate the benefit of using long RNA reads we selected several datasets containing both Illumina and Iso-seq or Oxford Nanopore Technologies (ONT) reads. Using an existing quality assessment software, we show that hybrid assemblies performed with rnaSPAdes contain more full-length genes and alternative isoforms comparing to the case when only short-read data is used.

KW - transcriptomics

KW - transcriptome assembly

KW - RNA-Seq

KW - Oxford nanopores

KW - Iso-seq

KW - Hybrid assembly

KW - De novo assembly

KW - Algorithms

KW - Databases, Genetic

KW - Humans

KW - MCF-7 Cells

KW - Nanopores

KW - Reproducibility of Results

KW - Transcriptome/genetics

KW - QUALITY ASSESSMENT

UR - http://www.scopus.com/inward/record.url?scp=85088528928&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/43b7f243-03f1-3579-bc16-92a5d5c3ec91/

U2 - 10.1186/s12859-020-03614-2

DO - 10.1186/s12859-020-03614-2

M3 - Article

C2 - 32703149

AN - SCOPUS:85088528928

VL - 21

SP - 302

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - Suppl 12

M1 - 302

ER -
