Sequencing of individual barcoded cDNAs using Pacific Biosciences and Oxford Nanopore Technologies reveals platform-specific error patterns

Standard

Sequencing of individual barcoded cDNAs using Pacific Biosciences and Oxford Nanopore Technologies reveals platform-specific error patterns. / Mikheenko, Alla ; Prjibelski, Andrey D.; Joglekar, Anoushka; Tilgner, Hagen U.

In: Genome Research, Vol. 32, No. 4, 01.04.2022, pp. 726-737.

Research output: Contribution to journal › Article › peer-review

BibTeX

@article{1807357aeaac4858b66decfcb5362d98,

title = "Sequencing of individual barcoded cDNAs using Pacific Biosciences and Oxford Nanopore Technologies reveals platform-specific error patterns",

abstract = "Long-read transcriptomics require understanding error sources inherent to technologies. Current approaches cannot compare methods for an individual RNA molecule. Here, we present a novel platform-comparison method that combines barcoding strategies and long-read sequencing to sequence cDNA copies representing an individual RNA molecule on both Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT). We compare these long-read pairs in terms of sequence content and isoform patterns. Although individual read pairs show high similarity, we find differences in (1) aligned length, (2) transcription start site (TSS), (3) polyadenylation site (poly(A)-site) assignment, and (4) exon-intron structures. Overall, 25% of read pairs disagree on either TSS, poly(A)-site, or splice site. Intron-chain disagreement typically arises from alignment errors of microexons and complicated splice sites. Our single-molecule technology comparison reveals that inconsistencies are often caused by sequencing error-induced inaccurate ONT alignments, especially to downstream GUNNGU donor motifs. However, annotation-disagreeing upstream shifts in NAGNAG acceptors in ONT are often confirmed by PacBio and are thus likely real. In both barcoded and nonbarcoded ONT reads, we find that intron number and proximity of GU/AGs better predict inconsistencies with the annotation than read quality alone. We summarize these findings in an annotation-based algorithm for spliced alignment correction that improves subsequent transcript construction with ONT reads.",

keywords = "DNA, Complementary, High-Throughput Nucleotide Sequencing/methods, Nanopores, RNA, Sequence Analysis, DNA/methods, Technology",

author = "Alla Mikheenko and Prjibelski, {Andrey D.} and Anoushka Joglekar and Tilgner, {Hagen U.}",

year = "2022",

month = apr,

day = "1",

doi = "10.1101/gr.276405.121",

language = "English",

volume = "32",

pages = "726--737",

journal = "Genome Research",

issn = "1088-9051",

publisher = "Cold Spring Harbor Laboratory",

number = "4",

}

RIS

TY - JOUR

T1 - Sequencing of individual barcoded cDNAs using Pacific Biosciences and Oxford Nanopore Technologies reveals platform-specific error patterns

AU - Mikheenko, Alla

AU - Prjibelski, Andrey D.

AU - Joglekar, Anoushka

AU - Tilgner, Hagen U.

PY - 2022/4/1

Y1 - 2022/4/1

N2 - Long-read transcriptomics require understanding error sources inherent to technologies. Current approaches cannot compare methods for an individual RNA molecule. Here, we present a novel platform-comparison method that combines barcoding strategies and long-read sequencing to sequence cDNA copies representing an individual RNA molecule on both Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT). We compare these long-read pairs in terms of sequence content and isoform patterns. Although individual read pairs show high similarity, we find differences in (1) aligned length, (2) transcription start site (TSS), (3) polyadenylation site (poly(A)-site) assignment, and (4) exon-intron structures. Overall, 25% of read pairs disagree on either TSS, poly(A)-site, or splice site. Intron-chain disagreement typically arises from alignment errors of microexons and complicated splice sites. Our single-molecule technology comparison reveals that inconsistencies are often caused by sequencing error-induced inaccurate ONT alignments, especially to downstream GUNNGU donor motifs. However, annotation-disagreeing upstream shifts in NAGNAG acceptors in ONT are often confirmed by PacBio and are thus likely real. In both barcoded and nonbarcoded ONT reads, we find that intron number and proximity of GU/AGs better predict inconsistencies with the annotation than read quality alone. We summarize these findings in an annotation-based algorithm for spliced alignment correction that improves subsequent transcript construction with ONT reads.

AB - Long-read transcriptomics require understanding error sources inherent to technologies. Current approaches cannot compare methods for an individual RNA molecule. Here, we present a novel platform-comparison method that combines barcoding strategies and long-read sequencing to sequence cDNA copies representing an individual RNA molecule on both Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT). We compare these long-read pairs in terms of sequence content and isoform patterns. Although individual read pairs show high similarity, we find differences in (1) aligned length, (2) transcription start site (TSS), (3) polyadenylation site (poly(A)-site) assignment, and (4) exon-intron structures. Overall, 25% of read pairs disagree on either TSS, poly(A)-site, or splice site. Intron-chain disagreement typically arises from alignment errors of microexons and complicated splice sites. Our single-molecule technology comparison reveals that inconsistencies are often caused by sequencing error-induced inaccurate ONT alignments, especially to downstream GUNNGU donor motifs. However, annotation-disagreeing upstream shifts in NAGNAG acceptors in ONT are often confirmed by PacBio and are thus likely real. In both barcoded and nonbarcoded ONT reads, we find that intron number and proximity of GU/AGs better predict inconsistencies with the annotation than read quality alone. We summarize these findings in an annotation-based algorithm for spliced alignment correction that improves subsequent transcript construction with ONT reads.

KW - DNA, Complementary

KW - High-Throughput Nucleotide Sequencing/methods

KW - Nanopores

KW - RNA

KW - Sequence Analysis, DNA/methods

KW - Technology

UR - http://www.scopus.com/inward/record.url?scp=85128488447&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/576b071d-0105-33e4-a8e9-8e98290dade9/

U2 - 10.1101/gr.276405.121

DO - 10.1101/gr.276405.121

M3 - Article

C2 - 35301264

AN - SCOPUS:85128488447

VL - 32

SP - 726

EP - 737

JO - Genome Research

JF - Genome Research

SN - 1088-9051

IS - 4

ER -

ID: 94683032