Standard

Systematic dissection of biases in whole-exome and whole-genome sequencing reveals major determinants of coding sequence coverage. / Barbitoff, Yury A.; Polev, Dmitrii E.; Glotov, Andrey S.; Serebryakova, Elena A.; Shcherbakova, Irina V.; Kiselev, Artem M.; Kostareva, Anna A.; Glotov, Oleg S.; Predeus, Alexander V.

In: Scientific Reports, Vol. 10, No. 1, 2057, 06.02.2020.

Research output: Contribution to journal › Article › peer-review

Harvard

Barbitoff, YA, Polev, DE, Glotov, AS, Serebryakova, EA, Shcherbakova, IV, Kiselev, AM, Kostareva, AA, Glotov, OS & Predeus, AV 2020, 'Systematic dissection of biases in whole-exome and whole-genome sequencing reveals major determinants of coding sequence coverage', Scientific Reports, vol. 10, no. 1, 2057. https://doi.org/10.1038/s41598-020-59026-y

APA

Barbitoff, Y. A., Polev, D. E., Glotov, A. S., Serebryakova, E. A., Shcherbakova, I. V., Kiselev, A. M., Kostareva, A. A., Glotov, O. S., & Predeus, A. V. (2020). Systematic dissection of biases in whole-exome and whole-genome sequencing reveals major determinants of coding sequence coverage. Scientific Reports, 10(1), [2057]. https://doi.org/10.1038/s41598-020-59026-y

Vancouver

Barbitoff YA, Polev DE, Glotov AS, Serebryakova EA, Shcherbakova IV, Kiselev AM et al. Systematic dissection of biases in whole-exome and whole-genome sequencing reveals major determinants of coding sequence coverage. Scientific Reports. 2020 Feb 6;10(1). 2057. https://doi.org/10.1038/s41598-020-59026-y

Author

Barbitoff, Yury A. ; Polev, Dmitrii E. ; Glotov, Andrey S. ; Serebryakova, Elena A. ; Shcherbakova, Irina V. ; Kiselev, Artem M. ; Kostareva, Anna A. ; Glotov, Oleg S. ; Predeus, Alexander V. / Systematic dissection of biases in whole-exome and whole-genome sequencing reveals major determinants of coding sequence coverage. In: Scientific Reports. 2020 ; Vol. 10, No. 1.

BibTeX

@article{1fbeb54074ab4ba8b2901fa9b4751dcf,
title = "Systematic dissection of biases in whole-exome and whole-genome sequencing reveals major determinants of coding sequence coverage",
abstract = "Advantages and diagnostic effectiveness of the two most widely used resequencing approaches, whole exome (WES) and whole genome (WGS) sequencing, are often debated. WES dominated large-scale resequencing projects because of lower cost and easier data storage and processing. Rapid development of 3rd generation sequencing methods and novel exome sequencing kits predicate the need for a robust statistical framework allowing informative and easy performance comparison of the emerging methods. In our study we developed a set of statistical tools to systematically assess coverage of coding regions provided by several modern WES platforms, as well as PCR-free WGS. We identified a substantial problem in most previously published comparisons which did not account for mappability limitations of short reads. Using regression analysis and simple machine learning, as well as several novel metrics of coverage evenness, we analyzed the contribution from the major determinants of CDS coverage. Contrary to a common view, most of the observed bias in modern WES stems from mappability limitations of short reads and exome probe design rather than sequence composition. We also identified the ~ 500 kb region of human exome that could not be effectively characterized using short read technology and should receive special attention during variant analysis. Using our novel metrics of sequencing coverage, we identified main determinants of WES and WGS performance. Overall, our study points out avenues for improvement of enrichment-based methods and development of novel approaches that would maximize variant discovery at optimal cost.",
keywords = "Base Sequence/genetics, Data Interpretation, Statistical, Exome/genetics, Genome, Human/genetics, High-Throughput Nucleotide Sequencing/statistics & numerical data, Humans, Machine Learning, Models, Genetic, Open Reading Frames/genetics, Regression Analysis, Whole Exome Sequencing/statistics & numerical data, Whole Genome Sequencing/statistics & numerical data, PERFORMANCE, CAPTURE",
author = "Barbitoff, {Yury A.} and Polev, {Dmitrii E.} and Glotov, {Andrey S.} and Serebryakova, {Elena A.} and Shcherbakova, {Irina V.} and Kiselev, {Artem M.} and Kostareva, {Anna A.} and Glotov, {Oleg S.} and Predeus, {Alexander V.}",
note = "Funding Information: We thank Anna Shuvalova and Olga Romanova for help in library preparation. This research was done using equipment of Biobank of the Research Park of SPBU. The research was supported by Russian Science Foundation (grants no. 14–50–00069, 18-75-00006), CAF Charity Foundation, and D.O. Ott Research Institute of Obstetrics, Gynaecology and Reproductology, project 558-2019-0012 (АААА-А19119021290033-1) of FSBSI. We also thank Resource Center “Computational Center” of Saint Petersburg State University (project no. 110-7198-609) for providing computing resources and data storage. Publisher Copyright: {\textcopyright} 2020, The Author(s). Copyright: Copyright 2020 Elsevier B.V., All rights reserved.",
year = "2020",
month = feb,
day = "6",
doi = "10.1038/s41598-020-59026-y",
language = "English",
volume = "10",
journal = "Scientific Reports",
issn = "2045-2322",
publisher = "Nature Publishing Group",
number = "1",

}

RIS

TY - JOUR

T1 - Systematic dissection of biases in whole-exome and whole-genome sequencing reveals major determinants of coding sequence coverage

AU - Barbitoff, Yury A.

AU - Polev, Dmitrii E.

AU - Glotov, Andrey S.

AU - Serebryakova, Elena A.

AU - Shcherbakova, Irina V.

AU - Kiselev, Artem M.

AU - Kostareva, Anna A.

AU - Glotov, Oleg S.

AU - Predeus, Alexander V.

N1 - Funding Information: We thank Anna Shuvalova and Olga Romanova for help in library preparation. This research was done using equipment of Biobank of the Research Park of SPBU. The research was supported by Russian Science Foundation (grants no. 14–50–00069, 18-75-00006), CAF Charity Foundation, and D.O. Ott Research Institute of Obstetrics, Gynaecology and Reproductology, project 558-2019-0012 (АААА-А19119021290033-1) of FSBSI. We also thank Resource Center “Computational Center” of Saint Petersburg State University (project no. 110-7198-609) for providing computing resources and data storage. Publisher Copyright: © 2020, The Author(s). Copyright: Copyright 2020 Elsevier B.V., All rights reserved.

PY - 2020/2/6

Y1 - 2020/2/6

N2 - Advantages and diagnostic effectiveness of the two most widely used resequencing approaches, whole exome (WES) and whole genome (WGS) sequencing, are often debated. WES dominated large-scale resequencing projects because of lower cost and easier data storage and processing. Rapid development of 3rd generation sequencing methods and novel exome sequencing kits predicate the need for a robust statistical framework allowing informative and easy performance comparison of the emerging methods. In our study we developed a set of statistical tools to systematically assess coverage of coding regions provided by several modern WES platforms, as well as PCR-free WGS. We identified a substantial problem in most previously published comparisons which did not account for mappability limitations of short reads. Using regression analysis and simple machine learning, as well as several novel metrics of coverage evenness, we analyzed the contribution from the major determinants of CDS coverage. Contrary to a common view, most of the observed bias in modern WES stems from mappability limitations of short reads and exome probe design rather than sequence composition. We also identified the ~ 500 kb region of human exome that could not be effectively characterized using short read technology and should receive special attention during variant analysis. Using our novel metrics of sequencing coverage, we identified main determinants of WES and WGS performance. Overall, our study points out avenues for improvement of enrichment-based methods and development of novel approaches that would maximize variant discovery at optimal cost.

AB - Advantages and diagnostic effectiveness of the two most widely used resequencing approaches, whole exome (WES) and whole genome (WGS) sequencing, are often debated. WES dominated large-scale resequencing projects because of lower cost and easier data storage and processing. Rapid development of 3rd generation sequencing methods and novel exome sequencing kits predicate the need for a robust statistical framework allowing informative and easy performance comparison of the emerging methods. In our study we developed a set of statistical tools to systematically assess coverage of coding regions provided by several modern WES platforms, as well as PCR-free WGS. We identified a substantial problem in most previously published comparisons which did not account for mappability limitations of short reads. Using regression analysis and simple machine learning, as well as several novel metrics of coverage evenness, we analyzed the contribution from the major determinants of CDS coverage. Contrary to a common view, most of the observed bias in modern WES stems from mappability limitations of short reads and exome probe design rather than sequence composition. We also identified the ~ 500 kb region of human exome that could not be effectively characterized using short read technology and should receive special attention during variant analysis. Using our novel metrics of sequencing coverage, we identified main determinants of WES and WGS performance. Overall, our study points out avenues for improvement of enrichment-based methods and development of novel approaches that would maximize variant discovery at optimal cost.

KW - Base Sequence/genetics

KW - Data Interpretation, Statistical

KW - Exome/genetics

KW - Genome, Human/genetics

KW - High-Throughput Nucleotide Sequencing/statistics & numerical data

KW - Humans

KW - Machine Learning

KW - Models, Genetic

KW - Open Reading Frames/genetics

KW - Regression Analysis

KW - Whole Exome Sequencing/statistics & numerical data

KW - Whole Genome Sequencing/statistics & numerical data

KW - PERFORMANCE

KW - CAPTURE

UR - http://www.scopus.com/inward/record.url?scp=85079051766&partnerID=8YFLogxK

U2 - 10.1038/s41598-020-59026-y

DO - 10.1038/s41598-020-59026-y

M3 - Article

C2 - 32029882

AN - SCOPUS:85079051766

VL - 10

JO - Scientific Reports

JF - Scientific Reports

SN - 2045-2322

IS - 1

M1 - 2057

ER -

ID: 70416819