Systematic dissection of biases in whole-exome and whole-genome sequencing reveals major determinants of coding sequence coverage

Yury A. Barbitoff, Dmitrii E. Polev, Andrey S. Glotov, Elena A. Serebryakova, Irina V. Shcherbakova, Artem M. Kiselev, Anna A. Kostareva, Oleg S. Glotov, Alexander V. Predeus

Research output

6 Citations (Scopus)

Abstract

Advantages and diagnostic effectiveness of the two most widely used resequencing approaches, whole exome (WES) and whole genome (WGS) sequencing, are often debated. WES dominated large-scale resequencing projects because of lower cost and easier data storage and processing. Rapid development of 3rd generation sequencing methods and novel exome sequencing kits predicate the need for a robust statistical framework allowing informative and easy performance comparison of the emerging methods. In our study we developed a set of statistical tools to systematically assess coverage of coding regions provided by several modern WES platforms, as well as PCR-free WGS. We identified a substantial problem in most previously published comparisons which did not account for mappability limitations of short reads. Using regression analysis and simple machine learning, as well as several novel metrics of coverage evenness, we analyzed the contribution from the major determinants of CDS coverage. Contrary to a common view, most of the observed bias in modern WES stems from mappability limitations of short reads and exome probe design rather than sequence composition. We also identified the ~ 500 kb region of human exome that could not be effectively characterized using short read technology and should receive special attention during variant analysis. Using our novel metrics of sequencing coverage, we identified main determinants of WES and WGS performance. Overall, our study points out avenues for improvement of enrichment-based methods and development of novel approaches that would maximize variant discovery at optimal cost.

Original languageEnglish
Article number2057
JournalScientific Reports
Volume10
Issue number1
DOIs
Publication statusPublished - 1 Dec 2020

Scopus subject areas

  • General

Fingerprint Dive into the research topics of 'Systematic dissection of biases in whole-exome and whole-genome sequencing reveals major determinants of coding sequence coverage'. Together they form a unique fingerprint.

Cite this