Motivation: Recently developed barcoding-based synthetic long-read (SLR) technologies have already found many applications in genome assembly and analysis. However, although new barcoding protocols are emerging and the range of SLR applications is expanding, existing SLR assemblers are optimized for a narrow range of parameters and are not easily extended to new barcoding technologies or to new applications such as metagenomics or hybrid assembly.

Results: We describe the algorithmic challenge of SLR assembly and present cloudSPAdes, an algorithm for SLR assembly based on analyzing the de Bruijn graph of SLRs. We benchmarked cloudSPAdes across various barcoding technologies and applications and demonstrated that it improves on state-of-the-art SLR assemblers in accuracy and speed.

Original language: English
Article number: btz349
Pages (from-to): i61-i70
Number of pages: 10
Journal: Bioinformatics
Volume: 35
Issue number: 14
Early online date: 5 Jul 2019
State: Published - 15 Jul 2019

    Scopus subject areas

  • Computational Mathematics
  • Molecular Biology
  • Biochemistry
  • Statistics and Probability
  • Computer Science Applications
  • Computational Theory and Mathematics

    Research areas

  • conference paper, velocity, algorithm, Article, bioinformatics, ACCURATE, DNA EXTRACTION, GENOME

ID: 49523355