  • Son K. Pham
  • Dmitry Antipov
  • Alexander Sirotkin
  • Glenn Tesler
  • Pavel A. Pevzner
  • Max A. Alekseyev

One of the key advances in genome assembly that has led to a significant improvement in contig lengths has been the use of paired reads (mate-pairs). While most assemblers use mate-pair information only in a post-processing step, the recently proposed Paired de Bruijn Graph (PDBG) approach incorporates it directly into the structure of the assembly graph. However, the PDBG approach faces difficulties when the variation in insert sizes is high. To address this problem, we first transform mate-pairs into edge-pair histograms, which allow one to better estimate the distance between edges in the assembly graph that represent regions linked by multiple mate-pairs. Further, we combine the ideas of mate-pair transformation and PDBGs to construct new data structures for genome assembly: pathsets and pathset graphs.
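
As a rough illustration of the edge-pair histogram idea mentioned in the abstract, the following Python sketch aggregates per-mate-pair distance estimates for each pair of edges. It is not the authors' implementation. It assumes that each read of a mate-pair has already been mapped to a single edge with a known offset, and that one nominal insert size applies to the whole library. The names `edge_pair_histograms`, `estimate_distance`, and the tuple layout are hypothetical.

```python
from collections import Counter, defaultdict


def edge_pair_histograms(mate_pairs, insert_size):
    """Build one histogram of distance estimates per ordered edge pair.

    mate_pairs: iterable of (edge_a, offset_a, edge_b, offset_b), where
    offset_x is the start position of the read within its edge
    (hypothetical input layout, assumed for this sketch).
    insert_size: nominal distance between the starts of the two reads.
    """
    histograms = defaultdict(Counter)
    for edge_a, offset_a, edge_b, offset_b in mate_pairs:
        # If the reads start insert_size apart, the edge starts are
        # insert_size + offset_a - offset_b apart.
        estimate = insert_size + offset_a - offset_b
        histograms[(edge_a, edge_b)][estimate] += 1
    return histograms


def estimate_distance(histogram):
    """Return the most frequently observed distance estimate (the mode)."""
    return histogram.most_common(1)[0][0]


if __name__ == "__main__":
    # Four mate-pairs linking edges "e1" and "e2"; their actual insert
    # sizes vary around the nominal value of 300.
    pairs = [
        ("e1", 250, "e2", 50),
        ("e1", 260, "e2", 60),
        ("e1", 270, "e2", 80),
        ("e1", 280, "e2", 70),
    ]
    hists = edge_pair_histograms(pairs, insert_size=300)
    print(dict(hists[("e1", "e2")]))
    print(estimate_distance(hists[("e1", "e2")]))
```

Pooling many mate-pairs per edge pair is what makes the estimate more robust to insert-size variation than any single mate-pair. Taking the mode is only one simple choice of summary statistic for this sketch.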

Original language: English
Title of host publication: Research in Computational Molecular Biology - 16th Annual International Conference, RECOMB 2012, Proceedings
Pages: 200-212
Number of pages: 13
Publication status: Published - 2012
Event: 16th Annual International Conference on Research in Computational Molecular Biology - Barcelona, Spain
Duration: 21 Apr 2012 to 24 Apr 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7262 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th Annual International Conference on Research in Computational Molecular Biology
Abbreviated title: RECOMB 2012
Country/Territory: Spain
City: Barcelona
Period: 21/04/12 to 24/04/12

    Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

ID: 100630943