Research output: Contribution to journal › Article › peer-review
CentromereArchitect: Inference and analysis of the architecture of centromeres. / Dvorkina, Tatiana; Kunyavskaya, Olga; Bzikadze, Andrey V.; Alexandrov, Ivan; Pevzner, Pavel A.
In: Bioinformatics, Vol. 37, 01.07.2021, p. 196-204.
TY - JOUR
T1 - CentromereArchitect
T2 - Inference and analysis of the architecture of centromeres
AU - Dvorkina, Tatiana
AU - Kunyavskaya, Olga
AU - Bzikadze, Andrey V.
AU - Alexandrov, Ivan
AU - Pevzner, Pavel A.
N1 - Publisher Copyright: © 2021 The Author(s) 2021. Published by Oxford University Press.
PY - 2021/7/1
Y1 - 2021/7/1
N2 - Motivation: Recent advances in long-read sequencing technologies have led to rapid progress in centromere assembly in the last year and, for the first time, opened the possibility of addressing long-standing questions about the architecture and evolution of human centromeres. However, since these advances have not yet been accompanied by the development of centromere-specific bioinformatics algorithms, even fundamental questions about human centromeres (e.g. centromere annotation by deriving the complete set of human monomers and high-order repeats), let alone more complex ones (e.g. explaining how monomers and high-order repeats evolved), remain open. Moreover, despite a four-decade-long series of studies aimed at cataloging all human monomers and high-order repeats, rigorous algorithmic definitions of these concepts are still lacking. Thus, the development of a centromere annotation tool is a prerequisite for follow-up personalized biomedical studies of centromeres across the human population and evolutionary studies of centromeres across various species. Results: We describe CentromereArchitect, the first tool for centromere annotation in a newly sequenced genome, apply it to the complete assembly of a human genome recently generated by the Telomere-to-Telomere consortium, generate the complete set of human monomers and high-order repeats for 'live' centromeres, and reveal a vast set of hybrid monomers that may represent the focal points of centromere evolution.
KW - Algorithms
KW - Base Sequence
KW - Centromere/genetics
KW - Genome
KW - Humans
KW - Telomere
KW - ALPHA-SATELLITE DNA
KW - REPEAT
KW - ANNOTATION
KW - OLD
KW - SEQUENCE
UR - http://www.scopus.com/inward/record.url?scp=85111438021&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btab265
DO - 10.1093/bioinformatics/btab265
M3 - Article
C2 - 34252949
AN - SCOPUS:85111438021
VL - 37
SP - 196
EP - 204
JO - Bioinformatics
JF - Bioinformatics
SN - 1367-4803
ER -