Research output: Contribution to journal › Article › peer-review
Petabase-scale sequence alignment catalyses viral discovery. / Edgar, Robert C.; Taylor, Jeff; Lin, Victor; Altman, Tomer; Barbera, Pierre; Meleshko, Dmitry; Lohr, Dan; Novakovsky, Gherman; Buchfink, Benjamin; Al-Shayeb, Basem; Banfield, Jillian F.; de la Peña, Marcos; Korobeynikov, Anton; Chikhi, Rayan; Babaian, Artem.
In: Nature, Vol. 602, No. 7895, 03.02.2022, p. 142-147.
TY - JOUR
T1 - Petabase-scale sequence alignment catalyses viral discovery
AU - Edgar, Robert C.
AU - Taylor, Jeff
AU - Lin, Victor
AU - Altman, Tomer
AU - Barbera, Pierre
AU - Meleshko, Dmitry
AU - Lohr, Dan
AU - Novakovsky, Gherman
AU - Buchfink, Benjamin
AU - Al-Shayeb, Basem
AU - Banfield, Jillian F.
AU - de la Peña, Marcos
AU - Korobeynikov, Anton
AU - Chikhi, Rayan
AU - Babaian, Artem
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive licence to Springer Nature Limited.
PY - 2022/2/3
Y1 - 2022/2/3
N2 - Public databases contain a planetary collection of nucleic acid sequences, but their systematic exploration has been inhibited by a lack of efficient methods for searching this corpus, which (at the time of writing) exceeds 20 petabases and is growing exponentially [1]. Here we developed a cloud computing infrastructure, Serratus, to enable ultra-high-throughput sequence alignment at the petabase scale. We searched 5.7 million biologically diverse samples (10.2 petabases) for the hallmark gene RNA-dependent RNA polymerase and identified well over 10^5 novel RNA viruses, thereby expanding the number of known species by roughly an order of magnitude. We characterized novel viruses related to coronaviruses, hepatitis delta virus and huge phages, respectively, and analysed their environmental reservoirs. To catalyse the ongoing revolution of viral discovery, we established a free and comprehensive database of these data and tools. Expanding the known sequence diversity of viruses can reveal the evolutionary origins of emerging pathogens and improve pathogen surveillance for the anticipation and mitigation of future pandemics.
AB - Public databases contain a planetary collection of nucleic acid sequences, but their systematic exploration has been inhibited by a lack of efficient methods for searching this corpus, which (at the time of writing) exceeds 20 petabases and is growing exponentially [1]. Here we developed a cloud computing infrastructure, Serratus, to enable ultra-high-throughput sequence alignment at the petabase scale. We searched 5.7 million biologically diverse samples (10.2 petabases) for the hallmark gene RNA-dependent RNA polymerase and identified well over 10^5 novel RNA viruses, thereby expanding the number of known species by roughly an order of magnitude. We characterized novel viruses related to coronaviruses, hepatitis delta virus and huge phages, respectively, and analysed their environmental reservoirs. To catalyse the ongoing revolution of viral discovery, we established a free and comprehensive database of these data and tools. Expanding the known sequence diversity of viruses can reveal the evolutionary origins of emerging pathogens and improve pathogen surveillance for the anticipation and mitigation of future pandemics.
KW - STRUCTURAL BASIS
KW - SEARCH
KW - HEPATITIS
KW - VIRUSES
UR - http://www.scopus.com/inward/record.url?scp=85123581753&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/d4c86401-f340-3f51-8e6a-cc92bd52647e/
U2 - 10.1038/s41586-021-04332-2
DO - 10.1038/s41586-021-04332-2
M3 - Article
AN - SCOPUS:85123581753
VL - 602
SP - 142
EP - 147
JO - Nature
JF - Nature
SN - 0028-0836
IS - 7895
ER -
ID: 92317623