Standard

Petabase-scale sequence alignment catalyses viral discovery. / Edgar, Robert C.; Taylor, Jeff; Lin, Victor; Altman, Tomer; Barbera, Pierre; Meleshko, Dmitry; Lohr, Dan; Novakovsky, Gherman; Buchfink, Benjamin; Al-Shayeb, Basem; Banfield, Jillian F.; de la Peña, Marcos; Korobeynikov, Anton; Chikhi, Rayan; Babaian, Artem.

In: Nature, Vol. 602, No. 7895, 03.02.2022, p. 142-147.

Research output: Contribution to journal › Article › peer-review

Harvard

Edgar, RC, Taylor, J, Lin, V, Altman, T, Barbera, P, Meleshko, D, Lohr, D, Novakovsky, G, Buchfink, B, Al-Shayeb, B, Banfield, JF, de la Peña, M, Korobeynikov, A, Chikhi, R & Babaian, A 2022, 'Petabase-scale sequence alignment catalyses viral discovery', Nature, vol. 602, no. 7895, pp. 142-147. https://doi.org/10.1038/s41586-021-04332-2

APA

Edgar, R. C., Taylor, J., Lin, V., Altman, T., Barbera, P., Meleshko, D., Lohr, D., Novakovsky, G., Buchfink, B., Al-Shayeb, B., Banfield, J. F., de la Peña, M., Korobeynikov, A., Chikhi, R., & Babaian, A. (2022). Petabase-scale sequence alignment catalyses viral discovery. Nature, 602(7895), 142-147. https://doi.org/10.1038/s41586-021-04332-2

Vancouver

Edgar RC, Taylor J, Lin V, Altman T, Barbera P, Meleshko D et al. Petabase-scale sequence alignment catalyses viral discovery. Nature. 2022 Feb 3;602(7895):142-147. https://doi.org/10.1038/s41586-021-04332-2

Author

Edgar, Robert C. ; Taylor, Jeff ; Lin, Victor ; Altman, Tomer ; Barbera, Pierre ; Meleshko, Dmitry ; Lohr, Dan ; Novakovsky, Gherman ; Buchfink, Benjamin ; Al-Shayeb, Basem ; Banfield, Jillian F. ; de la Peña, Marcos ; Korobeynikov, Anton ; Chikhi, Rayan ; Babaian, Artem. / Petabase-scale sequence alignment catalyses viral discovery. In: Nature. 2022 ; Vol. 602, No. 7895. pp. 142-147.

BibTeX

@article{746005dadb3c41ccb0078a9e7f28f63f,
title = "Petabase-scale sequence alignment catalyses viral discovery",
abstract = "Public databases contain a planetary collection of nucleic acid sequences, but their systematic exploration has been inhibited by a lack of efficient methods for searching this corpus, which (at the time of writing) exceeds 20 petabases and is growing exponentially. Here we developed a cloud computing infrastructure, Serratus, to enable ultra-high-throughput sequence alignment at the petabase scale. We searched 5.7 million biologically diverse samples (10.2 petabases) for the hallmark gene RNA-dependent RNA polymerase and identified well over $10^{5}$ novel RNA viruses, thereby expanding the number of known species by roughly an order of magnitude. We characterized novel viruses related to coronaviruses, hepatitis delta virus and huge phages, respectively, and analysed their environmental reservoirs. To catalyse the ongoing revolution of viral discovery, we established a free and comprehensive database of these data and tools. Expanding the known sequence diversity of viruses can reveal the evolutionary origins of emerging pathogens and improve pathogen surveillance for the anticipation and mitigation of future pandemics.",
keywords = "STRUCTURAL BASIS, SEARCH, HEPATITIS, VIRUSES",
author = "Edgar, {Robert C.} and Jeff Taylor and Victor Lin and Tomer Altman and Pierre Barbera and Dmitry Meleshko and Dan Lohr and Gherman Novakovsky and Benjamin Buchfink and Basem Al-Shayeb and Banfield, {Jillian F.} and {de la Pe{\~n}a}, Marcos and Anton Korobeynikov and Rayan Chikhi and Artem Babaian",
note = "Publisher Copyright: {\textcopyright} 2022, The Author(s), under exclusive licence to Springer Nature Limited.",
year = "2022",
month = feb,
day = "3",
doi = "10.1038/s41586-021-04332-2",
language = "English",
volume = "602",
pages = "142--147",
journal = "Nature",
issn = "0028-0836",
publisher = "Nature Publishing Group",
number = "7895",

}

RIS

TY - JOUR

T1 - Petabase-scale sequence alignment catalyses viral discovery

AU - Edgar, Robert C.

AU - Taylor, Jeff

AU - Lin, Victor

AU - Altman, Tomer

AU - Barbera, Pierre

AU - Meleshko, Dmitry

AU - Lohr, Dan

AU - Novakovsky, Gherman

AU - Buchfink, Benjamin

AU - Al-Shayeb, Basem

AU - Banfield, Jillian F.

AU - de la Peña, Marcos

AU - Korobeynikov, Anton

AU - Chikhi, Rayan

AU - Babaian, Artem

N1 - Publisher Copyright: © 2022, The Author(s), under exclusive licence to Springer Nature Limited.

PY - 2022/2/3

Y1 - 2022/2/3

N2 - Public databases contain a planetary collection of nucleic acid sequences, but their systematic exploration has been inhibited by a lack of efficient methods for searching this corpus, which (at the time of writing) exceeds 20 petabases and is growing exponentially. Here we developed a cloud computing infrastructure, Serratus, to enable ultra-high-throughput sequence alignment at the petabase scale. We searched 5.7 million biologically diverse samples (10.2 petabases) for the hallmark gene RNA-dependent RNA polymerase and identified well over 10⁵ novel RNA viruses, thereby expanding the number of known species by roughly an order of magnitude. We characterized novel viruses related to coronaviruses, hepatitis delta virus and huge phages, respectively, and analysed their environmental reservoirs. To catalyse the ongoing revolution of viral discovery, we established a free and comprehensive database of these data and tools. Expanding the known sequence diversity of viruses can reveal the evolutionary origins of emerging pathogens and improve pathogen surveillance for the anticipation and mitigation of future pandemics.

AB - Public databases contain a planetary collection of nucleic acid sequences, but their systematic exploration has been inhibited by a lack of efficient methods for searching this corpus, which (at the time of writing) exceeds 20 petabases and is growing exponentially. Here we developed a cloud computing infrastructure, Serratus, to enable ultra-high-throughput sequence alignment at the petabase scale. We searched 5.7 million biologically diverse samples (10.2 petabases) for the hallmark gene RNA-dependent RNA polymerase and identified well over 10⁵ novel RNA viruses, thereby expanding the number of known species by roughly an order of magnitude. We characterized novel viruses related to coronaviruses, hepatitis delta virus and huge phages, respectively, and analysed their environmental reservoirs. To catalyse the ongoing revolution of viral discovery, we established a free and comprehensive database of these data and tools. Expanding the known sequence diversity of viruses can reveal the evolutionary origins of emerging pathogens and improve pathogen surveillance for the anticipation and mitigation of future pandemics.

KW - STRUCTURAL BASIS

KW - SEARCH

KW - HEPATITIS

KW - VIRUSES

UR - http://www.scopus.com/inward/record.url?scp=85123581753&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/d4c86401-f340-3f51-8e6a-cc92bd52647e/

U2 - 10.1038/s41586-021-04332-2

DO - 10.1038/s41586-021-04332-2

M3 - Article

AN - SCOPUS:85123581753

VL - 602

SP - 142

EP - 147

JO - Nature

JF - Nature

SN - 0028-0836

IS - 7895

ER -
