Standard

NPOmix: a machine learning classifier to connect mass spectrometry fragmentation data to biosynthetic gene clusters. / Leão, Tiago F.; Wang, Mingxun; da Silva, Ricardo; Gurevich, Alexey; Bauermeister, Anelize; Gomes, Paulo Wender P.; Brejnrod, Asker; Glukhov, Evgenia; Aron, Allegra T.; Louwen, Joris J. R.; Kim, Hyun Woo; Reher, Raphael; Fiore, Marli F.; van der Hooft, Justin J.J.; Gerwick, Lena; Gerwick, William H.; Bandeira, Nuno; Dorrestein, Pieter C.

In: PNAS Nexus, 26.11.2022.

Research output: Contribution to journal › Article › peer-review

Harvard

Leão, TF, Wang, M, da Silva, R, Gurevich, A, Bauermeister, A, Gomes, PWP, Brejnrod, A, Glukhov, E, Aron, AT, Louwen, JJR, Kim, HW, Reher, R, Fiore, MF, van der Hooft, JJJ, Gerwick, L, Gerwick, WH, Bandeira, N & Dorrestein, PC 2022, 'NPOmix: a machine learning classifier to connect mass spectrometry fragmentation data to biosynthetic gene clusters', PNAS Nexus.

APA

Leão, T. F., Wang, M., da Silva, R., Gurevich, A., Bauermeister, A., Gomes, P. W. P., Brejnrod, A., Glukhov, E., Aron, A. T., Louwen, J. J. R., Kim, H. W., Reher, R., Fiore, M. F., van der Hooft, J. J. J., Gerwick, L., Gerwick, W. H., Bandeira, N., & Dorrestein, P. C. (2022). NPOmix: a machine learning classifier to connect mass spectrometry fragmentation data to biosynthetic gene clusters. PNAS Nexus, [pgac257].

Vancouver

Leão TF, Wang M, da Silva R, Gurevich A, Bauermeister A, Gomes PWP et al. NPOmix: a machine learning classifier to connect mass spectrometry fragmentation data to biosynthetic gene clusters. PNAS Nexus. 2022 Nov 26. pgac257.

Author

Leão, Tiago F. ; Wang, Mingxun ; da Silva, Ricardo ; Gurevich, Alexey ; Bauermeister, Anelize ; Gomes, Paulo Wender P. ; Brejnrod, Asker ; Glukhov, Evgenia ; Aron, Allegra T. ; Louwen, Joris J. R. ; Kim, Hyun Woo ; Reher, Raphael ; Fiore, Marli F. ; van der Hooft, Justin J.J. ; Gerwick, Lena ; Gerwick, William H. ; Bandeira, Nuno ; Dorrestein, Pieter C. / NPOmix: a machine learning classifier to connect mass spectrometry fragmentation data to biosynthetic gene clusters. In: PNAS Nexus. 2022.

BibTeX

@article{7dc7384f8b1d470a86d480f9bf4c33a0,
title = "NPOmix: a machine learning classifier to connect mass spectrometry fragmentation data to biosynthetic gene clusters",
abstract = "Microbial specialized metabolites are an important source of and inspiration for many pharmaceutical, biotechnological products and play key roles in ecological processes. Untargeted metabolomics using liquid chromatography coupled with tandem mass spectrometry is an efficient technique to access metabolites from fractions and even environmental crude extracts. Nevertheless, metabolomics is limited in predicting structures or bioactivities for cryptic metabolites. Efficiently linking the biosynthetic potential inferred from (meta)genomics to the specialized metabolome would accelerate drug discovery programs by allowing metabolomics to make use of genetic predictions. Here, we present a k-nearest neighbor classifier to systematically connect mass spectrometry fragmentation spectra to their corresponding biosynthetic gene clusters (independent of their chemical class). Our new pattern-based genome mining pipeline links biosynthetic genes to metabolites that they encode for, as detected via mass spectrometry from bacterial cultures or environmental microbiomes. Using paired datasets that include validated genes-mass spectral links from the Paired omics Data Platform, we demonstrate this approach by automatically linking 18 previously known mass spectra to their corresponding previously experimentally validated biosynthetic genes (e.g., via nuclear magnetic resonance or genetic engineering). We illustrated a computational example of how to combine NPOmix with MassQL for mining siderophores that can be reproduced by NPOmix users. We conclude that NPOmix minimizes the need for culturing (it worked well on microbiomes) and facilitates specialized metabolite prioritization based on integrative omics mining.",
keywords = "genomics, mass spectrometry, machine learning, Specialized metabolites, biosynthetic gene clusters",
author = "Le{\~a}o, {Tiago F.} and Mingxun Wang and {da Silva}, Ricardo and Alexey Gurevich and Anelize Bauermeister and Gomes, {Paulo Wender P.} and Asker Brejnrod and Evgenia Glukhov and Aron, {Allegra T.} and Louwen, {Joris J. R.} and Kim, {Hyun Woo} and Raphael Reher and Fiore, {Marli F.} and {van der Hooft}, {Justin J.J.} and Lena Gerwick and Gerwick, {William H.} and Nuno Bandeira and Dorrestein, {Pieter C.}",
note = "Tiago F Le{\~a}o, Mingxun Wang, Ricardo da Silva, Alexey Gurevich, Anelize Bauermeister, Paulo Wender P Gomes, Asker Brejnrod, Evgenia Glukhov, Allegra T Aron, Joris J R Louwen, Hyun Woo Kim, Raphael Reher, Marli F Fiore, Justin J J van der Hooft, Lena Gerwick, William H Gerwick, Nuno Bandeira, Pieter C Dorrestein, NPOmix: a machine learning classifier to connect mass spectrometry fragmentation data to biosynthetic gene clusters, PNAS Nexus, 2022, pgac257, https://doi.org/10.1093/pnasnexus/pgac257",
year = "2022",
month = nov,
day = "26",
language = "English",
journal = "PNAS Nexus",
issn = "2752-6542",
publisher = "Oxford University Press",
}

RIS

TY - JOUR

T1 - NPOmix: a machine learning classifier to connect mass spectrometry fragmentation data to biosynthetic gene clusters

AU - Leão, Tiago F.

AU - Wang, Mingxun

AU - da Silva, Ricardo

AU - Gurevich, Alexey

AU - Bauermeister, Anelize

AU - Gomes, Paulo Wender P.

AU - Brejnrod, Asker

AU - Glukhov, Evgenia

AU - Aron, Allegra T.

AU - Louwen, Joris J. R.

AU - Kim, Hyun Woo

AU - Reher, Raphael

AU - Fiore, Marli F.

AU - van der Hooft, Justin J.J.

AU - Gerwick, Lena

AU - Gerwick, William H.

AU - Bandeira, Nuno

AU - Dorrestein, Pieter C.

N1 - Tiago F Leão, Mingxun Wang, Ricardo da Silva, Alexey Gurevich, Anelize Bauermeister, Paulo Wender P Gomes, Asker Brejnrod, Evgenia Glukhov, Allegra T Aron, Joris J R Louwen, Hyun Woo Kim, Raphael Reher, Marli F Fiore, Justin J J van der Hooft, Lena Gerwick, William H Gerwick, Nuno Bandeira, Pieter C Dorrestein, NPOmix: a machine learning classifier to connect mass spectrometry fragmentation data to biosynthetic gene clusters, PNAS Nexus, 2022, pgac257, https://doi.org/10.1093/pnasnexus/pgac257

PY - 2022/11/26

Y1 - 2022/11/26

N2 - Microbial specialized metabolites are an important source of and inspiration for many pharmaceutical, biotechnological products and play key roles in ecological processes. Untargeted metabolomics using liquid chromatography coupled with tandem mass spectrometry is an efficient technique to access metabolites from fractions and even environmental crude extracts. Nevertheless, metabolomics is limited in predicting structures or bioactivities for cryptic metabolites. Efficiently linking the biosynthetic potential inferred from (meta)genomics to the specialized metabolome would accelerate drug discovery programs by allowing metabolomics to make use of genetic predictions. Here, we present a k-nearest neighbor classifier to systematically connect mass spectrometry fragmentation spectra to their corresponding biosynthetic gene clusters (independent of their chemical class). Our new pattern-based genome mining pipeline links biosynthetic genes to metabolites that they encode for, as detected via mass spectrometry from bacterial cultures or environmental microbiomes. Using paired datasets that include validated genes-mass spectral links from the Paired omics Data Platform, we demonstrate this approach by automatically linking 18 previously known mass spectra to their corresponding previously experimentally validated biosynthetic genes (e.g., via nuclear magnetic resonance or genetic engineering). We illustrated a computational example of how to combine NPOmix with MassQL for mining siderophores that can be reproduced by NPOmix users. We conclude that NPOmix minimizes the need for culturing (it worked well on microbiomes) and facilitates specialized metabolite prioritization based on integrative omics mining.

AB - Microbial specialized metabolites are an important source of and inspiration for many pharmaceutical, biotechnological products and play key roles in ecological processes. Untargeted metabolomics using liquid chromatography coupled with tandem mass spectrometry is an efficient technique to access metabolites from fractions and even environmental crude extracts. Nevertheless, metabolomics is limited in predicting structures or bioactivities for cryptic metabolites. Efficiently linking the biosynthetic potential inferred from (meta)genomics to the specialized metabolome would accelerate drug discovery programs by allowing metabolomics to make use of genetic predictions. Here, we present a k-nearest neighbor classifier to systematically connect mass spectrometry fragmentation spectra to their corresponding biosynthetic gene clusters (independent of their chemical class). Our new pattern-based genome mining pipeline links biosynthetic genes to metabolites that they encode for, as detected via mass spectrometry from bacterial cultures or environmental microbiomes. Using paired datasets that include validated genes-mass spectral links from the Paired omics Data Platform, we demonstrate this approach by automatically linking 18 previously known mass spectra to their corresponding previously experimentally validated biosynthetic genes (e.g., via nuclear magnetic resonance or genetic engineering). We illustrated a computational example of how to combine NPOmix with MassQL for mining siderophores that can be reproduced by NPOmix users. We conclude that NPOmix minimizes the need for culturing (it worked well on microbiomes) and facilitates specialized metabolite prioritization based on integrative omics mining.

KW - genomics

KW - mass spectrometry

KW - machine learning

KW - Specialized metabolites

KW - biosynthetic gene clusters

UR - https://www.biorxiv.org/content/10.1101/2021.10.05.463235v2.article-info

M3 - Article

JO - PNAS Nexus

JF - PNAS Nexus

SN - 2752-6542

M1 - pgac257

ER -