Standard

ORFograph : search for novel insecticidal protein genes in genomic and metagenomic assembly graphs. / Dvorkina, Tatiana; Bankevich, Anton; Sorokin, Alexei; Yang, Fan; Adu-Oppong, Boahemaa; Williams, Ryan; Turner, Keith; Pevzner, Pavel A.

In: Microbiome, Vol. 9, No. 1, 149, 12.2021, p. 149.

Research output: Contribution to journal › Article › peer-review

Harvard

Dvorkina, T, Bankevich, A, Sorokin, A, Yang, F, Adu-Oppong, B, Williams, R, Turner, K & Pevzner, PA 2021, 'ORFograph: search for novel insecticidal protein genes in genomic and metagenomic assembly graphs', Microbiome, vol. 9, no. 1, 149. https://doi.org/10.1186/s40168-021-01092-z

APA

Dvorkina, T., Bankevich, A., Sorokin, A., Yang, F., Adu-Oppong, B., Williams, R., Turner, K., & Pevzner, P. A. (2021). ORFograph: search for novel insecticidal protein genes in genomic and metagenomic assembly graphs. Microbiome, 9(1), Article 149. https://doi.org/10.1186/s40168-021-01092-z

Vancouver

Dvorkina T, Bankevich A, Sorokin A, Yang F, Adu-Oppong B, Williams R et al. ORFograph: search for novel insecticidal protein genes in genomic and metagenomic assembly graphs. Microbiome. 2021 Dec;9(1):149. https://doi.org/10.1186/s40168-021-01092-z

Author

Dvorkina, Tatiana ; Bankevich, Anton ; Sorokin, Alexei ; Yang, Fan ; Adu-Oppong, Boahemaa ; Williams, Ryan ; Turner, Keith ; Pevzner, Pavel A. / ORFograph : search for novel insecticidal protein genes in genomic and metagenomic assembly graphs. In: Microbiome. 2021 ; Vol. 9, No. 1. pp. 149.

BibTeX

@article{98f4e0cb081942ed9e20cd878da567a9,
title = "ORFograph: search for novel insecticidal protein genes in genomic and metagenomic assembly graphs",
abstract = "BACKGROUND: Since the prolonged use of insecticidal proteins has led to toxin resistance, it is important to search for novel insecticidal protein genes (IPGs) that are effective in controlling resistant insect populations. IPGs are usually encoded in the genomes of entomopathogenic bacteria, especially in large plasmids in strains of the ubiquitous soil bacterium Bacillus thuringiensis (Bt). Since such plasmids often encode multiple similar IPGs, their assemblies are typically fragmented and many IPGs are scattered across multiple contigs. As a result, existing gene prediction tools (which analyze individual contigs) typically predict partial rather than complete IPGs, making it difficult to conduct downstream IPG engineering efforts in agricultural genomics. METHODS: Although it is difficult to assemble IPGs in a single contig, the structure of the genome assembly graph often provides clues on how to combine multiple contigs into segments encoding a single IPG. RESULTS: We describe ORFograph, a pipeline for predicting IPGs in assembly graphs, benchmark it on (meta)genomic datasets, and discover nearly a hundred novel IPGs. This work shows that graph-aware gene prediction tools enable the discovery of a greater diversity of IPGs from (meta)genomes. CONCLUSIONS: We demonstrated that analysis of the assembly graphs reveals novel candidate IPGs. ORFograph identified both already known genes {"}hidden{"} in assembly graphs and potential novel IPGs that evaded existing tools for IPG identification. As ORFograph is fast, one could imagine a pipeline that processes many (meta)genomic assembly graphs to identify even more novel IPGs for phenotypic testing than was previously possible with traditional gene-finding methods. While here we demonstrated the results of ORFograph only for IPGs, the proposed approach can be generalized to any class of genes. Video abstract.",
keywords = "Algorithms, Genomics, Insecticides, Metagenome, Metagenomics, Bioinsecticides, Gene finding, Bacterial genomics, Bioinformatics, BACILLUS-THURINGIENSIS, ALGORITHM, IDENTIFICATION, ACCURACY, PREDICTION, ALIGNMENT, DNA",
author = "Tatiana Dvorkina and Anton Bankevich and Alexei Sorokin and Fan Yang and Boahemaa Adu-Oppong and Ryan Williams and Keith Turner and Pevzner, {Pavel A}",
note = "Publisher Copyright: {\textcopyright} 2021, The Author(s).",
year = "2021",
month = dec,
doi = "10.1186/s40168-021-01092-z",
language = "English",
volume = "9",
pages = "149",
journal = "Microbiome",
issn = "2049-2618",
publisher = "BioMed Central Ltd.",
number = "1",

}

RIS

TY  - JOUR
T1  - ORFograph: search for novel insecticidal protein genes in genomic and metagenomic assembly graphs
AU  - Dvorkina, Tatiana
AU  - Bankevich, Anton
AU  - Sorokin, Alexei
AU  - Yang, Fan
AU  - Adu-Oppong, Boahemaa
AU  - Williams, Ryan
AU  - Turner, Keith
AU  - Pevzner, Pavel A
N1  - Publisher Copyright: © 2021, The Author(s).
PY  - 2021/12
Y1  - 2021/12
N2  - BACKGROUND: Since the prolonged use of insecticidal proteins has led to toxin resistance, it is important to search for novel insecticidal protein genes (IPGs) that are effective in controlling resistant insect populations. IPGs are usually encoded in the genomes of entomopathogenic bacteria, especially in large plasmids in strains of the ubiquitous soil bacterium Bacillus thuringiensis (Bt). Since such plasmids often encode multiple similar IPGs, their assemblies are typically fragmented and many IPGs are scattered across multiple contigs. As a result, existing gene prediction tools (which analyze individual contigs) typically predict partial rather than complete IPGs, making it difficult to conduct downstream IPG engineering efforts in agricultural genomics. METHODS: Although it is difficult to assemble IPGs in a single contig, the structure of the genome assembly graph often provides clues on how to combine multiple contigs into segments encoding a single IPG. RESULTS: We describe ORFograph, a pipeline for predicting IPGs in assembly graphs, benchmark it on (meta)genomic datasets, and discover nearly a hundred novel IPGs. This work shows that graph-aware gene prediction tools enable the discovery of a greater diversity of IPGs from (meta)genomes. CONCLUSIONS: We demonstrated that analysis of the assembly graphs reveals novel candidate IPGs. ORFograph identified both already known genes "hidden" in assembly graphs and potential novel IPGs that evaded existing tools for IPG identification. As ORFograph is fast, one could imagine a pipeline that processes many (meta)genomic assembly graphs to identify even more novel IPGs for phenotypic testing than was previously possible with traditional gene-finding methods. While here we demonstrated the results of ORFograph only for IPGs, the proposed approach can be generalized to any class of genes. Video abstract.
AB  - BACKGROUND: Since the prolonged use of insecticidal proteins has led to toxin resistance, it is important to search for novel insecticidal protein genes (IPGs) that are effective in controlling resistant insect populations. IPGs are usually encoded in the genomes of entomopathogenic bacteria, especially in large plasmids in strains of the ubiquitous soil bacterium Bacillus thuringiensis (Bt). Since such plasmids often encode multiple similar IPGs, their assemblies are typically fragmented and many IPGs are scattered across multiple contigs. As a result, existing gene prediction tools (which analyze individual contigs) typically predict partial rather than complete IPGs, making it difficult to conduct downstream IPG engineering efforts in agricultural genomics. METHODS: Although it is difficult to assemble IPGs in a single contig, the structure of the genome assembly graph often provides clues on how to combine multiple contigs into segments encoding a single IPG. RESULTS: We describe ORFograph, a pipeline for predicting IPGs in assembly graphs, benchmark it on (meta)genomic datasets, and discover nearly a hundred novel IPGs. This work shows that graph-aware gene prediction tools enable the discovery of a greater diversity of IPGs from (meta)genomes. CONCLUSIONS: We demonstrated that analysis of the assembly graphs reveals novel candidate IPGs. ORFograph identified both already known genes "hidden" in assembly graphs and potential novel IPGs that evaded existing tools for IPG identification. As ORFograph is fast, one could imagine a pipeline that processes many (meta)genomic assembly graphs to identify even more novel IPGs for phenotypic testing than was previously possible with traditional gene-finding methods. While here we demonstrated the results of ORFograph only for IPGs, the proposed approach can be generalized to any class of genes. Video abstract.
KW  - Algorithms
KW  - Genomics
KW  - Insecticides
KW  - Metagenome
KW  - Metagenomics
KW  - Bioinsecticides
KW  - Gene finding
KW  - Bacterial genomics
KW  - Bioinformatics
KW  - BACILLUS-THURINGIENSIS
KW  - ALGORITHM
KW  - IDENTIFICATION
KW  - ACCURACY
KW  - PREDICTION
KW  - ALIGNMENT
KW  - DNA
UR  - http://www.scopus.com/inward/record.url?scp=85109411058&partnerID=8YFLogxK
UR  - https://www.mendeley.com/catalogue/fdb4a5fc-9ff6-3b0b-b197-a816eae4e9df/
U2  - 10.1186/s40168-021-01092-z
DO  - 10.1186/s40168-021-01092-z
M3  - Article
C2  - 34183047
VL  - 9
SP  - 149
JO  - Microbiome
JF  - Microbiome
SN  - 2049-2618
IS  - 1
M1  - 149
ER  - 

ID: 89094139