The composition of dense neural networks and formal grammars for secondary structure analysis

Research output

Abstract

We propose a way to combine formal grammars and artificial neural networks for biological sequence processing. Formal grammars encode the secondary structure of the sequence, and neural networks deal with mutations and noise. In contrast to the classical approach, in which probabilistic grammars are used for secondary structure modeling, we propose to use arbitrary (not probabilistic) grammars, which simplifies grammar creation. Instead of modeling the structure of the whole sequence, we create a grammar which only describes features of the secondary structure. Then we use undirected matrix-based parsing to extract features: the fact that some substring can be derived from some nonterminal is a feature. After that, we use a dense neural network to process the features. In this paper, we describe in detail all the parts of our recipe: a grammar, a parsing algorithm, and a network architecture. We discuss possible improvements and future work. Finally, we provide the results of tRNA and 16S rRNA processing, which show the applicability of our idea to real problems.
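The feature-extraction step described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy grammar (a stem of nested complementary pairs around a three-nucleotide loop), the names `cyk_table`, `stem_features`, and `dense_layer`, and the use of a plain CYK table as a stand-in for the matrix-based parser. Each feature records whether a substring can be derived from the nonterminal `S`; the resulting vector is what a dense layer would consume.

```python
# Toy grammar in Chomsky normal form (hypothetical, not the paper's grammar).
# S derives a "stem": nested complementary pairs around a 3-nucleotide loop.
TERMINALS = {
    "a": {"N", "A"},
    "c": {"N", "C"},
    "g": {"N", "G"},
    "u": {"N", "U"},
}
PAIRS = [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")]
BINARY = [("L2", "N", "N"), ("Loop", "L2", "N")]  # Loop = exactly 3 nucleotides
for left, right in PAIRS:
    closing = "X_" + right
    BINARY.append(("S", left, closing))      # S -> left closing
    BINARY.append((closing, "Loop", right))  # closing -> Loop right
    BINARY.append((closing, "S", right))     # closing -> S right


def cyk_table(seq):
    """Fill table[i][j] with the nonterminals that derive seq[i:j]."""
    n = len(seq)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, ch in enumerate(seq):
        table[i][i + 1] = set(TERMINALS.get(ch, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for head, left, right in BINARY:
                    if left in table[i][k] and right in table[k][j]:
                        table[i][j].add(head)
    return table


def stem_features(seq, nonterminal="S"):
    """Flatten the table: 1.0 where `nonterminal` derives seq[i:j], else 0.0."""
    n = len(seq)
    table = cyk_table(seq)
    return [
        1.0 if nonterminal in table[i][j] else 0.0
        for i in range(n)
        for j in range(i + 1, n + 1)
    ]


def dense_layer(x, weights, bias):
    """One fully connected layer with ReLU; weights holds one row per output unit."""
    return [
        max(0.0, sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]
```

For example, `stem_features("ggaaacc")` marks two substrings as derivable from `S`: the inner hairpin `gaaac` and the whole sequence. In this sketch the feature vector length depends on the sequence length, so inputs would need to be padded or truncated to a fixed length before being passed to `dense_layer` with trained weights.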

Original language: English
Title of host publication: BIOINFORMATICS 2019 - 10th International Conference on Bioinformatics Models, Methods and Algorithms, Proceedings; Part of 12th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2019
Editors: Elisabetta De Maria, Hugo Gamboa, Ana Fred
Publisher: SciTePress
Pages: 234-241
Number of pages: 8
ISBN (Electronic): 9789897583537
Publication status: Published - 1 Jan 2019
Event: 10th International Conference on Bioinformatics Models, Methods and Algorithms, BIOINFORMATICS 2019 - Part of 12th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2019 - Prague
Duration: 22 Feb 2019 to 24 Feb 2019

Publication series

Name: BIOINFORMATICS 2019 - 10th International Conference on Bioinformatics Models, Methods and Algorithms, Proceedings; Part of 12th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2019

Conference

Conference: 10th International Conference on Bioinformatics Models, Methods and Algorithms, BIOINFORMATICS 2019 - Part of 12th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2019
Country: Czech Republic
City: Prague
Period: 22/02/19 to 24/02/19

Fingerprint

Neural networks
Chemical analysis
Processing
Network architecture
Transfer RNA

Scopus subject areas

  • Biomedical Engineering
  • Electrical and Electronic Engineering

Cite this

Grigorev, S., & Lunina, P. (2019). The composition of dense neural networks and formal grammars for secondary structure analysis. In E. De Maria, H. Gamboa, & A. Fred (Eds.), BIOINFORMATICS 2019 - 10th International Conference on Bioinformatics Models, Methods and Algorithms, Proceedings; Part of 12th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2019 (pp. 234-241). (BIOINFORMATICS 2019 - 10th International Conference on Bioinformatics Models, Methods and Algorithms, Proceedings; Part of 12th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2019). SciTePress.
Grigorev, Semyon ; Lunina, Polina. / The composition of dense neural networks and formal grammars for secondary structure analysis. BIOINFORMATICS 2019 - 10th International Conference on Bioinformatics Models, Methods and Algorithms, Proceedings; Part of 12th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2019. editor / Elisabetta De Maria ; Hugo Gamboa ; Ana Fred. SciTePress, 2019. pp. 234-241 (BIOINFORMATICS 2019 - 10th International Conference on Bioinformatics Models, Methods and Algorithms, Proceedings; Part of 12th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2019).
@inproceedings{8fe3c69cd9d44141bde61b826b5b29c6,
title = "The composition of dense neural networks and formal grammars for secondary structure analysis",
abstract = "We propose a way to combine formal grammars and artificial neural networks for biological sequence processing. Formal grammars encode the secondary structure of the sequence, and neural networks deal with mutations and noise. In contrast to the classical approach, in which probabilistic grammars are used for secondary structure modeling, we propose to use arbitrary (not probabilistic) grammars, which simplifies grammar creation. Instead of modeling the structure of the whole sequence, we create a grammar which only describes features of the secondary structure. Then we use undirected matrix-based parsing to extract features: the fact that some substring can be derived from some nonterminal is a feature. After that, we use a dense neural network to process the features. In this paper, we describe in detail all the parts of our recipe: a grammar, a parsing algorithm, and a network architecture. We discuss possible improvements and future work. Finally, we provide the results of tRNA and 16S rRNA processing, which show the applicability of our idea to real problems.",
keywords = "Dense Neural Network, DNN, Formal Grammars, Genomic Sequences, Machine Learning, Parsing, Proteomic Sequences, Secondary Structure",
author = "Semyon Grigorev and Polina Lunina",
year = "2019",
month = "1",
day = "1",
language = "English",
series = "BIOINFORMATICS 2019 - 10th International Conference on Bioinformatics Models, Methods and Algorithms, Proceedings; Part of 12th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2019",
publisher = "SciTePress",
pages = "234--241",
editor = "{De Maria}, Elisabetta and Hugo Gamboa and Ana Fred",
booktitle = "BIOINFORMATICS 2019 - 10th International Conference on Bioinformatics Models, Methods and Algorithms, Proceedings; Part of 12th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2019",
address = "Portugal",

}

Grigorev, S & Lunina, P 2019, The composition of dense neural networks and formal grammars for secondary structure analysis. in E De Maria, H Gamboa & A Fred (eds), BIOINFORMATICS 2019 - 10th International Conference on Bioinformatics Models, Methods and Algorithms, Proceedings; Part of 12th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2019. BIOINFORMATICS 2019 - 10th International Conference on Bioinformatics Models, Methods and Algorithms, Proceedings; Part of 12th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2019, SciTePress, pp. 234-241, Prague, 22/02/19.

The composition of dense neural networks and formal grammars for secondary structure analysis. / Grigorev, Semyon; Lunina, Polina.

BIOINFORMATICS 2019 - 10th International Conference on Bioinformatics Models, Methods and Algorithms, Proceedings; Part of 12th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2019. ed. / Elisabetta De Maria; Hugo Gamboa; Ana Fred. SciTePress, 2019. p. 234-241 (BIOINFORMATICS 2019 - 10th International Conference on Bioinformatics Models, Methods and Algorithms, Proceedings; Part of 12th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2019).

TY - GEN

T1 - The composition of dense neural networks and formal grammars for secondary structure analysis

AU - Grigorev, Semyon

AU - Lunina, Polina

PY - 2019/1/1

Y1 - 2019/1/1

N2 - We propose a way to combine formal grammars and artificial neural networks for biological sequence processing. Formal grammars encode the secondary structure of the sequence, and neural networks deal with mutations and noise. In contrast to the classical approach, in which probabilistic grammars are used for secondary structure modeling, we propose to use arbitrary (not probabilistic) grammars, which simplifies grammar creation. Instead of modeling the structure of the whole sequence, we create a grammar which only describes features of the secondary structure. Then we use undirected matrix-based parsing to extract features: the fact that some substring can be derived from some nonterminal is a feature. After that, we use a dense neural network to process the features. In this paper, we describe in detail all the parts of our recipe: a grammar, a parsing algorithm, and a network architecture. We discuss possible improvements and future work. Finally, we provide the results of tRNA and 16S rRNA processing, which show the applicability of our idea to real problems.

AB - We propose a way to combine formal grammars and artificial neural networks for biological sequence processing. Formal grammars encode the secondary structure of the sequence, and neural networks deal with mutations and noise. In contrast to the classical approach, in which probabilistic grammars are used for secondary structure modeling, we propose to use arbitrary (not probabilistic) grammars, which simplifies grammar creation. Instead of modeling the structure of the whole sequence, we create a grammar which only describes features of the secondary structure. Then we use undirected matrix-based parsing to extract features: the fact that some substring can be derived from some nonterminal is a feature. After that, we use a dense neural network to process the features. In this paper, we describe in detail all the parts of our recipe: a grammar, a parsing algorithm, and a network architecture. We discuss possible improvements and future work. Finally, we provide the results of tRNA and 16S rRNA processing, which show the applicability of our idea to real problems.

KW - Dense Neural Network

KW - DNN

KW - Formal Grammars

KW - Genomic Sequences

KW - Machine Learning

KW - Parsing

KW - Proteomic Sequences

KW - Secondary Structure

UR - http://www.scopus.com/inward/record.url?scp=85064687958&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85064687958

T3 - BIOINFORMATICS 2019 - 10th International Conference on Bioinformatics Models, Methods and Algorithms, Proceedings; Part of 12th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2019

SP - 234

EP - 241

BT - BIOINFORMATICS 2019 - 10th International Conference on Bioinformatics Models, Methods and Algorithms, Proceedings; Part of 12th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2019

A2 - De Maria, Elisabetta

A2 - Gamboa, Hugo

A2 - Fred, Ana

PB - SciTePress

ER -

Grigorev S, Lunina P. The composition of dense neural networks and formal grammars for secondary structure analysis. In De Maria E, Gamboa H, Fred A, editors, BIOINFORMATICS 2019 - 10th International Conference on Bioinformatics Models, Methods and Algorithms, Proceedings; Part of 12th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2019. SciTePress. 2019. p. 234-241. (BIOINFORMATICS 2019 - 10th International Conference on Bioinformatics Models, Methods and Algorithms, Proceedings; Part of 12th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2019).