The paper proposes an algorithm for marking the coding and non-coding regions of a DNA molecule, based on finding an optimal set of features. As an example, the BRCA2-206 DNA sequence, which is associated with human cancer, is considered. First, the problem of determining whether a fragment of a DNA molecule belongs to introns or exons is addressed. To solve it, the paper proposes a method that combines TF-IDF markup with a Bayes classifier taking a set of optimal features as input. The behavior of the method is studied under the optimal set of parameters used to determine the significant features. Building on this method, an algorithm is developed that marks up a DNA molecule using the extrema of the probability that a nucleotide chain belongs to introns or exons.
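The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that features are overlapping k-mers with k = 3, that TF-IDF uses a smoothed IDF, that the classifier is a multinomial naive Bayes with Laplace smoothing, and that candidate boundaries are the local extrema of a sliding-window exon probability. All function names, the window size, and the training strings are hypothetical; the strings are made-up toy data, not BRCA2 sequences. The feature-selection step described in the paper is not reproduced here.

```python
import math
from collections import Counter

def kmers(seq, k=3):
    """Split a nucleotide string into overlapping k-mers (k=3 is an assumption)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def fit_idf(docs):
    """Smoothed inverse document frequency for every k-mer seen in training."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    """TF-IDF weights for one fragment; k-mers unseen in training are dropped."""
    counts = Counter(t for t in doc if t in idf)
    total = sum(counts.values()) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items()}

class NaiveBayes:
    """Multinomial naive Bayes over TF-IDF weights with Laplace smoothing."""

    def fit(self, vectors, labels, vocab, alpha=1.0):
        self.classes = sorted(set(labels))
        self.log_prior, self.log_like = {}, {}
        for c in self.classes:
            rows = [v for v, y in zip(vectors, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(labels))
            mass = Counter()
            for v in rows:
                mass.update(v)  # adds the TF-IDF weights per k-mer
            denom = sum(mass.values()) + alpha * len(vocab)
            self.log_like[c] = {t: math.log((mass[t] + alpha) / denom)
                                for t in vocab}
        return self

    def predict_proba(self, vector):
        """Posterior probability of each class for one TF-IDF vector."""
        scores = {c: self.log_prior[c]
                  + sum(w * self.log_like[c][t] for t, w in vector.items())
                  for c in self.classes}
        top = max(scores.values())
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

def exon_profile(seq, model, idf, window=12, k=3):
    """Exon probability for every sliding window along the sequence."""
    return [model.predict_proba(tfidf(kmers(seq[i:i + window], k), idf))["exon"]
            for i in range(len(seq) - window + 1)]

def local_extrema(values):
    """Indices where the profile switches between rising and falling."""
    return [i for i in range(1, len(values) - 1)
            if (values[i] - values[i - 1]) * (values[i + 1] - values[i]) < 0]

# Toy, made-up training strings (NOT real BRCA2 data).
exons = ["ATGGCCGCGGCCGCG", "GCCGCGATGGCCGGC"]
introns = ["GTAAGTTTTTAATTT", "TTTAAATATTTTAAG"]
docs = [kmers(s) for s in exons + introns]
labels = ["exon"] * len(exons) + ["intron"] * len(introns)
idf = fit_idf(docs)
model = NaiveBayes().fit([tfidf(d, idf) for d in docs], labels, list(idf))

# Probability profile along a hypothetical query and its candidate boundaries.
profile = exon_profile("GCCGCGGCCGCGTTTTAATTTTAAGT", model, idf)
print(local_extrema(profile))
```

In this sketch the extrema are only candidate positions; how the paper turns them into final intron and exon markup is not stated in the abstract.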
Original language: Russian
Pages (from-to): 178-182
Journal: ПРОЦЕССЫ УПРАВЛЕНИЯ И УСТОЙЧИВОСТЬ
Volume: 6
Issue number: 1
State: Published - 2019
Externally published: Yes

    Research areas

  • bioinformatics, genetics, machine learning, molecular biology

ID: 78392898