This paper is the first part of an investigation of a contextual predictability model for Russian. It focuses on the linguistic and psychological interpretation of models, features, metrics, and feature sets. The aim is to identify how the implementation of contextual predictability procedures depends on the genre of a text (or text collection): scientific vs. fictional. We construct a model that predicts text elements and specify its features for texts of different genres and domains. We analyze various methods for studying contextual predictability, carry out a computational experiment on scientific and fictional texts, and verify its results with an experiment with informants (cloze tests) and with word embeddings (word2vec CBOW model). As a result, a text processing model is built and evaluated on the selected contextual predictability features and the experiments with informants.
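
The predictability measures named in this record (conditional probability, surprisal, informational entropy, Dice) can be illustrated with a minimal Python sketch over bigram counts. This is not the authors' model: the bigram context, the maximum-likelihood estimates, the toy English token list, and all function names below are assumptions made only for illustration.

```python
import math
from collections import Counter

def bigram_counts(tokens):
    """Count unigrams, left contexts, and adjacent word pairs."""
    unigrams = Counter(tokens)
    contexts = Counter(tokens[:-1])            # words that have a successor
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, contexts, bigrams

def conditional_probability(prev, word, contexts, bigrams):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev is unseen."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]

def surprisal(probability):
    """Surprisal in bits; infinite for a zero-probability event."""
    return math.inf if probability == 0 else -math.log2(probability)

def next_word_entropy(prev, contexts, bigrams):
    """Entropy (bits) of the next-word distribution after prev."""
    total = contexts[prev]
    probs = [c / total for (p, _), c in bigrams.items() if p == prev]
    return -sum(p * math.log2(p) for p in probs)

def dice(prev, word, unigrams, bigrams):
    """Dice coefficient of a word pair: 2 * f(xy) / (f(x) + f(y))."""
    return 2 * bigrams[(prev, word)] / (unigrams[prev] + unigrams[word])

# Toy corpus (an assumption, not data from the paper).
tokens = "the cat sat on the mat the cat ran".split()
unigrams, contexts, bigrams = bigram_counts(tokens)

p = conditional_probability("the", "cat", contexts, bigrams)
print(p, surprisal(p))
print(next_word_entropy("cat", contexts, bigrams))
print(dice("the", "cat", unigrams, bigrams))
```

A low surprisal or a high Dice score marks a word as predictable from its left neighbour; such scores are the kind of quantity that could be compared against the share of informants who guess the word in a cloze test.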

Original language: English
Title of host publication: Mining Intelligence and Knowledge Exploration - 7th International Conference, MIKE 2019, Proceedings
Editors: P. B.R., Veena Thenkanidiyoor, Rajendra Prasath, Odelu Vanga
Publisher: Springer Nature
Chapter: 11
Pages: 104-119
Number of pages: 16
ISBN (Print): 9783030661861
DOIs
State: Published - 2020
Event: 7th International Conference on Mining Intelligence and Knowledge Exploration, MIKE 2019 - Veling, India
Duration: 19 Dec 2019 - 22 Dec 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11987 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 7th International Conference on Mining Intelligence and Knowledge Exploration, MIKE 2019
Country/Territory: India
City: Veling
Period: 19/12/19 - 22/12/19

    Research areas

  • Cloze test, Conditional probability, Contextual predictability, Dice, Fiction texts, Informational entropy, Language model, Scientific corpora, Surprisal

    Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

ID: 73342010