Topic modelling with NMF vs. expert topic annotation: the case study of Russian fiction

Translated title (Russian): Моделирование темы при помощи NMF и тематическое аннотирование экспертом (на примере русской художественной литературы)

Tatiana Yurievna Sherstinova, Olga Aleksandrovna Mitrofanova, Tatiana Georgievna Skrebtsova, Ekaterina Vladimirovna Zamiraylova, Margarita Kirina

Research output: Publications in books, reports, collections, conference proceedings; article in a volume of conference proceedings; research; peer-reviewed

1 citation (Scopus)


The paper presents an experiment comparing the results of topic modelling via non-negative matrix factorization (NMF) with those of manual topic annotation performed by an expert. The experiment was conducted on an annotated corpus of Russian short stories from the first three decades of the 20th century, which contains 310 stories with a total of 1,000,000 tokens written by 300 Russian writers. The annotation scheme used in topic annotation includes 89 topics; this list was then reduced to 30 generalized ones, the most frequent of which turned out to be the following: death, relationships, love, social groups, social processes, family, money, human sins, nature, religion, and war. Then the corpus, divided into three consecutive time periods, was subjected to NMF topic modelling, which yielded a model of 24 topics. The results of the two topic annotations were compared and described. The paper discusses the main findings of the study and the difficulties of fiction topic modelling that should be taken into account. For example, the experimental results showed that topic modelling via NMF should be recommended primarily for revealing topics that refer to the general background of literary texts (e.g., war, love, nature, family) rather than for detecting topics related to critical events or relations between characters (e.g., death or relationships). The comparison of human and automatic topic annotation seems an important step towards improving artificial intelligence techniques related to NLP.
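The abstract does not say how the NMF model was implemented, so the following is only a minimal sketch of the general technique, not the authors' pipeline. It factorizes a small document-term count matrix V into non-negative factors W (documents by topics) and H (topics by terms) using the standard multiplicative update rule for the squared-error objective. The four toy "documents", the English vocabulary, the choice of two topics, and the helper names `nmf`, `matmul`, `transpose`, and `loss` are all hypothetical; the study itself worked on Russian short stories and obtained 24 topics.

```python
import random

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def loss(v, w, h):
    """Squared Frobenius reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))

def nmf(v, n_topics, n_iter=300, seed=0, eps=1e-9):
    """Factorize V (docs x terms) into W (docs x topics) and H (topics x terms).

    Uses multiplicative updates for the squared-error objective; this is an
    assumed, textbook variant, not necessarily the one used in the paper.
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    # Strictly positive random initialization keeps every entry non-negative.
    w = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_terms)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_terms)]
             for k in range(n_topics)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
             for i in range(n_docs)]
    return w, h

# Hypothetical toy corpus: two "war" texts and two "love" texts.
docs = [
    "war soldier front war battle",
    "soldier battle front army",
    "love heart kiss love letter",
    "heart letter kiss love",
]
vocab = sorted({term for doc in docs for term in doc.split()})
v = [[doc.split().count(term) for term in vocab] for doc in docs]

w, h = nmf(v, n_topics=2)

# Show the three highest-weighted terms of each topic.
for k, row in enumerate(h):
    top = sorted(zip(row, vocab), reverse=True)[:3]
    print(f"topic {k}:", [term for _, term in top])
```

Each row of H can be read as a topic (a weighting over terms) and each row of W as the topic mixture of one document; comparing the dominant topic of a document with an expert's label is one simple way to set the two kinds of annotation side by side. For a corpus of realistic size, a library implementation such as scikit-learn's `sklearn.decomposition.NMF` would be the more practical choice.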

Original language: English
Title of host publication: Advances in Computational Intelligence - 19th Mexican International Conference on Artificial Intelligence, MICAI 2020, Proceedings
Editors: Lourdes Martínez-Villaseñor, Hiram Ponce, Oscar Herrera-Alcántara, Félix A. Castro-Espinoza
Place of publication: Cham
Publisher: Springer Nature
Number of pages: 18
ISBN (print): 9783030608866
Publication status: Published - 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12469 LNAI
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

