
The article describes the results of a study whose purpose was to evaluate the influence of linguistic preprocessing on the interpretability of topic models for literary texts. The study was carried out as part of a large project aimed at obtaining topic models of Russian short stories written in the first three decades of the 20th century, divided into three successive historical periods: 1) the beginning of the century before the First World War (1900-1913); 2) the time of acute social cataclysms, wars, and revolutions, including World War I, the February and October revolutions, and the Civil War (1914-1922); and 3) the early Soviet period (1923-1930). The study material consisted of three samples of different sizes for each period, containing 100, 500, and 1,000 short stories respectively. Preprocessing included lemmatization with the spaCy library and four POS-filtering options: 1) nouns only; 2) nouns and verbs; 3) nouns, adjectives, adverbs, and verbs; and 4) no filtering. Using latent Dirichlet allocation (LDA), 36 topic models were built: 9 models (3 periods × 3 sample sizes) for each preprocessing option. The research showed that, for literary texts, topic models built without any POS filters are the most interpretable. The study also provided information about the topic diversity of Russian short stories, made it possible to assess the expert interpretability of the resulting topic models, and yielded recommendations for optimizing topic modeling that are intended for use in developing artificial intelligence systems that process large volumes of literary texts.
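The preprocessing design above can be illustrated with a short Python sketch. This is not the authors' code: it assumes the texts have already been lemmatized and POS-tagged (for example by spaCy, which exposes Universal POS tags through `token.pos_`), and the option names, the exact tag sets chosen for each option, and the helpers `filter_lemmas` and `experiment_grid` are hypothetical. It also shows how 4 filtering options × 3 periods × 3 sample sizes yields the 36 LDA models.

```python
from itertools import product
from typing import Iterable, Optional

# Hypothetical mapping of the four POS-filtering options onto Universal POS
# tags. Whether the authors also kept tags such as PROPN or AUX is not stated
# in the source; this mapping is an assumption.
POS_FILTERS: dict[str, Optional[frozenset[str]]] = {
    "nouns": frozenset({"NOUN"}),
    "nouns+verbs": frozenset({"NOUN", "VERB"}),
    "nouns+adj+adv+verbs": frozenset({"NOUN", "ADJ", "ADV", "VERB"}),
    "no_filter": None,  # keep every lemma (stop-word/punctuation handling not described)
}

PERIODS = ("1900-1913", "1914-1922", "1923-1930")
SAMPLE_SIZES = (100, 500, 1000)


def filter_lemmas(tokens: Iterable[tuple[str, str]], option: str) -> list[str]:
    """Keep lemmas whose POS tag is allowed by the given filtering option.

    `tokens` is assumed to be a sequence of (lemma, upos) pairs produced
    beforehand by a lemmatizer/tagger; token order is preserved.
    """
    allowed = POS_FILTERS[option]
    return [lemma for lemma, pos in tokens if allowed is None or pos in allowed]


def experiment_grid() -> list[tuple[str, str, int]]:
    """One LDA run per (filtering option, period, sample size) combination."""
    return list(product(POS_FILTERS, PERIODS, SAMPLE_SIZES))


if __name__ == "__main__":
    # Toy pre-tagged sentence (lemmas with Universal POS tags).
    tokens = [("старик", "NOUN"), ("медленно", "ADV"), ("идти", "VERB"),
              ("по", "ADP"), ("тёмный", "ADJ"), ("улица", "NOUN")]
    print(filter_lemmas(tokens, "nouns"))        # ['старик', 'улица']
    print(filter_lemmas(tokens, "nouns+verbs"))  # ['старик', 'идти', 'улица']
    print(len(filter_lemmas(tokens, "no_filter")))  # 6
    print(len(experiment_grid()))                # 36
```

Each filtered lemma list would then serve as one document for an LDA implementation (for instance gensim's `LdaModel`); the filter only changes the vocabulary the model sees, which is the variable the study compares.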
Original language: English
Title of host publication: 2022 31st Conference of Open Innovations Association (FRUCT)
Pages: 305-312
Number of pages: 8
Volume: 2022-April
DOI
Status: Published - 1 Jan 2022
Event: 2022 31st Conference of Open Innovations Association (FRUCT)
Duration: 27 Apr 2022 to 29 Apr 2022

Publication series

Name: CONFERENCE OF OPEN INNOVATIONS ASSOCIATION, FRUCT
Publisher: FRUCT Oy
Volume: 2022-April
ISSN (Print): 2305-7254

Conference

Conference: 2022 31st Conference of Open Innovations Association (FRUCT)
Period: 27/04/22 to 29/04/22

ID: 101663042