This paper describes automatic topic spotting in literary texts, based on the Russian short stories corpus, which comprises stories written in the first third of the 20th century.
Non-negative matrix factorization (NMF) is a valuable alternative to existing approaches to dynamic topic modeling, and it can find niche topics and related vocabularies that are not captured by existing methods. The experiments were conducted on text samples extracted from the corpus; the samples contain texts by 300 different authors. This approach makes it possible to trace the topic dynamics of Russian prose over 30 years, from 1900 to 1930.
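As a rough illustration of the factorization the abstract refers to, the sketch below applies plain NMF with multiplicative updates to a toy document-term matrix. It is not the authors' pipeline: the vocabulary, the counts, the number of topics, and the helper names (`nmf`, `top_terms`, `frobenius_error`) are all hypothetical. It also covers only a single time window, whereas dynamic topic modeling would repeat or chain such factorizations across periods.

```python
import random

# Hypothetical toy data: rows are documents, columns are term counts.
vocab = ["war", "soldier", "front", "love", "heart", "letter"]
docs = [
    [3, 2, 2, 0, 0, 0],
    [2, 3, 1, 0, 0, 1],
    [0, 0, 0, 3, 2, 1],
    [0, 0, 0, 2, 3, 2],
]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Frobenius norm of V - WH, the reconstruction error."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5

def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factor V (docs x terms) into non-negative W (docs x k) and H (k x terms)."""
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n_docs)]
    h = [[rng.random() + eps for _ in range(n_terms)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n_terms)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n_docs)]
    return w, h

def top_terms(h, vocab, n=3):
    """Return the n highest-weighted terms for each topic row of H."""
    return [[vocab[j] for j in sorted(range(len(row)), key=lambda j: -row[j])[:n]]
            for row in h]

w, h = nmf(docs, k=2)
for topic_id, terms in enumerate(top_terms(h, vocab)):
    print(topic_id, terms)
```

Each row of `h` is a topic expressed as term weights, and each row of `w` gives the topic weights for one document. Averaging the rows of `w` per year or per period would be one simple way to follow topic prominence over time.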
Original language: English
Title of host publication: R. Piotrowski's Readings in Language Engineering and Applied Linguistics. PRLEAL-2019
Subtitle of host publication: Proceedings of the III International Conference
Editors: Andrey Ronzhin, Tatiana Noskova, Alexey Karpov
Publisher: RWTH Aachen University
Pages: 321-339
Number of pages: 13
Publication status: Published - 2020
Event: 3rd International Conference on R. Piotrowski's Readings in Language Engineering and Applied Linguistics, PRLEAL 2019 - Saint Petersburg, Russian Federation
Duration: 27 Nov 2019 → …

Publication series

Name: CEUR Workshop Proceedings
Volume: 2552
ISSN (Print): 1613-0073

Conference

Conference: 3rd International Conference on R. Piotrowski's Readings in Language Engineering and Applied Linguistics, PRLEAL 2019
Country/Territory: Russian Federation
City: Saint Petersburg
Period: 27/11/19 → …

ID: 51154101