This paper describes automatic topic spotting in literary texts based on the Russian short stories corpus, which comprises stories written in the first third of the 20th century.
Non-negative matrix factorization (NMF) is a valuable alternative to existing approaches to dynamic topic modeling: it can find niche topics and related vocabularies that are not captured by existing methods. The experiments were conducted on text samples extracted from the corpus; these samples contain texts by 300 different authors. This approach makes it possible to trace the topic dynamics of Russian prose over 30 years, from 1900 to 1930.
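As a rough illustration of the factorization the abstract refers to, the sketch below applies plain NMF with multiplicative updates to a tiny document-term matrix. It is not the authors' pipeline: the vocabulary, the counts, the number of topics and the helper names (`nmf`, `frobenius_error`, `top_terms`) are all invented for illustration, and the dynamic step (comparing topics across time windows) is not shown.

```python
import random


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of the residual V - WH."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    ) ** 0.5


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factorize V (docs x terms) into W (docs x k) and H (k x terms).

    Uses the standard multiplicative update rules for the Frobenius
    objective; all entries of W and H stay non-negative.
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_terms)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n_terms)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n_docs)]
    return w, h


def top_terms(h, vocabulary, n=3):
    """For each topic (row of H), list the n highest-weighted terms."""
    return [
        [vocabulary[j] for j in sorted(range(len(row)), key=lambda j: -row[j])[:n]]
        for row in h
    ]


# Toy data (hypothetical): rows are documents, columns are term counts.
VOCABULARY = ["war", "soldier", "front", "love", "letter", "heart"]
V = [
    [3, 2, 2, 0, 0, 0],
    [2, 3, 1, 0, 1, 0],
    [0, 0, 0, 3, 2, 2],
    [0, 1, 0, 2, 3, 3],
]

W, H = nmf(V, k=2)
for index, terms in enumerate(top_terms(H, VOCABULARY)):
    print(f"topic {index}: {', '.join(terms)}")
```

Each row of `H` is a topic expressed as term weights, and each row of `W` gives the topic weights of one document. In a dynamic setting, one option is to run such a factorization per time window and then relate the resulting topics across windows.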
Original language: English
Title of host publication: R. Piotrowski's Readings in Language Engineering and Applied Linguistics. PRLEAL-2019
Subtitle of host publication: Proceedings of the III International Conference
Editors: Andrey Ronzhin, Tatiana Noskova, Alexey Karpov
Publisher: RWTH Aachen University
Pages: 321-339
Number of pages: 13
State: Published - 2020
Event: 3rd International Conference on R. Piotrowski's Readings in Language Engineering and Applied Linguistics, PRLEAL 2019 - Saint Petersburg, Russian Federation
Duration: 27 Nov 2019 → …

Publication series

Name: CEUR Workshop Proceedings
Volume: 2552
ISSN (Print): 1613-0073

Conference

Conference: 3rd International Conference on R. Piotrowski's Readings in Language Engineering and Applied Linguistics, PRLEAL 2019
Country/Territory: Russian Federation
City: Saint Petersburg
Period: 27/11/19 → …

Research areas

• Computational linguistics, dynamic topic modeling, Russian literature, Russian short stories

ID: 51154101