Topic modelling of the Russian corpus of Pikabu posts

The paper discusses the development of a corpus of Russian-language posts with hashtags collected from the Pikabu social network. We developed a corpus that is balanced and representative with respect to the contribution of individual authors and the number and length of their posts. Our study aims to build probabilistic topic models that reveal the authors' interests and preferences, as well as correlations between topics within the corpus. We performed a series of experiments with standard LDA topic modelling and Author-Topic modelling, using algorithms implemented in Python libraries. The experiments made it possible to extract groups of authors with similar and related interests. For topic label assignment, we used both manually introduced hashtags and labels automatically extracted from the lexical database RuWordNet, which facilitates the linguistic interpretation of the results.
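The abstract says only that LDA and Author-Topic models were trained with algorithms from Python libraries; it does not name them (gensim, for instance, provides `LdaModel` and `AuthorTopicModel`, but whether the authors used it is not stated here). The dependency-free sketch below illustrates the underlying idea with a collapsed Gibbs sampler for LDA. It assumes each post has exactly one author; under that assumption the Author-Topic model reduces to LDA over each author's concatenated posts, so the per-document topic mixtures become per-author mixtures. The toy posts, the hyperparameters, and the hashtag-voting labeller `label_topics` are hypothetical illustrations; the paper's RuWordNet-based labelling is not reproduced.

```python
import random
from collections import Counter, defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (theta, phi, vocab)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]      # tokens of topic k in document d
    n_kw = [[0] * V for _ in range(K)]  # occurrences of word v under topic k
    n_k = [0] * K                       # total tokens assigned to topic k
    z = []                              # current topic of every token
    for d, doc in enumerate(docs):      # random initialisation
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w_id[w]] += 1
            n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove the token from the counts ...
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # ... and resample its topic from the full conditional p(z = t | rest)
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    # Smoothed point estimates of document-topic and topic-word distributions
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
           for k in range(K)]
    return theta, phi, vocab


def author_topics(posts, num_topics, **kwargs):
    """One author per post: the Author-Topic model reduces to LDA over authors."""
    by_author = defaultdict(list)
    for author, tokens, _tags in posts:
        by_author[author].extend(tokens)
    authors = sorted(by_author)
    theta, phi, vocab = lda_gibbs([by_author[a] for a in authors], num_topics, **kwargs)
    return dict(zip(authors, theta)), phi, vocab


def label_topics(posts, author_theta):
    """Hypothetical labeller: topic k gets the hashtag with the largest summed
    author weight theta[author][k] over the posts carrying that hashtag."""
    K = len(next(iter(author_theta.values())))
    scores = [Counter() for _ in range(K)]
    for author, _tokens, tags in posts:
        for k in range(K):
            for tag in tags:
                scores[k][tag] += author_theta[author][k]
    return [s.most_common(1)[0][0] if s else None for s in scores]


# Invented toy data: (author, tokens, hashtags)
posts = [
    ("a1", ["cat", "kitten", "purr", "food", "cat", "purr"], ["cats"]),
    ("a1", ["kitten", "cat", "food", "purr", "kitten", "food"], ["cats"]),
    ("a2", ["game", "console", "level", "gamepad", "game", "level"], ["games"]),
    ("a2", ["console", "gamepad", "game", "level", "console", "game"], ["games"]),
]
author_theta, phi, vocab = author_topics(posts, num_topics=2)
labels = label_topics(posts, author_theta)
print(author_theta, labels)
```

Comparing the resulting author vectors (for example by cosine similarity) is one possible way to group authors with related interests. On data this small, the sampled topics depend on the seed and hyperparameters, so the output should be read as an illustration rather than a result.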

Original language	English
Pages (from-to)	101-116
Number of pages	16
Journal	CEUR Workshop Proceedings
Volume	2813
State	Published - 2021
Event	Internet and Modern Society - ITMO University, Saint Petersburg, Russian Federation
Duration	17 Jun 2020 → 20 Jun 2020
Conference number	23
URL	http://ims.ifmo.ru/ru/pages/2/programma.htm

Research areas

ATM, LDA, Pikabu, Russian, Social networks, Topic label assignment, Topic modelling

Scopus subject areas

Computer Science (all)

Topic modelling of the Russian corpus of Pikabu posts: Author-topic distribution and topic labelling