The paper discusses the development of a corpus of Russian posts with hashtags from the Pikabu social network. We developed a corpus that is balanced and representative with respect to the contribution of individual authors and the number and size of their posts. Our study aims to develop probabilistic topic models that reveal the authors’ interests and preferences, as well as correlations between topics within the corpus. We performed a series of experiments including standard LDA topic modelling and Author-Topic modelling, using algorithms from Python libraries. The experiments allowed us to extract groups of authors with similar and related interests. We assigned topic labels based on manually introduced hashtags and on labels automatically extracted from the lexical database RuWordNet, which facilitates linguistic interpretation of the results.

Original language: English
Pages (from-to): 101-116
Number of pages: 16
Journal: CEUR Workshop Proceedings
Volume: 2813
State: Published - 2021
Event: Internet and Modern Society - ITMO University, Saint Petersburg, Russian Federation
Duration: 17 Jun 2020 to 20 Jun 2020
Conference number: 23
http://ims.ifmo.ru/ru/pages/2/programma.htm

    Research areas

  • ATM, LDA, Pikabu, Russian, Social networks, Topic label assignment, Topic modelling

    Scopus subject areas

  • Computer Science (all)

ID: 85926873