Research output: Contribution to journal › Conference article › peer-review
The paper discusses the development of a corpus of Russian posts with hashtags collected from the Pikabu social network. The corpus is balanced and representative with respect to the contribution of individual authors and the number and size of their posts. Our study aims to build probabilistic topic models that reveal the authors' interests and preferences, as well as correlations between topics within the corpus. We performed a series of experiments, including standard LDA topic modelling and Author-Topic modelling, using algorithms from Python libraries. The experiments allowed us to identify groups of authors with similar and related interests. Topic labels were assigned using manually introduced hashtags and labels automatically extracted from the lexical database RuWordNet, which facilitates linguistic interpretation of the results.
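The abstract does not specify how hashtags are turned into topic labels. The following is a minimal sketch of one possible approach, assuming a per-post topic distribution is already available from a fitted topic model: each hashtag is scored by the summed topic weight of the posts that carry it, and the highest-scoring hashtags become the topic's labels. The function name `label_topics` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def label_topics(doc_topic, doc_tags, top_n=1):
    """Return, for each topic, the top_n hashtags ranked by summed topic weight.

    doc_topic: list of per-post topic distributions (one list of floats per post).
    doc_tags:  list of hashtag lists, aligned with doc_topic.
    This scoring rule is an illustrative assumption, not the paper's method.
    """
    n_topics = len(doc_topic[0])
    scores = [defaultdict(float) for _ in range(n_topics)]
    for weights, tags in zip(doc_topic, doc_tags):
        for k, w in enumerate(weights):
            for tag in set(tags):  # count each hashtag once per post
                scores[k][tag] += w
    labels = []
    for k in range(n_topics):
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores[k].items(), key=lambda kv: (-kv[1], kv[0]))
        labels.append([tag for tag, _ in ranked[:top_n]])
    return labels


# Toy input: four posts, two topics (hypothetical values).
doc_topic = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
]
doc_tags = [
    ["cats", "humor"],
    ["cats"],
    ["politics"],
    ["politics", "news"],
]
labels = label_topics(doc_topic, doc_tags)
```

In practice, the `doc_topic` rows would come from whichever LDA or Author-Topic implementation is used; the sketch only covers the labelling step.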
Original language | English |
---|---|
Pages (from-to) | 101-116 |
Number of pages | 16 |
Journal | CEUR Workshop Proceedings |
Volume | 2813 |
State | Published - 2021 |
Event | Internet and Modern Society - ITMO University, Saint Petersburg, Russian Federation. Duration: 17 Jun 2020 → 20 Jun 2020. Conference number: 23. http://ims.ifmo.ru/ru/pages/2/programma.htm |
ID: 85926873