The paper discusses the development of a corpus of Russian posts with hashtags from the Pikabu social network. We developed a corpus that is balanced and representative with respect to the contribution of individual authors and the number and size of their posts. Our study aims to develop probabilistic topic models that reveal the authors' interests and preferences, as well as correlations between topics within the corpus. We performed a series of experiments including standard LDA topic modelling and Author-Topic modelling. For topic modelling we used algorithms from Python libraries. The experiments allowed us to extract groups of authors with similar and related interests. We assigned topic labels based on manually introduced hashtags and on labels automatically extracted from the lexical database RuWordNet, which facilitates linguistic interpretation of the results.

Original language: English
Pages (from-to): 101-116
Number of pages: 16
Journal: CEUR Workshop Proceedings
Volume: 2813
Status: Published - 2021
Event: XXIII Joint Scientific Conference "Internet and Modern Society"
- ITMO University, Saint Petersburg, Russian Federation
Duration: 17 Jun 2020 to 20 Jun 2020
Conference number: 23
http://ims.ifmo.ru/ru/pages/2/programma.htm

    Scopus subject areas

  • Computer Science (all)

ID: 85926873