This work evaluates semantic feature aggregation techniques for gender classification of public social media texts in Russian. We collect Facebook posts by Russian-speaking users and use them as a dataset for two topic modelling techniques and a distributional clustering approach. The output of these algorithms is then used to aggregate features for gender classification on a smaller Facebook sample. The best model compares favorably with the lemma baseline and with state-of-the-art results reported for other genres or languages. We give examples of the most successful features and discuss how the three techniques differ in classification performance and feature content; the best technique clearly outperforms the others.
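As a rough illustration of what "feature aggregation" can mean here, the sketch below collapses per-lemma counts into relative frequencies of semantic groups. It is not the authors' implementation: the mapping `LEMMA_TO_GROUP`, the group names, and the function `aggregate_features` are hypothetical, and the hard one-group-per-lemma assignment is a simplifying assumption (a topic model would normally give soft assignments).

```python
from collections import Counter

# Hypothetical lemma-to-group mapping. In the setting described above it would
# be derived from a topic model or a distributional clustering trained on the
# larger corpus; these entries are invented for illustration only.
LEMMA_TO_GROUP = {
    "football": "sport",
    "match": "sport",
    "recipe": "cooking",
    "soup": "cooking",
}


def aggregate_features(lemmas, lemma_to_group):
    """Collapse lemma counts into relative frequencies of semantic groups.

    Lemmas missing from the mapping are ignored (an assumption of this sketch).
    """
    counts = Counter(lemma_to_group[l] for l in lemmas if l in lemma_to_group)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: n / total for group, n in counts.items()}


# One text, already lemmatized; "unknown" has no group and is skipped.
features = aggregate_features(
    ["football", "match", "soup", "unknown"], LEMMA_TO_GROUP
)
```

The resulting dictionary has one entry per semantic group rather than one per lemma, which is the kind of lower-dimensional representation a gender classifier could then be trained on.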

Original language: English
Title of host publication: Artificial Intelligence and Natural Language - 6th Conference, AINL 2017, Revised Selected Papers
Publisher: Springer Nature
Pages: 3-15
Number of pages: 13
Volume: 789
ISBN (print): 9783319717456
DOI
Status: Published - 2018
Event: Conference on Artificial Intelligence and Natural Language - St. Petersburg, Russian Federation
Duration: 19 Sep 2017 to 22 Sep 2017
Conference number: 6
http://ainlconf.ru/2017

Publication series

Name: Communications in Computer and Information Science
Volume: 789
ISSN (print): 1865-0929

Conference

Conference: Conference on Artificial Intelligence and Natural Language
Abbreviated title: AINL 2017
Country/Territory: Russian Federation
City: St. Petersburg
Period: 19/09/17 to 22/09/17
Internet address

    Scopus subject areas

  • Computer Science (all)
  • Mathematics (all)

ID: 13395534