The goal of this work is to evaluate semantic feature aggregation techniques for gender classification of public social media texts in Russian. We collect Facebook posts by Russian-speaking users and use them as a dataset for two topic modelling techniques and a distributional clustering approach. The output of each algorithm is then used to aggregate features for gender classification on a smaller Facebook sample. The classification performance of the best model compares favorably with the lemma baseline and with state-of-the-art results reported for a different genre or language. We give examples of the resulting successful features and discuss the differences between the three techniques in terms of classification performance and feature contents, with the best technique clearly outperforming the others.
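To make the idea of feature aggregation concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a topic model or distributional clustering has already assigned lemmas to semantic clusters; the `LEMMA_TO_CLUSTER` mapping, the cluster names, and the `aggregate_features` helper are all hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical lemma-to-cluster assignment (toy data). In practice this would
# come from a topic model or a distributional clustering fitted on a larger corpus.
LEMMA_TO_CLUSTER = {
    "мама": "family",
    "сын": "family",
    "дочь": "family",
    "матч": "sport",
    "гол": "sport",
    "команда": "sport",
}


def aggregate_features(lemmas, lemma_to_cluster=LEMMA_TO_CLUSTER):
    """Collapse lemma counts into relative frequencies of semantic clusters.

    Lemmas without a cluster assignment are ignored.
    """
    cluster_counts = Counter(
        lemma_to_cluster[lemma] for lemma in lemmas if lemma in lemma_to_cluster
    )
    total = sum(cluster_counts.values())
    if total == 0:
        return {}
    return {cluster: count / total for cluster, count in cluster_counts.items()}


# A lemmatized toy post: two "family" lemmas, one "sport" lemma, two unassigned.
post = ["мама", "и", "сын", "смотреть", "матч"]
features = aggregate_features(post)
```

The resulting dictionary replaces a sparse vector of individual lemma counts with a much shorter vector of cluster frequencies, which could then be passed to any standard classifier.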

Original language: English
Title of host publication: Artificial Intelligence and Natural Language - 6th Conference, AINL 2017, Revised Selected Papers
Publisher: Springer Nature
Pages: 3-15
Number of pages: 13
Volume: 789
ISBN (Print): 9783319717456
State: Published - 2018
Event: Conference on Artificial Intelligence and Natural Language - St. Petersburg, Russian Federation
Duration: 19 Sep 2017 - 22 Sep 2017
Conference number: 6
http://ainlconf.ru/2017

Publication series

Name: Communications in Computer and Information Science
Volume: 789
ISSN (Print): 1865-0929

Conference

Conference: Conference on Artificial Intelligence and Natural Language
Abbreviated title: AINL 2017
Country/Territory: Russian Federation
City: St. Petersburg
Period: 19/09/17 - 22/09/17

Scopus subject areas

  • Computer Science (all)
  • Mathematics (all)

ID: 13395534