Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review
The goal of the current work is to evaluate semantic feature aggregation techniques in the task of gender classification of public social media texts in Russian. We collect Facebook posts of Russian-speaking users and use them as a dataset for two topic modelling techniques and a distributional clustering approach. The output of these algorithms is then used to aggregate features in a gender classification task on a smaller Facebook sample. The classification performance of the best model compares favorably with the lemma baseline and with state-of-the-art results reported for a different genre or language. The resulting successful features are exemplified, and the differences between the three techniques in classification performance and feature content are discussed, with the best technique clearly outperforming the others.
Original language | English |
---|---|
Title of host publication | Artificial Intelligence and Natural Language - 6th Conference, AINL 2017, Revised Selected Papers |
Publisher | Springer Nature |
Pages | 3-15 |
Number of pages | 13 |
Volume | 789 |
ISBN (Print) | 9783319717456 |
State | Published - 2018 |
Event | Conference on Artificial Intelligence and Natural Language - St. Petersburg, Russian Federation; Duration: 19 Sep 2017 → 22 Sep 2017; Conference number: 6; http://ainlconf.ru/2017 |
Name | Communications in Computer and Information Science |
---|---|
Volume | 789 |
ISSN (Print) | 1865-0929 |
Conference | Conference on Artificial Intelligence and Natural Language |
---|---|
Abbreviated title | AINL 2017 |
Country/Territory | Russian Federation |
City | St. Petersburg |
Period | 19/09/17 → 22/09/17 |
ID: 13395534