Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review
Semantic feature aggregation for gender identification in Russian Facebook. / Panicheva, Polina; Mirzagitova, Aliia; Ledovaya, Yanina.
Artificial Intelligence and Natural Language - 6th Conference, AINL 2017, Revised Selected Papers. Vol. 789 Springer Nature, 2018. p. 3-15 (Communications in Computer and Information Science; Vol. 789).
TY - GEN
T1 - Semantic feature aggregation for gender identification in Russian Facebook
AU - Panicheva, Polina
AU - Mirzagitova, Aliia
AU - Ledovaya, Yanina
N1 - Conference code: 6
PY - 2018
Y1 - 2018
AB - The goal of the current work is to evaluate semantic feature aggregation techniques in a task of gender classification of public social media texts in Russian. We collect Facebook posts of Russian-speaking users and use them as a dataset for two topic modelling techniques and a distributional clustering approach. The output of the algorithms is applied as a feature aggregation method in a task of gender classification based on a smaller Facebook sample. The classification performance of the best model compares favorably with the lemmas baseline and with the state-of-the-art results reported for a different genre or language. The resulting successful features are exemplified, and the differences between the three techniques in terms of classification performance and feature contents are discussed, with the best technique clearly outperforming the others.
UR - http://www.scopus.com/inward/record.url?scp=85037546120&partnerID=8YFLogxK
UR - http://www.mendeley.com/research/semantic-feature-aggregation-gender-identification-russian-facebook
U2 - 10.1007/978-3-319-71746-3_1
DO - 10.1007/978-3-319-71746-3_1
M3 - Conference contribution
AN - SCOPUS:85037546120
SN - 9783319717456
VL - 789
T3 - Communications in Computer and Information Science
SP - 3
EP - 15
BT - Artificial Intelligence and Natural Language - 6th Conference, AINL 2017, Revised Selected Papers
PB - Springer Nature
T2 - Conference on Artificial Intelligence and Natural Language
Y2 - 19 September 2017 through 22 September 2017
ER -
ID: 13395534