Research output: Contribution to journal › Article › peer-review
Propaganda identification using topic modelling. / Yakunin, Kirill; Ionescu, George Mihail; Murzakhmetov, Sanzhar; Mussabayev, Rustam; Filatova, Olga; Mukhamediev, Ravil.
In: Procedia Computer Science, Vol. 178, 2020, p. 205-212.
TY - JOUR
T1 - Propaganda identification using topic modelling
AU - Yakunin, Kirill
AU - Ionescu, George Mihail
AU - Murzakhmetov, Sanzhar
AU - Mussabayev, Rustam
AU - Filatova, Olga
AU - Mukhamediev, Ravil
N1 - Yakunin K., Ionescu G.M., Murzakhmetov S., Mussabayev R., Filatova O., Mukhamediev R. Propaganda identification using topic modelling // Procedia Computer Science 178 (2020) 205–212
PY - 2020
Y1 - 2020
N2 - This paper presents a method based on topic modelling for identifying texts with propagandistic content. The method attempts to apply the transfer-learning idea of obtaining an effective vector representation from a large unlabelled or (semi-)automatically labelled dataset, while also attempting to minimize the amount of manual expert labelling required by introducing high-level labelling (either manual or automatic) of some explicit document property. The proposed method has four key stages: partitioning the corpus, computing a topic model of the united corpus, estimating the imbalance of each topic across the sub-corpora, and extrapolating the imbalance estimates to all documents. The method was cross-validated on a labelled subsample of 1,000 news items and achieved a ROC AUC of 0.73.
KW - propaganda
KW - natural language processing
KW - topic modelling
KW - text classification
KW - mass media analysis
UR - https://www.sciencedirect.com/science/article/pii/S1877050920323966#!
M3 - Article
VL - 178
SP - 205
EP - 212
JO - Procedia Computer Science
JF - Procedia Computer Science
SN - 1877-0509
T2 - 9th International Young Scientists Conference in Computational Science
Y2 - 22 June 2020 through 27 June 2020
ER -
ID: 72704648
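
The last two stages named in the abstract (per-topic imbalance estimation and extrapolation to documents) can be illustrated with a minimal sketch. The record does not give the paper's formulas, so everything here is an assumption: the names `topic_imbalance`, `document_scores`, `doc_topics` and `partition` are hypothetical, the imbalance is taken to be a normalized difference of mean topic weights between two sub-corpora, and a document's score is taken to be its topic-weighted average imbalance. The topic model itself (stage two) is assumed to have already produced one topic distribution per document.

```python
def topic_imbalance(doc_topics, partition):
    """Per-topic imbalance in [-1, 1] between sub-corpora labelled 0 and 1.

    doc_topics: list of per-document topic weight lists (each sums to 1).
    partition:  list of 0/1 labels, one per document (the high-level label).
    NOTE: this formula is an illustrative assumption, not the paper's.
    """
    n_topics = len(doc_topics[0])
    mass = {0: [0.0] * n_topics, 1: [0.0] * n_topics}
    counts = {0: 0, 1: 0}
    for weights, label in zip(doc_topics, partition):
        counts[label] += 1
        for t, w in enumerate(weights):
            mass[label][t] += w
    if counts[0] == 0 or counts[1] == 0:
        raise ValueError("both sub-corpora must contain at least one document")

    imbalance = []
    for t in range(n_topics):
        # Use mean weights so that sub-corpus size does not dominate.
        a = mass[1][t] / counts[1]
        b = mass[0][t] / counts[0]
        imbalance.append((a - b) / (a + b) if a + b > 0 else 0.0)
    return imbalance


def document_scores(doc_topics, imbalance):
    """Extrapolate topic imbalance to documents as a topic-weighted average."""
    return [sum(w * i for w, i in zip(weights, imbalance))
            for weights in doc_topics]


# Toy data: two topics, four documents, two in each sub-corpus.
toy_doc_topics = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
toy_partition = [1, 1, 0, 0]
toy_imbalance = topic_imbalance(toy_doc_topics, toy_partition)
toy_scores = document_scores(toy_doc_topics, toy_imbalance)
```

In this toy setup the first topic leans towards sub-corpus 1 and the second towards sub-corpus 0, so documents dominated by the first topic receive positive scores and a document split evenly between the two topics scores zero. Scores of this kind could then be compared against expert labels with a metric such as ROC AUC, which is the metric the abstract reports.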