Standard

Propaganda identification using topic modelling. / Yakunin, Kirill; Ionescu, George Mihail; Murzakhmetov, Sanzhar; Mussabayev, Rustam; Filatova, Olga; Mukhamediev, Ravil.

In: Procedia Computer Science, Vol. 178, 2020, p. 205-212.

Research output: Contribution to journal › Article › peer-review

Harvard

Yakunin, K, Ionescu, GM, Murzakhmetov, S, Mussabayev, R, Filatova, O & Mukhamediev, R 2020, 'Propaganda identification using topic modelling', Procedia Computer Science, vol. 178, pp. 205-212.

APA

Yakunin, K., Ionescu, G. M., Murzakhmetov, S., Mussabayev, R., Filatova, O., & Mukhamediev, R. (2020). Propaganda identification using topic modelling. Procedia Computer Science, 178, 205-212.

Vancouver

Yakunin K, Ionescu GM, Murzakhmetov S, Mussabayev R, Filatova O, Mukhamediev R. Propaganda identification using topic modelling. Procedia Computer Science. 2020;178:205-212.

Author

Yakunin, Kirill; Ionescu, George Mihail; Murzakhmetov, Sanzhar; Mussabayev, Rustam; Filatova, Olga; Mukhamediev, Ravil. / Propaganda identification using topic modelling. In: Procedia Computer Science. 2020; Vol. 178. pp. 205-212.

BibTeX

@article{8e7a2f2b19bd424a8c1a1ba5a3d940ea,
title = "Propaganda identification using topic modelling",
abstract = "This paper presents a method based on topic modelling for identifying texts with propagandistic content. The method is an attempt to incorporate transfer learning idea of obtaining effective vector representation from a large unlabeled or (semi-) automatically labelled dataset, while also attempting to minimize the amount of necessary manual expert labelling by introducing high level labelling (either manual or automatic) on some explicit document property. The proposed method includes four key stages: formation of corpus partitioning, computing a topic model of a united corpus, calculation of corpora imbalance estimates of each topic; extrapolating the results of the imbalance estimation on all documents. The method was cross-validated on a labelled subsample of 1000 news, and achieves high predictive power – ROC AUC 0.73.",
keywords = "propaganda, natural language processing, topic modelling, text classification, mass media analysis",
author = "Yakunin, Kirill and Ionescu, {George Mihail} and Murzakhmetov, Sanzhar and Mussabayev, Rustam and Filatova, Olga and Mukhamediev, Ravil",
note = "Yakunin K., Ionescu G.M., Murzakhmetov S., Mussabayev R., Filatova O., Mukhamediev R. Propaganda identification using topic modelling // Procedia Computer Science 178 (2020) 205–212 ; 9th International Young Scientists Conference in Computational Science ; Conference date: 22-06-2020 Through 27-06-2020",
year = "2020",
language = "English",
volume = "178",
pages = "205--212",
journal = "Procedia Computer Science",
issn = "1877-0509",
publisher = "Elsevier",

}

RIS

TY - JOUR

T1 - Propaganda identification using topic modelling

AU - Yakunin, Kirill

AU - Ionescu, George Mihail

AU - Murzakhmetov, Sanzhar

AU - Mussabayev, Rustam

AU - Filatova, Olga

AU - Mukhamediev, Ravil

N1 - Yakunin K., Ionescu G.M., Murzakhmetov S., Mussabayev R., Filatova O., Mukhamediev R. Propaganda identification using topic modelling // Procedia Computer Science 178 (2020) 205–212

PY - 2020

Y1 - 2020

N2 - This paper presents a method based on topic modelling for identifying texts with propagandistic content. The method is an attempt to incorporate transfer learning idea of obtaining effective vector representation from a large unlabeled or (semi-) automatically labelled dataset, while also attempting to minimize the amount of necessary manual expert labelling by introducing high level labelling (either manual or automatic) on some explicit document property. The proposed method includes four key stages: formation of corpus partitioning, computing a topic model of a united corpus, calculation of corpora imbalance estimates of each topic; extrapolating the results of the imbalance estimation on all documents. The method was cross-validated on a labelled subsample of 1000 news, and achieves high predictive power – ROC AUC 0.73.

AB - This paper presents a method based on topic modelling for identifying texts with propagandistic content. The method is an attempt to incorporate transfer learning idea of obtaining effective vector representation from a large unlabeled or (semi-) automatically labelled dataset, while also attempting to minimize the amount of necessary manual expert labelling by introducing high level labelling (either manual or automatic) on some explicit document property. The proposed method includes four key stages: formation of corpus partitioning, computing a topic model of a united corpus, calculation of corpora imbalance estimates of each topic; extrapolating the results of the imbalance estimation on all documents. The method was cross-validated on a labelled subsample of 1000 news, and achieves high predictive power – ROC AUC 0.73.

KW - propaganda

KW - natural language processing

KW - topic modelling

KW - text classification

KW - mass media analysis

UR - https://www.sciencedirect.com/science/article/pii/S1877050920323966#!

M3 - Article

VL - 178

SP - 205

EP - 212

JO - Procedia Computer Science

JF - Procedia Computer Science

SN - 1877-0509

T2 - 9th International Young Scientists Conference in Computational Science

Y2 - 22 June 2020 through 27 June 2020

ER -

ID: 72704648