Propaganda identification using topic modelling

Kirill Yakunin, George Mihail Ionescu, Sanzhar Murzakhmetov, Rustam Mussabayev, Olga Filatova, Ravil Mukhamediev

Research output: Contribution to journal, Article, peer-reviewed

1 Citation (Scopus)


This paper presents a topic-modelling method for identifying texts with propagandistic content. The method borrows the transfer-learning idea of obtaining an effective vector representation from a large unlabelled or (semi-)automatically labelled dataset, and it aims to minimize the amount of manual expert labelling required by introducing high-level labelling (manual or automatic) on some explicit document property. The proposed method has four key stages: partitioning the corpus, computing a topic model of the united corpus, estimating the imbalance of each topic across the corpora, and extrapolating the imbalance estimates to all documents. The method was cross-validated on a labelled subsample of 1000 news items and achieved high predictive power (ROC AUC 0.73).
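The last two stages can be illustrated with a minimal sketch. It assumes the first two stages are already done: `theta` is a hypothetical document-topic matrix from any topic model, and `labels` marks which partition each labelled document belongs to. The abstract does not give the imbalance formula, so the normalized difference of mean topic weights used here is an assumption for illustration, not the authors' estimator. The function names are also hypothetical.

```python
from typing import List, Sequence


def topic_imbalance(theta: Sequence[Sequence[float]],
                    labels: Sequence[int]) -> List[float]:
    """Estimate a per-topic imbalance in [-1, 1] between two partitions.

    theta[d][t] is the weight of topic t in document d.
    labels[d] is the high-level partition label (0 or 1) of document d.
    ASSUMPTION: imbalance = (mean1 - mean0) / (mean1 + mean0); the paper's
    actual estimator is not stated in the abstract.
    """
    n_topics = len(theta[0])
    sums = [[0.0] * n_topics, [0.0] * n_topics]
    counts = [0, 0]
    for row, label in zip(theta, labels):
        counts[label] += 1
        for t, weight in enumerate(row):
            sums[label][t] += weight
    if counts[0] == 0 or counts[1] == 0:
        raise ValueError("both partitions need at least one document")
    imbalance = []
    for t in range(n_topics):
        mean0 = sums[0][t] / counts[0]
        mean1 = sums[1][t] / counts[1]
        total = mean0 + mean1
        # A topic absent from both partitions carries no signal.
        imbalance.append((mean1 - mean0) / total if total > 0 else 0.0)
    return imbalance


def document_scores(theta: Sequence[Sequence[float]],
                    imbalance: Sequence[float]) -> List[float]:
    """Extrapolate topic imbalance to documents as a weighted sum."""
    return [sum(w * b for w, b in zip(row, imbalance)) for row in theta]


# Toy data: two topics, four partition-labelled documents, one unlabelled.
theta = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.5, 0.5],  # unlabelled document, scored by extrapolation only
]
labels = [1, 1, 0, 0]

# Estimate imbalance on the labelled part, then score every document.
imbalance = topic_imbalance(theta[:4], labels)
scores = document_scores(theta, imbalance)
print(imbalance, scores)
```

In this sketch a positive score means a document leans towards topics that dominate partition 1, and a score near zero means its topics are evenly spread over both partitions. Such scores could then be compared with expert labels, for example through ROC AUC as the abstract reports.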
Original language: English
Pages (from-to): 205-212
Journal: Procedia Computer Science
Early online date: 7 Dec 2020
Publication status: Published - 2020
Event: 9th International Young Scientists Conference in Computational Science - Heraklion, Greece
Duration: 22 Jun 2020 to 27 Jun 2020
