Propaganda identification using topic modelling

Kirill Yakunin
George Mihail Ionescu
Sanzhar Murzakhmetov
Rustam Mussabayev
Olga Filatova
Ravil Mukhamediev

This paper presents a topic-modelling method for identifying texts with propagandistic content. The method adopts the transfer-learning idea of obtaining an effective vector representation from a large unlabelled or (semi-)automatically labelled dataset, and it aims to reduce the amount of manual expert labelling required by introducing high-level labelling (manual or automatic) of some explicit document property. The proposed method has four key stages: partitioning the corpus; computing a topic model of the united corpus; estimating the imbalance of each topic across the corpora; and extrapolating the imbalance estimates to all documents. The method was cross-validated on a labelled subsample of 1000 news items and achieved a ROC AUC of 0.73.
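The last two stages can be illustrated with a minimal sketch. The abstract does not state the imbalance formula, so the estimate used here (the normalised difference of mean topic weights between two partitions) is an assumption, as are the function names `topic_imbalance`, `document_scores`, and `roc_auc`. The document-topic weights are made-up toy values standing in for the output of a topic model fitted on the united corpus.

```python
from typing import List, Sequence


def topic_imbalance(doc_topic: Sequence[Sequence[float]],
                    partition: Sequence[int]) -> List[float]:
    """Per-topic imbalance between two corpus partitions (labels 0 and 1).

    Assumed estimate, not taken from the paper: difference of the mean
    topic weights in the two partitions, normalised to [-1, 1].
    """
    n_topics = len(doc_topic[0])
    mass = [[0.0] * n_topics, [0.0] * n_topics]
    counts = [0, 0]
    for weights, part in zip(doc_topic, partition):
        counts[part] += 1
        for t, w in enumerate(weights):
            mass[part][t] += w
    scores = []
    for t in range(n_topics):
        # Use per-document means so partitions of different sizes compare fairly.
        m0 = mass[0][t] / counts[0]
        m1 = mass[1][t] / counts[1]
        total = m0 + m1
        scores.append((m1 - m0) / total if total > 0 else 0.0)
    return scores


def document_scores(doc_topic: Sequence[Sequence[float]],
                    imbalance: Sequence[float]) -> List[float]:
    """Extrapolate topic imbalance to documents: topic-weighted sum per document."""
    return [sum(w * s for w, s in zip(weights, imbalance))
            for weights in doc_topic]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 4 documents, 2 topics; rows are document-topic weights.
doc_topic = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
partition = [0, 0, 1, 1]   # high-level label, e.g. the source group of each document
expert = [0, 0, 1, 1]      # hypothetical expert labels used only for evaluation

imbalance = topic_imbalance(doc_topic, partition)
scores = document_scores(doc_topic, imbalance)
auc = roc_auc(expert, scores)
```

In this sketch only the cheap high-level `partition` labels feed the scoring, while the expert labels are used solely to evaluate it, which mirrors the goal of keeping manual labelling to a minimum.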

Original language	English
Pages (from-to)	205-212
Journal	Procedia Computer Science
Volume	178
Early online date	7 Dec 2020
Status	Published - 2020
Event	9th International Young Scientists Conference in Computational Science - Heraklion, Greece. Duration: 22 Jun 2020 → 27 Jun 2020

ID: 72704648
