Topic modeling for Twitter discussions: Model selection and quality assessment

Research output: Publications in books, reports, collections, conference proceedings › Article in conference proceedings › Peer-reviewed

Abstract

Topic modeling is a method for automatically identifying subtopics in a text corpus. Applying topic modeling to short texts such as tweets is highly complicated by their short length and grammatical restructuring, including broken word order, abbreviations, and the mixing of different languages. In this paper, the authors use the BTM topic modeling algorithm (previously found to work best in comparison with two other topic models, as measured by the automated coherence metrics UMass and NPMI) to test three topic quality metrics independent of topic coherence. Topic modeling is applied to three cases of ethnic conflict discussions on Twitter in three different main languages, namely the Charlie Hebdo shooting (France), the Ferguson unrest (the USA), and the anti-immigrant bashings in Biryulevo (Russia), thus combining a large multilingual, a large monolingual, and a mid-range monolingual type of discussion. We measure the quality of modeling by looking at topic interpretability, topic robustness, and topic saliency. The results of the experiment show that the three topic features may be interdependent (but are not always); the multilingual discussion performs better than the monolingual ones in terms of interdependence of the metrics and formation of ideal topics; and interpretability does not depend on multi-/monolingualism or on dataset volume.
Original language: English
Title of host publication: 6TH SWS INTERNATIONAL SCIENTIFIC CONFERENCES ON SOCIAL SCIENCES 2019
Subtitle of host publication: Conference proceedings
Place of publication: Sofia, Bulgaria
Publisher: STEF92 Technology Ltd.
Pages: 207-214
Number of pages: 8
Volume: 6
ISBN (print): 978-619-7408-95-9
Status: Published - Aug 2019
Event: 6th SWS International Scientific Conference on Social Sciences 2019 - Paradise Blue 5 *****, Congress Center, Albena, Bulgaria
Duration: 26 Aug 2019 to 1 Sep 2019
Conference number: 6
https://www.sgemsocial.org/

Conference

Conference: 6th SWS International Scientific Conference on Social Sciences 2019
Abbreviated title: SWS2019-SGEM2019
Country: Bulgaria
City: Albena
Period: 26/08/19 to 1/09/19
