Topic modeling for Twitter discussions: Model selection and quality assessment

Standard

Topic modeling for Twitter discussions: Model selection and quality assessment. / Bodrunova, S.S.; Blekanov, I.S.; Kukarkin, M.M.

6TH SWS INTERNATIONAL SCIENTIFIC CONFERENCES ON SOCIAL SCIENCES 2019: Conference proceedings. Vol. 6. Sofia, Bulgaria : STEF92 Technology Ltd., 2019. p. 207-214.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review

Harvard

Bodrunova, SS, Blekanov, IS & Kukarkin, MM 2019, Topic modeling for Twitter discussions: Model selection and quality assessment. in 6TH SWS INTERNATIONAL SCIENTIFIC CONFERENCES ON SOCIAL SCIENCES 2019: Conference proceedings. vol. 6, STEF92 Technology Ltd., Sofia, Bulgaria, pp. 207-214, 6th SWS International Scientific Conference on Social Sciences 2019, Albena, Bulgaria, 26/08/19.

APA

Bodrunova, S. S., Blekanov, I. S., & Kukarkin, M. M. (2019). Topic modeling for Twitter discussions: Model selection and quality assessment. In 6TH SWS INTERNATIONAL SCIENTIFIC CONFERENCES ON SOCIAL SCIENCES 2019: Conference proceedings (Vol. 6, pp. 207-214). STEF92 Technology Ltd.

Vancouver

Bodrunova SS, Blekanov IS, Kukarkin MM. Topic modeling for Twitter discussions: Model selection and quality assessment. In 6TH SWS INTERNATIONAL SCIENTIFIC CONFERENCES ON SOCIAL SCIENCES 2019: Conference proceedings. Vol. 6. Sofia, Bulgaria: STEF92 Technology Ltd. 2019. p. 207-214.

Author

Bodrunova, S.S.; Blekanov, I.S.; Kukarkin, M.M. / Topic modeling for Twitter discussions: Model selection and quality assessment. 6TH SWS INTERNATIONAL SCIENTIFIC CONFERENCES ON SOCIAL SCIENCES 2019: Conference proceedings. Vol. 6. Sofia, Bulgaria : STEF92 Technology Ltd., 2019. pp. 207-214

BibTeX

@inproceedings{b8e3cda2d5a84b2496ec14d0bff7edef,

title = "Topic modeling for {Twitter} discussions: Model selection and quality assessment",

abstract = "Topic modeling is a method of automated identification of subtopics in a text corpus. Applying topic modeling to short texts such as tweets is highly complicated due to their short length and grammatical restructuring, including broken word order, abbreviations, and the mixing of different languages. In this paper, the authors use the BTM topic modeling algorithm (previously found to work best in comparison with two other topic models, as measured by the automated coherence metrics UMass and NPMI) to test three topic quality metrics independent of topic coherence. Topic modeling is applied to three cases of ethnic conflict discussions on Twitter in three different main languages, namely the Charlie Hebdo shooting (France), the Ferguson unrest (the USA), and the anti-immigrant bashings in Biryulevo (Russia), thus combining a large multilingual, a large monolingual, and a mid-range monolingual type of discussion. We measure the quality of modeling by looking at topic interpretability, topic robustness, and topic saliency. The results of the experiment show that the three topic features may be interdependent (but are not always); the multilingual discussion performs better than the monolingual ones in terms of interdependence of the metrics and formation of ideal topics; and interpretability depends on neither multi-/monolingualism nor dataset volume.",

keywords = "Topic modelling, Twitter, Quality assessment, BTM, Human coding, Topic coherence, Interpretability, Topic saliency, Topic robustness",

author = "S.S. Bodrunova and I.S. Blekanov and M.M. Kukarkin",

year = "2019",

month = aug,

language = "English",

isbn = "978-619-7408-95-9",

volume = "6",

pages = "207--214",

booktitle = "6th {SWS} International Scientific Conferences on Social Sciences 2019: Conference Proceedings",

publisher = "STEF92 Technology Ltd.",

address = "Sofia, Bulgaria",

note = "6th SWS International Scientific Conference on Social Sciences 2019, ISCSS 2019; Conference date: 26-08-2019 through 01-09-2019",

url = "https://www.sgemsocial.org/",

}

RIS

TY - GEN

T1 - Topic modeling for Twitter discussions: Model selection and quality assessment

AU - Bodrunova, S.S.

AU - Blekanov, I.S.

AU - Kukarkin, M.M.

N1 - Conference code: 6

PY - 2019/8

Y1 - 2019/8

N2 - Topic modeling is a method of automated identification of subtopics in a text corpus. Applying topic modeling to short texts such as tweets is highly complicated due to their short length and grammatical restructuring, including broken word order, abbreviations, and the mixing of different languages. In this paper, the authors use the BTM topic modeling algorithm (previously found to work best in comparison with two other topic models, as measured by the automated coherence metrics UMass and NPMI) to test three topic quality metrics independent of topic coherence. Topic modeling is applied to three cases of ethnic conflict discussions on Twitter in three different main languages, namely the Charlie Hebdo shooting (France), the Ferguson unrest (the USA), and the anti-immigrant bashings in Biryulevo (Russia), thus combining a large multilingual, a large monolingual, and a mid-range monolingual type of discussion. We measure the quality of modeling by looking at topic interpretability, topic robustness, and topic saliency. The results of the experiment show that the three topic features may be interdependent (but are not always); the multilingual discussion performs better than the monolingual ones in terms of interdependence of the metrics and formation of ideal topics; and interpretability depends on neither multi-/monolingualism nor dataset volume.

AB - Topic modeling is a method of automated identification of subtopics in a text corpus. Applying topic modeling to short texts such as tweets is highly complicated due to their short length and grammatical restructuring, including broken word order, abbreviations, and the mixing of different languages. In this paper, the authors use the BTM topic modeling algorithm (previously found to work best in comparison with two other topic models, as measured by the automated coherence metrics UMass and NPMI) to test three topic quality metrics independent of topic coherence. Topic modeling is applied to three cases of ethnic conflict discussions on Twitter in three different main languages, namely the Charlie Hebdo shooting (France), the Ferguson unrest (the USA), and the anti-immigrant bashings in Biryulevo (Russia), thus combining a large multilingual, a large monolingual, and a mid-range monolingual type of discussion. We measure the quality of modeling by looking at topic interpretability, topic robustness, and topic saliency. The results of the experiment show that the three topic features may be interdependent (but are not always); the multilingual discussion performs better than the monolingual ones in terms of interdependence of the metrics and formation of ideal topics; and interpretability depends on neither multi-/monolingualism nor dataset volume.

KW - Topic modelling

KW - Twitter

KW - Quality assessment

KW - BTM

KW - Human coding

KW - Topic coherence

KW - Interpretability

KW - Topic saliency

KW - Topic robustness

UR - https://www.elibrary.ru/item.asp?id=42554344

M3 - Conference contribution

SN - 978-619-7408-95-9

VL - 6

SP - 207

EP - 214

BT - 6th SWS International Scientific Conferences on Social Sciences 2019: Conference Proceedings

PB - STEF92 Technology Ltd.

CY - Sofia, Bulgaria

T2 - 6th SWS International Scientific Conference on Social Sciences 2019

Y2 - 26 August 2019 through 1 September 2019

ER -
