The ideal topic: interdependence of topic interpretability and other quality features in topic modelling for short texts

Standard

The ideal topic: interdependence of topic interpretability and other quality features in topic modelling for short texts. / Blekanov, Ivan S.; Bodrunova, Svetlana S.; Zhuravleva, Nina; Smoliarova, Anna; Tarasov, Nikita.

Social Computing and Social Media. Design, Ethics, User Behavior, and Social Network Analysis - 12th International Conference, SCSM 2020, Held as Part of the 22nd HCI International Conference, HCII 2020, Proceedings. ed. / Gabriele Meiselwitz. Cham: Springer Nature, 2020. p. 19-26 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12194 LNCS).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review

Harvard

Blekanov, IS, Bodrunova, SS, Zhuravleva, N, Smoliarova, A & Tarasov, N 2020, The ideal topic: interdependence of topic interpretability and other quality features in topic modelling for short texts. in G Meiselwitz (ed.), Social Computing and Social Media. Design, Ethics, User Behavior, and Social Network Analysis - 12th International Conference, SCSM 2020, Held as Part of the 22nd HCI International Conference, HCII 2020, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12194 LNCS, Springer Nature, Cham, pp. 19-26, 12th International Conference on Social Computing and Social Media, SCSM 2020, held as part of the 22nd International Conference on Human-Computer Interaction, HCII 2020, Copenhagen, Denmark, 19/07/20. https://doi.org/10.1007/978-3-030-49570-1_2

APA

Blekanov, I. S., Bodrunova, S. S., Zhuravleva, N., Smoliarova, A., & Tarasov, N. (2020). The ideal topic: interdependence of topic interpretability and other quality features in topic modelling for short texts. In G. Meiselwitz (Ed.), Social Computing and Social Media. Design, Ethics, User Behavior, and Social Network Analysis - 12th International Conference, SCSM 2020, Held as Part of the 22nd HCI International Conference, HCII 2020, Proceedings (pp. 19-26). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12194 LNCS). Springer Nature. https://doi.org/10.1007/978-3-030-49570-1_2

Vancouver

Blekanov IS, Bodrunova SS, Zhuravleva N, Smoliarova A, Tarasov N. The ideal topic: interdependence of topic interpretability and other quality features in topic modelling for short texts. In Meiselwitz G, editor, Social Computing and Social Media. Design, Ethics, User Behavior, and Social Network Analysis - 12th International Conference, SCSM 2020, Held as Part of the 22nd HCI International Conference, HCII 2020, Proceedings. Cham: Springer Nature. 2020. p. 19-26. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-030-49570-1_2

Author

Blekanov, Ivan S.; Bodrunova, Svetlana S.; Zhuravleva, Nina; Smoliarova, Anna; Tarasov, Nikita. / The ideal topic: interdependence of topic interpretability and other quality features in topic modelling for short texts. Social Computing and Social Media. Design, Ethics, User Behavior, and Social Network Analysis - 12th International Conference, SCSM 2020, Held as Part of the 22nd HCI International Conference, HCII 2020, Proceedings. editor / Gabriele Meiselwitz. Cham: Springer Nature, 2020. pp. 19-26 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

BibTeX

@inproceedings{852b68839a934765a0c2e9101bed50dd,

title = "The ideal topic: interdependence of topic interpretability and other quality features in topic modelling for short texts",

abstract = "Background. Topic modelling is a method of automated probabilistic detection of topics in a text collection. Use of topic modelling for short texts, e.g. tweets or search engine queries, is complicated by their short length and grammatical flaws, including broken word order, abbreviations, and contamination of different languages. At the same time, as our research shows, human coding cannot be perceived as a baseline for topic quality assessment. Objectives. We use the biterm topic model (BTM) to test the relations between human topic interpretability and two topic quality metrics that are independent of topic coherence. Topic modelling is applied to three cases of conflictual Twitter discussions in three different languages, namely the Charlie Hebdo shooting (France), the Ferguson unrest (the USA), and the anti-immigrant bashings in Biryulevo (Russia), which represent, respectively, a global multilingual, a large monolingual, and a mid-range monolingual type of discussion. Method. First, we evaluate the human baseline coding by providing evidence for the Russian case on the coding by two pairs of coders who have varying levels of knowledge of the case. We then measure the quality of modelling on the level of topics by looking at topic interpretability (by experienced coders), topic robustness, and topic saliency. Results. The results of the experiment show that: 1) the idea of human coding as baseline needs to be rejected; 2) topic interpretability, robustness, and saliency can be inter-related; 3) the multilingual discussion performs better than the monolingual ones in terms of interdependence of the metrics. Conclusion. We formulate the idea of an {\textquoteleft}ideal topic{\textquoteright} that rethinks the goal of topic modelling towards finding a smaller number of good topics rather than maximization of the number of interpretable topics.",

keywords = "Human coding, Ideal topic, Inter-ethnic discussions, Topic modelling, Topic quality, Twitter",

author = "Blekanov, {Ivan S.} and Bodrunova, {Svetlana S.} and Nina Zhuravleva and Anna Smoliarova and Nikita Tarasov",

note = "Blekanov I.S., Bodrunova S.S., Zhuravleva N., Smoliarova A., Tarasov N. (2020) The Ideal Topic: Interdependence of Topic Interpretability and Other Quality Features in Topic Modelling for Short Texts. In: Meiselwitz G. (eds) Social Computing and Social Media. Design, Ethics, User Behavior, and Social Network Analysis. HCII 2020. Lecture Notes in Computer Science, vol 12194. Springer, Cham. https://doi.org/10.1007/978-3-030-49570-1_2; 12th International Conference on Social Computing and Social Media, SCSM 2020, held as part of the 22nd International Conference on Human-Computer Interaction, HCII 2020 ; Conference date: 19-07-2020 Through 24-07-2020",

year = "2020",

doi = "10.1007/978-3-030-49570-1_2",

language = "English",

isbn = "9783030495695",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Nature",

pages = "19--26",

editor = "Gabriele Meiselwitz",

booktitle = "Social Computing and Social Media. Design, Ethics, User Behavior, and Social Network Analysis - 12th International Conference, SCSM 2020, Held as Part of the 22nd HCI International Conference, HCII 2020, Proceedings",

address = "Cham",

}

RIS

TY - GEN

T1 - The ideal topic: interdependence of topic interpretability and other quality features in topic modelling for short texts

AU - Blekanov, Ivan S.

AU - Bodrunova, Svetlana S.

AU - Zhuravleva, Nina

AU - Smoliarova, Anna

AU - Tarasov, Nikita

N1 - Blekanov I.S., Bodrunova S.S., Zhuravleva N., Smoliarova A., Tarasov N. (2020) The Ideal Topic: Interdependence of Topic Interpretability and Other Quality Features in Topic Modelling for Short Texts. In: Meiselwitz G. (eds) Social Computing and Social Media. Design, Ethics, User Behavior, and Social Network Analysis. HCII 2020. Lecture Notes in Computer Science, vol 12194. Springer, Cham. https://doi.org/10.1007/978-3-030-49570-1_2

PY - 2020

Y1 - 2020

N2 - Background. Topic modelling is a method of automated probabilistic detection of topics in a text collection. Use of topic modelling for short texts, e.g. tweets or search engine queries, is complicated by their short length and grammatical flaws, including broken word order, abbreviations, and contamination of different languages. At the same time, as our research shows, human coding cannot be perceived as a baseline for topic quality assessment. Objectives. We use the biterm topic model (BTM) to test the relations between human topic interpretability and two topic quality metrics that are independent of topic coherence. Topic modelling is applied to three cases of conflictual Twitter discussions in three different languages, namely the Charlie Hebdo shooting (France), the Ferguson unrest (the USA), and the anti-immigrant bashings in Biryulevo (Russia), which represent, respectively, a global multilingual, a large monolingual, and a mid-range monolingual type of discussion. Method. First, we evaluate the human baseline coding by providing evidence for the Russian case on the coding by two pairs of coders who have varying levels of knowledge of the case. We then measure the quality of modelling on the level of topics by looking at topic interpretability (by experienced coders), topic robustness, and topic saliency. Results. The results of the experiment show that: 1) the idea of human coding as baseline needs to be rejected; 2) topic interpretability, robustness, and saliency can be inter-related; 3) the multilingual discussion performs better than the monolingual ones in terms of interdependence of the metrics. Conclusion. We formulate the idea of an ‘ideal topic’ that rethinks the goal of topic modelling towards finding a smaller number of good topics rather than maximization of the number of interpretable topics.

AB - Background. Topic modelling is a method of automated probabilistic detection of topics in a text collection. Use of topic modelling for short texts, e.g. tweets or search engine queries, is complicated by their short length and grammatical flaws, including broken word order, abbreviations, and contamination of different languages. At the same time, as our research shows, human coding cannot be perceived as a baseline for topic quality assessment. Objectives. We use the biterm topic model (BTM) to test the relations between human topic interpretability and two topic quality metrics that are independent of topic coherence. Topic modelling is applied to three cases of conflictual Twitter discussions in three different languages, namely the Charlie Hebdo shooting (France), the Ferguson unrest (the USA), and the anti-immigrant bashings in Biryulevo (Russia), which represent, respectively, a global multilingual, a large monolingual, and a mid-range monolingual type of discussion. Method. First, we evaluate the human baseline coding by providing evidence for the Russian case on the coding by two pairs of coders who have varying levels of knowledge of the case. We then measure the quality of modelling on the level of topics by looking at topic interpretability (by experienced coders), topic robustness, and topic saliency. Results. The results of the experiment show that: 1) the idea of human coding as baseline needs to be rejected; 2) topic interpretability, robustness, and saliency can be inter-related; 3) the multilingual discussion performs better than the monolingual ones in terms of interdependence of the metrics. Conclusion. We formulate the idea of an ‘ideal topic’ that rethinks the goal of topic modelling towards finding a smaller number of good topics rather than maximization of the number of interpretable topics.

KW - Human coding

KW - Ideal topic

KW - Inter-ethnic discussions

KW - Topic modelling

KW - Topic quality

KW - Twitter

UR - http://www.scopus.com/inward/record.url?scp=85088523328&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-49570-1_2

DO - 10.1007/978-3-030-49570-1_2

M3 - Conference contribution

AN - SCOPUS:85088523328

SN - 9783030495695

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 19

EP - 26

BT - Social Computing and Social Media. Design, Ethics, User Behavior, and Social Network Analysis - 12th International Conference, SCSM 2020, Held as Part of the 22nd HCI International Conference, HCII 2020, Proceedings

A2 - Meiselwitz, Gabriele

PB - Springer Nature

CY - Cham

T2 - 12th International Conference on Social Computing and Social Media, SCSM 2020, held as part of the 22nd International Conference on Human-Computer Interaction, HCII 2020

Y2 - 19 July 2020 through 24 July 2020

ER -
