Topic Modelling with NMF vs. Expert Topic Annotation

Standard

Topic Modelling with NMF vs. Expert Topic Annotation: The Case Study of Russian Fiction. / Sherstinova, Tatiana; Mitrofanova, Olga; Skrebtsova, Tatiana; Zamiraylova, Ekaterina; Kirina, Margarita.

Advances in Computational Intelligence - 19th Mexican International Conference on Artificial Intelligence, MICAI 2020, Proceedings. ed. / Lourdes Martínez-Villaseñor; Hiram Ponce; Oscar Herrera-Alcántara; Félix A. Castro-Espinoza. Springer Nature, 2020. p. 134-151 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12469 LNAI).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review

Harvard

Sherstinova, T, Mitrofanova, O, Skrebtsova, T, Zamiraylova, E & Kirina, M 2020, Topic Modelling with NMF vs. Expert Topic Annotation: The Case Study of Russian Fiction. in L Martínez-Villaseñor, H Ponce, O Herrera-Alcántara & FA Castro-Espinoza (eds), Advances in Computational Intelligence - 19th Mexican International Conference on Artificial Intelligence, MICAI 2020, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12469 LNAI, Springer Nature, pp. 134-151, 19th Mexican International Conference on Artificial Intelligence, MICAI 2020, Mexico City, Mexico, 12/10/20. https://doi.org/10.1007/978-3-030-60887-3_13

APA

Sherstinova, T., Mitrofanova, O., Skrebtsova, T., Zamiraylova, E., & Kirina, M. (2020). Topic Modelling with NMF vs. Expert Topic Annotation: The Case Study of Russian Fiction. In L. Martínez-Villaseñor, H. Ponce, O. Herrera-Alcántara, & F. A. Castro-Espinoza (Eds.), Advances in Computational Intelligence - 19th Mexican International Conference on Artificial Intelligence, MICAI 2020, Proceedings (pp. 134-151). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12469 LNAI). Springer Nature. https://doi.org/10.1007/978-3-030-60887-3_13

Vancouver

Sherstinova T, Mitrofanova O, Skrebtsova T, Zamiraylova E, Kirina M. Topic Modelling with NMF vs. Expert Topic Annotation: The Case Study of Russian Fiction. In Martínez-Villaseñor L, Ponce H, Herrera-Alcántara O, Castro-Espinoza FA, editors, Advances in Computational Intelligence - 19th Mexican International Conference on Artificial Intelligence, MICAI 2020, Proceedings. Springer Nature. 2020. p. 134-151. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-030-60887-3_13

Author

Sherstinova, Tatiana; Mitrofanova, Olga; Skrebtsova, Tatiana; Zamiraylova, Ekaterina; Kirina, Margarita. / Topic Modelling with NMF vs. Expert Topic Annotation: The Case Study of Russian Fiction. Advances in Computational Intelligence - 19th Mexican International Conference on Artificial Intelligence, MICAI 2020, Proceedings. editor / Lourdes Martínez-Villaseñor; Hiram Ponce; Oscar Herrera-Alcántara; Félix A. Castro-Espinoza. Springer Nature, 2020. pp. 134-151 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

BibTeX

@inproceedings{7adf1190c44041c89c31f30f58b26b40,

title = "Topic Modelling with NMF vs. Expert Topic Annotation: The Case Study of Russian Fiction",

abstract = "The paper presents an experiment comparing the results of topic modelling via non-negative matrix factorization (NMF) with those of manual topic annotation performed by an expert. The experiment was conducted on an annotated corpus of Russian short stories from the first three decades of the 20th century, which contains 310 stories with a total of 1,000,000 tokens written by 300 Russian writers. The annotation scheme includes 89 topics; this list was further reduced to 30 generalized ones, the most frequent of which turned out to be the following: death, relationships, love, social groups, social processes, family, money, human sins, nature, religion, and war. Then the corpus, divided into three consecutive time periods, was subjected to NMF topic modelling, which produced a model of 24 topics. The results of both topic annotations were compared and described. The paper discusses the main findings of the study and the difficulties of fiction topic modelling that should be taken into account. For example, experimental results showed that topic modelling via NMF should be recommended primarily for revealing topics that refer to the general background of literary texts (e.g., war, love, nature, family) rather than for detecting topics related to critical events or relations between characters (e.g., death or relations). The comparison of human and automatic topic annotation seems an important step towards improving artificial intelligence techniques related to NLP.",

keywords = "Corpus linguistics, Digital humanities, Fiction, Literary criticism, Machine learning, NMF, NLP, Russian literature, Topic modelling",

author = "Tatiana Sherstinova and Olga Mitrofanova and Tatiana Skrebtsova and Ekaterina Zamiraylova and Margarita Kirina",

note = "Publisher Copyright: {\textcopyright} 2020, Springer Nature Switzerland AG.; 19th Mexican International Conference on Artificial Intelligence, MICAI 2020 ; Conference date: 12-10-2020 Through 17-10-2020",

year = "2020",

doi = "10.1007/978-3-030-60887-3_13",

language = "English",

isbn = "9783030608866",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Nature",

pages = "134--151",

editor = "Lourdes Mart{\'i}nez-Villase{\~n}or and Hiram Ponce and Oscar Herrera-Alc{\'a}ntara and Castro-Espinoza, {F{\'e}lix A.}",

booktitle = "Advances in Computational Intelligence - 19th Mexican International Conference on Artificial Intelligence, MICAI 2020, Proceedings",

address = "Germany",

}

RIS

TY - GEN

T1 - Topic Modelling with NMF vs. Expert Topic Annotation

T2 - 19th Mexican International Conference on Artificial Intelligence, MICAI 2020

AU - Sherstinova, Tatiana

AU - Mitrofanova, Olga

AU - Skrebtsova, Tatiana

AU - Zamiraylova, Ekaterina

AU - Kirina, Margarita

PY - 2020

Y1 - 2020

N2 - The paper presents an experiment comparing the results of topic modelling via non-negative matrix factorization (NMF) with those of manual topic annotation performed by an expert. The experiment was conducted on an annotated corpus of Russian short stories from the first three decades of the 20th century, which contains 310 stories with a total of 1,000,000 tokens written by 300 Russian writers. The annotation scheme includes 89 topics; this list was further reduced to 30 generalized ones, the most frequent of which turned out to be the following: death, relationships, love, social groups, social processes, family, money, human sins, nature, religion, and war. Then the corpus, divided into three consecutive time periods, was subjected to NMF topic modelling, which produced a model of 24 topics. The results of both topic annotations were compared and described. The paper discusses the main findings of the study and the difficulties of fiction topic modelling that should be taken into account. For example, experimental results showed that topic modelling via NMF should be recommended primarily for revealing topics that refer to the general background of literary texts (e.g., war, love, nature, family) rather than for detecting topics related to critical events or relations between characters (e.g., death or relations). The comparison of human and automatic topic annotation seems an important step towards improving artificial intelligence techniques related to NLP.

KW - Corpus linguistics

KW - Digital humanities

KW - Fiction

KW - Literary criticism

KW - Machine learning

KW - NMF

KW - NLP

KW - Russian literature

KW - Topic modelling

UR - http://www.scopus.com/inward/record.url?scp=85092935662&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-60887-3_13

DO - 10.1007/978-3-030-60887-3_13

M3 - Conference contribution

AN - SCOPUS:85092935662

SN - 9783030608866

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 134

EP - 151

BT - Advances in Computational Intelligence - 19th Mexican International Conference on Artificial Intelligence, MICAI 2020, Proceedings

A2 - Martínez-Villaseñor, Lourdes

A2 - Ponce, Hiram

A2 - Herrera-Alcántara, Oscar

A2 - Castro-Espinoza, Félix A.

PB - Springer Nature

Y2 - 12 October 2020 through 17 October 2020

ER -
