Standard

Topic Modeling of Literary Texts Using LDA: On the Influence of Linguistic Preprocessing on Model Interpretability. / Sherstinova, Tatiana; Moskvina, Anna; Kirina, Margarita; Zavyalova, Irina; Karysheva, Asya; Kolpashchikova, Evgenia; Maksimenko, Polina; Moskalenko, Alena.

2022 31st Conference of Open Innovations Association (FRUCT). Vol. 2022-April 2022. p. 305-312 (CONFERENCE OF OPEN INNOVATIONS ASSOCIATION, FRUCT; Vol. 2022-April).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review

Harvard

Sherstinova, T, Moskvina, A, Kirina, M, Zavyalova, I, Karysheva, A, Kolpashchikova, E, Maksimenko, P & Moskalenko, A 2022, Topic Modeling of Literary Texts Using LDA: On the Influence of Linguistic Preprocessing on Model Interpretability. in 2022 31st Conference of Open Innovations Association (FRUCT). vol. 2022-April, CONFERENCE OF OPEN INNOVATIONS ASSOCIATION, FRUCT, vol. 2022-April, pp. 305-312, 2022 31st Conference of Open Innovations Association (FRUCT), 27/04/22. https://doi.org/10.23919/FRUCT54823.2022.9770887

APA

Sherstinova, T., Moskvina, A., Kirina, M., Zavyalova, I., Karysheva, A., Kolpashchikova, E., Maksimenko, P., & Moskalenko, A. (2022). Topic Modeling of Literary Texts Using LDA: On the Influence of Linguistic Preprocessing on Model Interpretability. In 2022 31st Conference of Open Innovations Association (FRUCT) (Vol. 2022-April, pp. 305-312). (CONFERENCE OF OPEN INNOVATIONS ASSOCIATION, FRUCT; Vol. 2022-April). https://doi.org/10.23919/FRUCT54823.2022.9770887

Vancouver

Sherstinova T, Moskvina A, Kirina M, Zavyalova I, Karysheva A, Kolpashchikova E et al. Topic Modeling of Literary Texts Using LDA: On the Influence of Linguistic Preprocessing on Model Interpretability. In 2022 31st Conference of Open Innovations Association (FRUCT). Vol. 2022-April. 2022. p. 305-312. (CONFERENCE OF OPEN INNOVATIONS ASSOCIATION, FRUCT). https://doi.org/10.23919/FRUCT54823.2022.9770887

Author

Sherstinova, Tatiana ; Moskvina, Anna ; Kirina, Margarita ; Zavyalova, Irina ; Karysheva, Asya ; Kolpashchikova, Evgenia ; Maksimenko, Polina ; Moskalenko, Alena. / Topic Modeling of Literary Texts Using LDA: On the Influence of Linguistic Preprocessing on Model Interpretability. 2022 31st Conference of Open Innovations Association (FRUCT). Vol. 2022-April 2022. pp. 305-312 (CONFERENCE OF OPEN INNOVATIONS ASSOCIATION, FRUCT).

BibTeX

@inproceedings{121c77bbb0074d72b3c7af26b2c5e66d,
title = "Topic Modeling of Literary Texts Using LDA: On the Influence of Linguistic Preprocessing on Model Interpretability",
abstract = "The article describes the results of the research, the purpose of which was to evaluate the influence of linguistic preprocessing on the interpretability of topic models for literary texts. The study was carried out as part of a large project aimed to obtain topic models of Russian short stories written in the first three decades of the 20th century and divided into three successive historical periods: 1) the period of the beginning of the century before the First World War (1900-1913), 2) the time of acute social cataclysms, wars and revolutions (World War I, the February and October revolutions, and the Civil War) (1914-1922), and 3) the early Soviet period (1923-1930). The material of the study was 3 samples of different sizes for each period, containing 100, 500 and 1000 short stories each. Preprocessing included lemmatization using spaCy library and four POS-filtering options: 1) nouns only, 2) nouns and verbs, 3) nouns, adjectives, adverbs, verbs, and 4) no filtering. Using the latent Dirichlet allocation (LDA), 36 topic models were built (9 models for each preprocessing option). The research showed that in case of literary texts topic models built without any POS filters are the most interpretable. The study made it possible to obtain information about topic diversity of Russian short stories, to assess their expert interpretability, and to offer some recommendations for optimizing topic modeling, which are to be used in the development of artificial intelligence systems that process large volumes of literary texts.",
author = "Tatiana Sherstinova and Anna Moskvina and Margarita Kirina and Irina Zavyalova and Asya Karysheva and Evgenia Kolpashchikova and Polina Maksimenko and Alena Moskalenko",
year = "2022",
month = jan,
day = "1",
doi = "10.23919/FRUCT54823.2022.9770887",
language = "English",
volume = "2022-April",
series = "CONFERENCE OF OPEN INNOVATIONS ASSOCIATION, FRUCT",
publisher = "FRUCT Oy",
pages = "305--312",
booktitle = "2022 31st Conference of Open Innovations Association (FRUCT)",
note = "Conference date: 27-04-2022 Through 29-04-2022",
}

RIS

TY - CONF

T1 - Topic Modeling of Literary Texts Using LDA: On the Influence of Linguistic Preprocessing on Model Interpretability

AU - Sherstinova, Tatiana

AU - Moskvina, Anna

AU - Kirina, Margarita

AU - Zavyalova, Irina

AU - Karysheva, Asya

AU - Kolpashchikova, Evgenia

AU - Maksimenko, Polina

AU - Moskalenko, Alena

PY - 2022/1/1

Y1 - 2022/1/1

N2 - The article describes the results of the research, the purpose of which was to evaluate the influence of linguistic preprocessing on the interpretability of topic models for literary texts. The study was carried out as part of a large project aimed to obtain topic models of Russian short stories written in the first three decades of the 20th century and divided into three successive historical periods: 1) the period of the beginning of the century before the First World War (1900-1913), 2) the time of acute social cataclysms, wars and revolutions (World War I, the February and October revolutions, and the Civil War) (1914-1922), and 3) the early Soviet period (1923-1930). The material of the study was 3 samples of different sizes for each period, containing 100, 500 and 1000 short stories each. Preprocessing included lemmatization using spaCy library and four POS-filtering options: 1) nouns only, 2) nouns and verbs, 3) nouns, adjectives, adverbs, verbs, and 4) no filtering. Using the latent Dirichlet allocation (LDA), 36 topic models were built (9 models for each preprocessing option). The research showed that in case of literary texts topic models built without any POS filters are the most interpretable. The study made it possible to obtain information about topic diversity of Russian short stories, to assess their expert interpretability, and to offer some recommendations for optimizing topic modeling, which are to be used in the development of artificial intelligence systems that process large volumes of literary texts.

AB - The article describes the results of the research, the purpose of which was to evaluate the influence of linguistic preprocessing on the interpretability of topic models for literary texts. The study was carried out as part of a large project aimed to obtain topic models of Russian short stories written in the first three decades of the 20th century and divided into three successive historical periods: 1) the period of the beginning of the century before the First World War (1900-1913), 2) the time of acute social cataclysms, wars and revolutions (World War I, the February and October revolutions, and the Civil War) (1914-1922), and 3) the early Soviet period (1923-1930). The material of the study was 3 samples of different sizes for each period, containing 100, 500 and 1000 short stories each. Preprocessing included lemmatization using spaCy library and four POS-filtering options: 1) nouns only, 2) nouns and verbs, 3) nouns, adjectives, adverbs, verbs, and 4) no filtering. Using the latent Dirichlet allocation (LDA), 36 topic models were built (9 models for each preprocessing option). The research showed that in case of literary texts topic models built without any POS filters are the most interpretable. The study made it possible to obtain information about topic diversity of Russian short stories, to assess their expert interpretability, and to offer some recommendations for optimizing topic modeling, which are to be used in the development of artificial intelligence systems that process large volumes of literary texts.

UR - http://www.scopus.com/inward/record.url?scp=85130387690&partnerID=8YFLogxK

U2 - 10.23919/FRUCT54823.2022.9770887

DO - 10.23919/FRUCT54823.2022.9770887

M3 - Conference contribution

AN - SCOPUS:85130387690

VL - 2022-April

T3 - CONFERENCE OF OPEN INNOVATIONS ASSOCIATION, FRUCT

SP - 305

EP - 312

BT - 2022 31st Conference of Open Innovations Association (FRUCT)

Y2 - 27 April 2022 through 29 April 2022

ER -

ID: 101663042