Research output: Scholarly publications in periodicals › Article › Peer-reviewed
Normalization Issues in Digital Literary Studies: Spelling, Literary Themes and Biographical Description of Writers. / Sherstinova, Tatiana; Kirina, Margarita.
In: Communications in Computer and Information Science, Vol. 1503 CCIS, 01.01.2022, pp. 332-346.
TY - JOUR
T1 - Normalization Issues in Digital Literary Studies: Spelling, Literary Themes and Biographical Description of Writers
AU - Sherstinova, Tatiana
AU - Kirina, Margarita
PY - 2022/1/1
Y1 - 2022/1/1
AB - Digital literary studies is a branch of digital humanities that deals with national or world literatures. In this paper, we discuss normalization issues that are crucial for compiling eCulture resources designed for cultural analytics, social and literary studies, and other areas of digital humanities. One such resource is the Corpus of Russian short stories of 1900–1930s, which includes detailed information about the Russian writers of the period and is intended for stylometric, linguistic and literary studies of Russian prose. Our task is to create a literary resource based on a systems approach to the literature of a given period, which means including literary texts by as many writers active in that period as possible, both well-known and peripheral. The paper addresses the problem of data normalization, a necessary requirement for statistical processing of data of any kind. We describe how we deal with variant spellings, how we normalize the manual annotation of literary themes made by an expert, and how we standardize biographical descriptions of authors. The resulting normalized data can be used for research in literary studies, digital humanities, computational linguistics, and cultural heritage studies.
KW - Biographical descriptions of writers
KW - Cultural heritage
KW - Digital humanities
KW - Literary corpus
KW - Literature studies
KW - Normalization
KW - Russian literature
KW - Spelling
KW - Thematic annotation
UR - http://www.scopus.com/inward/record.url?scp=85124671055&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-93715-7_24
DO - 10.1007/978-3-030-93715-7_24
M3 - Article
AN - SCOPUS:85124671055
VL - 1503 CCIS
SP - 332
EP - 346
JO - Communications in Computer and Information Science
JF - Communications in Computer and Information Science
SN - 1865-0929
ER -
ID: 101663090