Standard

Normalization Issues in Digital Literary Studies: Spelling, Literary Themes and Biographical Description of Writers. / Sherstinova, Tatiana; Kirina, Margarita.

In: Communications in Computer and Information Science, Vol. 1503 CCIS, 01.01.2022, pp. 332-346.

Research output: Scientific publications in periodicals › Article › Peer-reviewed

Author

Sherstinova, Tatiana; Kirina, Margarita. / Normalization Issues in Digital Literary Studies: Spelling, Literary Themes and Biographical Description of Writers. In: Communications in Computer and Information Science. 2022; Vol. 1503 CCIS. pp. 332-346.

BibTeX

@article{62a6c9745a964304a606f9990312d97e,
title = "Normalization Issues in Digital Literary Studies: Spelling, Literary Themes and Biographical Description of Writers",
abstract = "Digital literary studies is a branch of digital humanities that deals with national or world literatures. In this paper, we discuss normalization issues that are crucial for compiling eCulture resources designed for cultural analytics, social and literary studies, and various other areas of digital humanities. One such resource is the Corpus of Russian short stories of 1900–1930s, which includes detailed information about Russian writers of the period and is intended for stylometric, linguistic and literary studies of Russian prose. Our aim is to create a literary resource based on a systemic approach to the literature of a given period, which implies covering literary texts by as many writers active in that period as possible, both well-known and peripheral. The paper addresses data normalization, a necessary prerequisite for statistical processing of any kind of data. We describe how we deal with spelling variation, how we normalize an expert's manual annotation of literary themes, and how we standardize biographical descriptions of authors. The resulting normalized data can be used for various kinds of research in literary studies, digital humanities, computational linguistics, and cultural heritage studies.",
keywords = "Biographical descriptions of writers, Cultural heritage, Digital humanities, Literary corpus, Literature studies, Normalization, Russian literature, Spelling, Thematic annotation",
author = "Tatiana Sherstinova and Margarita Kirina",
year = "2022",
month = jan,
day = "1",
doi = "10.1007/978-3-030-93715-7_24",
language = "English",
volume = "1503",
pages = "332--346",
journal = "Communications in Computer and Information Science",
issn = "1865-0929",
publisher = "Springer Nature",

}

RIS

TY  - JOUR
T1  - Normalization Issues in Digital Literary Studies: Spelling, Literary Themes and Biographical Description of Writers
AU  - Sherstinova, Tatiana
AU  - Kirina, Margarita
PY  - 2022/1/1
Y1  - 2022/1/1
N2  - Digital literary studies is a branch of digital humanities that deals with national or world literatures. In this paper, we discuss normalization issues that are crucial for compiling eCulture resources designed for cultural analytics, social and literary studies, and various other areas of digital humanities. One such resource is the Corpus of Russian short stories of 1900–1930s, which includes detailed information about Russian writers of the period and is intended for stylometric, linguistic and literary studies of Russian prose. Our aim is to create a literary resource based on a systemic approach to the literature of a given period, which implies covering literary texts by as many writers active in that period as possible, both well-known and peripheral. The paper addresses data normalization, a necessary prerequisite for statistical processing of any kind of data. We describe how we deal with spelling variation, how we normalize an expert's manual annotation of literary themes, and how we standardize biographical descriptions of authors. The resulting normalized data can be used for various kinds of research in literary studies, digital humanities, computational linguistics, and cultural heritage studies.
AB  - Digital literary studies is a branch of digital humanities that deals with national or world literatures. In this paper, we discuss normalization issues that are crucial for compiling eCulture resources designed for cultural analytics, social and literary studies, and various other areas of digital humanities. One such resource is the Corpus of Russian short stories of 1900–1930s, which includes detailed information about Russian writers of the period and is intended for stylometric, linguistic and literary studies of Russian prose. Our aim is to create a literary resource based on a systemic approach to the literature of a given period, which implies covering literary texts by as many writers active in that period as possible, both well-known and peripheral. The paper addresses data normalization, a necessary prerequisite for statistical processing of any kind of data. We describe how we deal with spelling variation, how we normalize an expert's manual annotation of literary themes, and how we standardize biographical descriptions of authors. The resulting normalized data can be used for various kinds of research in literary studies, digital humanities, computational linguistics, and cultural heritage studies.
KW  - Biographical descriptions of writers
KW  - Cultural heritage
KW  - Digital humanities
KW  - Literary corpus
KW  - Literature studies
KW  - Normalization
KW  - Russian literature
KW  - Spelling
KW  - Thematic annotation
UR  - http://www.scopus.com/inward/record.url?scp=85124671055&partnerID=8YFLogxK
U2  - 10.1007/978-3-030-93715-7_24
DO  - 10.1007/978-3-030-93715-7_24
M3  - Article
AN  - SCOPUS:85124671055
VL  - 1503 CCIS
SP  - 332
EP  - 346
JO  - Communications in Computer and Information Science
JF  - Communications in Computer and Information Science
SN  - 1865-0929
ER  - 

ID: 101663090