RuOHQA: Creating QA Corpus in Russian Based on Oral History Archives

Standard

RuOHQA: Creating QA Corpus in Russian Based on Oral History Archives. / Bukreeva, Liudmila; Guseva, Daria; Dolgushin, Mikhail; Evdokimova, Vera; Obotnina, Vasilisa.

Literature, Language and Computing: Russian Contribution from the LiLaC-2023. ed. / Polina Eismont; Maria Khokhlova; Mikhail Koryshev. Springer Nature, 2025. p. 183-191.

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Harvard

Bukreeva, L, Guseva, D, Dolgushin, M, Evdokimova, V & Obotnina, V 2025, RuOHQA: Creating QA Corpus in Russian Based on Oral History Archives. in P Eismont, M Khokhlova & M Koryshev (eds), Literature, Language and Computing: Russian Contribution from the LiLaC-2023. Springer Nature, pp. 183-191. https://doi.org/10.1007/978-981-96-0990-1_16

APA

Bukreeva, L., Guseva, D., Dolgushin, M., Evdokimova, V., & Obotnina, V. (2025). RuOHQA: Creating QA Corpus in Russian Based on Oral History Archives. In P. Eismont, M. Khokhlova, & M. Koryshev (Eds.), Literature, Language and Computing: Russian Contribution from the LiLaC-2023 (pp. 183-191). Springer Nature. https://doi.org/10.1007/978-981-96-0990-1_16

Vancouver

Bukreeva L, Guseva D, Dolgushin M, Evdokimova V, Obotnina V. RuOHQA: Creating QA Corpus in Russian Based on Oral History Archives. In Eismont P, Khokhlova M, Koryshev M, editors, Literature, Language and Computing: Russian Contribution from the LiLaC-2023. Springer Nature. 2025. p. 183-191 https://doi.org/10.1007/978-981-96-0990-1_16

Author

Bukreeva, Liudmila ; Guseva, Daria ; Dolgushin, Mikhail ; Evdokimova, Vera ; Obotnina, Vasilisa. / RuOHQA: Creating QA Corpus in Russian Based on Oral History Archives. Literature, Language and Computing: Russian Contribution from the LiLaC-2023. editor / Polina Eismont ; Maria Khokhlova ; Mikhail Koryshev. Springer Nature, 2025. pp. 183-191

BibTeX

@inbook{e20d3556a7a548e49b50769657bf8155,

title = "RuOHQA: Creating QA Corpus in Russian Based on Oral History Archives",

abstract = "The work described in this paper is aimed to enable better analysis of oral history archives. Our objective was to turn many facts and stories from Holocaust survivors into more accessible forms. We focused on Russian oral history archives, namely on the spoken interviews collected by the Yad Vashem Foundation, transcribed and summarized these valuable data sources automatically as well as manually by experts. We created the new Russian question–answer corpus (ruOHQA) that represents a labeled data-collection of oral history archives containing over 1,555 entries. Structure of SQuAD was used as a base approach for data organization. This paper discusses the detailed creation process, linguistic characteristics, strengths, and weaknesses of the corpus. We compare the ruOHQA with the SberQuAD dataset in order to clearly demonstrate our contributions and the potential for further research in this area. Particular attention was paid to the potential of the corpus for training neural network models. Hence, we present the annotated task-oriented corpora of Holocaust testimonies in Russian.",

keywords = "Corpora, Question answering, Visual history archives",

author = "Liudmila Bukreeva and Daria Guseva and Mikhail Dolgushin and Vera Evdokimova and Vasilisa Obotnina",

year = "2025",

month = mar,

day = "27",

doi = "10.1007/978-981-96-0990-1_16",

language = "English",

isbn = "978-981-96-0989-5",

pages = "183--191",

editor = "Polina Eismont and Maria Khokhlova and Mikhail Koryshev",

booktitle = "Literature, Language and Computing",

publisher = "Springer Nature",

address = "Germany",

}

RIS

TY - CHAP

T1 - RuOHQA: Creating QA Corpus in Russian Based on Oral History Archives

AU - Bukreeva, Liudmila

AU - Guseva, Daria

AU - Dolgushin, Mikhail

AU - Evdokimova, Vera

AU - Obotnina, Vasilisa

PY - 2025/3/27

Y1 - 2025/3/27

N2 - The work described in this paper is aimed to enable better analysis of oral history archives. Our objective was to turn many facts and stories from Holocaust survivors into more accessible forms. We focused on Russian oral history archives, namely on the spoken interviews collected by the Yad Vashem Foundation, transcribed and summarized these valuable data sources automatically as well as manually by experts. We created the new Russian question–answer corpus (ruOHQA) that represents a labeled data-collection of oral history archives containing over 1,555 entries. Structure of SQuAD was used as a base approach for data organization. This paper discusses the detailed creation process, linguistic characteristics, strengths, and weaknesses of the corpus. We compare the ruOHQA with the SberQuAD dataset in order to clearly demonstrate our contributions and the potential for further research in this area. Particular attention was paid to the potential of the corpus for training neural network models. Hence, we present the annotated task-oriented corpora of Holocaust testimonies in Russian.

AB - The work described in this paper is aimed to enable better analysis of oral history archives. Our objective was to turn many facts and stories from Holocaust survivors into more accessible forms. We focused on Russian oral history archives, namely on the spoken interviews collected by the Yad Vashem Foundation, transcribed and summarized these valuable data sources automatically as well as manually by experts. We created the new Russian question–answer corpus (ruOHQA) that represents a labeled data-collection of oral history archives containing over 1,555 entries. Structure of SQuAD was used as a base approach for data organization. This paper discusses the detailed creation process, linguistic characteristics, strengths, and weaknesses of the corpus. We compare the ruOHQA with the SberQuAD dataset in order to clearly demonstrate our contributions and the potential for further research in this area. Particular attention was paid to the potential of the corpus for training neural network models. Hence, we present the annotated task-oriented corpora of Holocaust testimonies in Russian.

KW - Corpora

KW - Question answering

KW - Visual history archives

UR - https://www.mendeley.com/catalogue/da248539-0d9a-3218-9c51-6846943fae83/

U2 - 10.1007/978-981-96-0990-1_16

DO - 10.1007/978-981-96-0990-1_16

M3 - Chapter

SN - 978-981-96-0989-5

SP - 183

EP - 191

BT - Literature, Language and Computing

A2 - Eismont, Polina

A2 - Khokhlova, Maria

A2 - Koryshev, Mikhail

PB - Springer Nature

ER -
