Speech Acts Annotation of Everyday Conversations in the ORD corpus of Spoken Russian

Research output: Publication in a periodical › Article

Abstract

The paper describes the annotation principles developed for tagging speech acts in the “One Day of Speech” (ORD) corpus of everyday spoken Russian, with special attention to the categories and subcategories of speech acts distinguished in the ORD. Speech act annotation is part of the pragmatic annotation of the corpus, which also includes the tagging of macro- and microepisodes of verbal communication. Speech acts are annotated on four levels: 1) an orthographic transcript with information on syntagmatic and phrasal boundaries, 2) the speaker's code, 3) the main category of the speech act, and 4) its subcategory. The proposed annotation scheme was tested on 6 macroepisodes of everyday communication, in which 2247 speech acts were identified. Representative speech acts turned out to be the most frequent type, accounting for up to 39.36% of the analyzed material.
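The four annotation levels listed in the abstract could be modelled roughly as follows. This is only a minimal sketch, not the corpus's actual storage format: the field names, the speaker codes, the `//` boundary mark, the subcategory labels, and every category label other than "representative" are assumptions made for illustration, and the sample records are invented rather than taken from the ORD.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechAct:
    transcript: str   # level 1: orthographic transcript with boundary marks
    speaker: str      # level 2: speaker's code
    category: str     # level 3: main category of the speech act
    subcategory: str  # level 4: subcategory (labels here are hypothetical)


def category_shares(acts):
    """Return each main category's share of all speech acts, in percent."""
    counts = Counter(act.category for act in acts)
    total = sum(counts.values())
    return {cat: round(100 * n / total, 2) for cat, n in counts.items()}


# Toy data, invented for illustration; not taken from the corpus.
sample = [
    SpeechAct("it is raining //", "S1", "representative", "statement"),
    SpeechAct("close the window //", "S2", "directive", "request"),
    SpeechAct("I already did //", "S1", "representative", "report"),
    SpeechAct("thanks //", "S2", "expressive", "gratitude"),
]

shares = category_shares(sample)
```

Applied to a real annotated macroepisode, a function like `category_shares` would yield the kind of per-category percentage reported in the abstract.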
Original language: English
Pages (from-to): 627–635
Journal: Lecture Notes in Computer Science
Volume: 9811
DOI: https://doi.org/10.1007/978-3-319-43958-7_76
Status: Published - 2016

Scopus subject areas

  • Language and Linguistics

Keywords

  • corpus linguistics
  • speech corpus
  • pragmatics
  • spoken Russian
  • everyday dialogues

Cite this

@article{9fdbb6fdf75a453480b591e88dfd5f98,
title = "Speech Acts Annotation of Everyday Conversations in the ORD corpus of Spoken Russian",
abstract = "The paper describes annotation principles developed for tagging of speech acts in the “One Day of Speech” (ORD) corpus of Russian everyday speech, with special attention being paid to categories and subcategories of speech acts distinguished in the ORD. Annotation of speech acts is a part of pragmatic annotation of the corpus, which includes as well the tagging of macro- and microepisodes of verbal communication. Speech acts are annotated on four levels: 1) an orthographic transcript with information on syntagmatic and phrasal boundaries, 2) the speaker's code, 3) main category of a speech act, and 4) its subcategory. Practical approbation of the proposed annotation scheme has been made on the material of 6 macroepisodes of everyday communication, in which 2247 speech acts have been discerned. Representative speech acts turned out to be the most frequent type taking up to 39.36{\%} of the analyzed material.",
keywords = "corpus linguistics, speech corpus, pragmatics, spoken Russian, everyday dialogues",
author = "T. Sherstinova",
year = "2016",
doi = "10.1007/978-3-319-43958-7_76",
language = "English",
volume = "9811",
pages = "627--635",
journal = "Lecture Notes in Computer Science",
issn = "0302-9743",
publisher = "Springer",
}

Speech Acts Annotation of Everyday Conversations in the ORD corpus of Spoken Russian. / Sherstinova, T.

In: Lecture Notes in Computer Science, Vol. 9811, 2016, p. 627–635.

TY - JOUR

T1 - Speech Acts Annotation of Everyday Conversations in the ORD corpus of Spoken Russian

AU - Sherstinova, T.

PY - 2016

Y1 - 2016

N2 - The paper describes annotation principles developed for tagging of speech acts in the “One Day of Speech” (ORD) corpus of Russian everyday speech, with special attention being paid to categories and subcategories of speech acts distinguished in the ORD. Annotation of speech acts is a part of pragmatic annotation of the corpus, which includes as well the tagging of macro- and microepisodes of verbal communication. Speech acts are annotated on four levels: 1) an orthographic transcript with information on syntagmatic and phrasal boundaries, 2) the speaker's code, 3) main category of a speech act, and 4) its subcategory. Practical approbation of the proposed annotation scheme has been made on the material of 6 macroepisodes of everyday communication, in which 2247 speech acts have been discerned. Representative speech acts turned out to be the most frequent type taking up to 39.36% of the analyzed material.

KW - corpus linguistics

KW - speech corpus

KW - pragmatics

KW - spoken Russian

KW - everyday dialogues

U2 - https://doi.org/10.1007/978-3-319-43958-7_76

DO - https://doi.org/10.1007/978-3-319-43958-7_76

M3 - Article

VL - 9811

SP - 627

EP - 635

JO - Lecture Notes in Computer Science

JF - Lecture Notes in Computer Science

SN - 0302-9743

ER -