The article describes the experience of annotating pragmatic markers (PMs) in two Russian speech corpora: “One Speaker’s Day” (ORD; dialogues) and “Balanced Annotated Textotec” (SAT; monologues). To develop an optimal PM annotation scheme, four pilot annotations were carried out on samples from ORD and SAT. These made it possible to compile the final list of PMs: 450 units, representing variants of 53 basic structural types. Processing the results of the pilot annotations yielded preliminary data on the frequency of individual pragmatic markers and their types, as well as on how PM usage depends on the speaker’s sex and level of speech competence. Statistical processing of the data produced frequency lists of both PMs and their functions. The most frequent PMs in dialogue are вот, which is usually used as a “boundary marker” (G), and там, which is usually used as a hesitative and/or rhythm-forming marker. In monologue, the top of the PM frequency list is also dominated by boundary markers (G), which mark the beginning or end of the monologue or serve as navigators in the text (вот/ну вот, значит, так). The most frequent PM types in dialogue are X (hesitative marker), M (meta-communicative marker), GX (boundary/hesitative marker), K (xeno-indicator marker, which introduces someone else’s speech), and RX (rhythm-forming/hesitative marker). In monologue, the most frequent PM types are GX (boundary/hesitative marker) and X (hesitative marker). The analysis of the PM frequency lists points to statistically significant differences in the use of PMs between dialogue and monologue.

Translated title: Pragmatic markers annotation in Russian speech corpus: Research problem, approaches and results
Original language: Russian
Pages (from-to): 72-85
Number of pages: 14
Journal: Komp'juternaja Lingvistika i Intellektual'nye Tehnologii
Volume: 2019-May
Issue number: 18
Status: Published - 1 Jan 2019
Event: 2019 Annual International Conference on Computational Linguistics and Intellectual Technologies, Dialogue 2019 - Moscow, Russian Federation
Duration: 29 May 2019 to 1 Jun 2019

    Research areas

  • Corpus annotation, Dialogue, Monologue, Pragmatic marker, Russian everyday speech, Speech corpus

    Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
  • Computer Science Applications

ID: 61379861