The article describes the experience of annotating pragmatic markers (PMs) in two Russian speech corpora: “One Speaker’s Day” (ORD; dialogues) and “Balanced Annotated Textotec” (SAT; monologues). To develop an optimal PM annotation scheme, four pilot annotations were conducted on samples from ORD and SAT. These pilots made it possible to compile the final list of PMs: 450 units representing variants of 53 basic structural types. Processing the pilot annotation results yielded preliminary data on the frequency of individual PMs and their types, as well as on how PM usage depends on the speaker’s sex and level of speech competence. Statistical processing of the data produced frequency lists of both PMs and their functions. In dialogue, the most frequent PMs are вот, usually used as a “boundary marker” (G), and там, usually used as a hesitative and/or rhythm-forming marker. In monologue, the upper zone of the PM frequency list is likewise dominated by boundary markers (G), which mark the beginning or end of the monologue or serve as navigators in the text (вот/ну вот, значит, так). The most frequent PM types in dialogue are X (hesitative marker), M (metacommunicative marker), GX (boundary/hesitative marker), K (xeno-indicator marker, which introduces someone else’s speech), and RX (rhythm-forming/hesitative marker). In monologue, the list of the most frequent PM types is led by GX (boundary/hesitative marker) and X (hesitative marker). The analysis of the PM frequency lists revealed statistically significant differences in PM use between dialogue and monologue.
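At its core, a frequency list of PMs and a frequency list of their functions is a count over annotated units. The following is a minimal sketch, assuming a hypothetical record format in which each annotated occurrence is a pair of (PM form, function tag) using the tag letters mentioned above (G, X, GX, RX, and so on). The records are invented toy values for illustration only; they are not data from ORD or SAT, and the actual annotation format and statistical procedure used by the authors may differ.

```python
from collections import Counter

# Hypothetical toy annotation records: (PM form, function tag).
# NOT real ORD/SAT data; they only illustrate the shape of the input.
annotations = [
    ("вот", "G"),
    ("там", "X"),
    ("вот", "G"),
    ("ну вот", "G"),
    ("там", "RX"),
    ("значит", "GX"),
]


def frequency_lists(records):
    """Return (PM frequency list, function-tag frequency list).

    Each list holds (item, count) pairs sorted by descending count;
    ties keep the order in which items were first seen.
    """
    pm_counts = Counter(pm for pm, _ in records)
    tag_counts = Counter(tag for _, tag in records)
    return pm_counts.most_common(), tag_counts.most_common()


pm_freq, tag_freq = frequency_lists(annotations)
print(pm_freq)   # [('вот', 2), ('там', 2), ('ну вот', 1), ('значит', 1)]
print(tag_freq)  # [('G', 3), ('X', 1), ('RX', 1), ('GX', 1)]
```

Counting forms and tags separately keeps the two lists independent, so the same record set can answer both "which PMs are most frequent" and "which functions are most frequent"; comparing dialogue with monologue would require running this over each subcorpus and then applying a significance test of the researcher's choice.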

Translated title of the contribution: Pragmatic markers annotation in Russian speech corpus: Research problem, approaches and results
Original language: Russian
Pages (from-to): 72-85
Number of pages: 14
Journal: Komp'juternaja Lingvistika i Intellektual'nye Tehnologii
Volume: 2019-May
Issue number: 18
State: Published - 1 Jan 2019
Event: 2019 Annual International Conference on Computational Linguistics and Intellectual Technologies, Dialogue 2019 - Moscow, Russian Federation
Duration: 29 May 2019 to 1 Jun 2019

    Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
  • Computer Science Applications
