Pragmatic markers distribution in Russian everyday speech: Frequency lists and other statistics for discourse modeling

Abstract

Pragmatic markers (PMs) are discourse units (words and multiword expressions) with a weakened referential meaning, which perform a variety of pragmatic tasks. For example, common PMs in English include “well”, “you know”, “I think”, and many others. PMs are integral elements of spoken discourse in every language. According to results obtained from the ORD corpus of everyday Russian, their share can reach up to 6% of the total number of words in the speech of individual speakers. Moreover, in some speech fragments, the share of PMs may even exceed that of significant units (i.e., standard words). However, despite how frequent and ordinary they are, PMs are still poorly understood. Current NLP and discourse modeling systems lack information on the distribution and usage of PMs, which leads to noticeable shortcomings when these systems face the spontaneous speech of everyday spoken discourse. In this paper we present top frequency lists of PMs for Russian spoken dialogue and monologue in general, and also for separate sociological groups of informants (by gender and by age). Our current list of PMs for Russian contains 450 units, which are variants of 50 main structural types. In addition, we consider the most frequent functions of PMs in spoken Russian. The presented quantitative data may be used to improve NLP and discourse modeling systems.

Original language: English
Title of host publication: Speech and Computer - 21st International Conference, SPECOM 2019, Proceedings
Editors: Albert Ali Salah, Alexey Karpov, Rodmonga Potapova
Publisher: Springer
Pages: 433-443
Number of pages: 11
Volume: 11658
ISBN (Electronic): 9783030260613
ISBN (Print): 9783030260606
DOIs: https://doi.org/10.1007/978-3-030-26061-3_44
Publication status: Published - 15 Aug 2019
Event: 21st International Conference on Speech and Computer, SPECOM 2019 - Istanbul, Turkey
Duration: 20 Aug 2019 to 25 Aug 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11658 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 21st International Conference on Speech and Computer, SPECOM 2019
Country: Turkey
City: Istanbul
Period: 20/08/19 to 25/08/19

Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Bogdanova-Beglarian, N., Sherstinova, T., Blinova, O., & Martynenko, G. (2019). Pragmatic markers distribution in Russian everyday speech: Frequency lists and other statistics for discourse modeling. In A. A. Salah, A. Karpov, & R. Potapova (Eds.), Speech and Computer - 21st International Conference, SPECOM 2019, Proceedings (Vol. 11658, pp. 433-443). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11658 LNAI). Springer. https://doi.org/10.1007/978-3-030-26061-3_44
Bogdanova-Beglarian, Natalia ; Sherstinova, Tatiana ; Blinova, Olga ; Martynenko, Gregory. / Pragmatic markers distribution in Russian everyday speech : Frequency lists and other statistics for discourse modeling. Speech and Computer - 21st International Conference, SPECOM 2019, Proceedings. editor / Albert Ali Salah ; Alexey Karpov ; Rodmonga Potapova. Vol. 11658 Springer, 2019. pp. 433-443 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{c6f2e64539234e12abec84e27eda55a3,
title = "Pragmatic markers distribution in {R}ussian everyday speech: Frequency lists and other statistics for discourse modeling",
abstract = "Pragmatic markers (PMs) are discourse units (words and multiword expressions) with a weakened referential meaning, which perform a variety of pragmatic tasks. For example, common PMs in English include “well”, “you know”, “I think”, and many others. PMs are integral elements of spoken discourse in every language. According to results obtained from the ORD corpus of everyday Russian, their share can reach up to 6{\%} of the total number of words in the speech of individual speakers. Moreover, in some speech fragments, the share of PMs may even exceed that of significant units (i.e., standard words). However, despite how frequent and ordinary they are, PMs are still poorly understood. Current NLP and discourse modeling systems lack information on the distribution and usage of PMs, which leads to noticeable shortcomings when these systems face the spontaneous speech of everyday spoken discourse. In this paper we present top frequency lists of PMs for Russian spoken dialogue and monologue in general, and also for separate sociological groups of informants (by gender and by age). Our current list of PMs for Russian contains 450 units, which are variants of 50 main structural types. In addition, we consider the most frequent functions of PMs in spoken Russian. The presented quantitative data may be used to improve NLP and discourse modeling systems.",
keywords = "Everyday discourse, Frequency lists, NLP, Pragmatic markers, Pragmatics, Sociolinguistics, Speech corpus, Spoken dialogue, Spoken monologue, Spoken Russian, Statistics",
author = "Natalia Bogdanova-Beglarian and Tatiana Sherstinova and Olga Blinova and Gregory Martynenko",
note = "Bogdanova-Beglarian, N., Sherstinova, T., Blinova, O., Martynenko, G. Pragmatic Markers Distribution in Russian Everyday Speech: Frequency Lists and Other Statistics for Discourse Modeling // Speech and Computer. 21st International Conference, SPECOM 2019, Istanbul, Turkey, August 20–25, 2019, Proceedings / Ed. by A. Ali Salah, A. Karpov, R. Potapova. Lecture Notes in Computer Science book series (LNCS, vol. 11658). – Pp. 433-443.",
year = "2019",
month = "8",
day = "15",
doi = "10.1007/978-3-030-26061-3_44",
language = "English",
isbn = "9783030260606",
volume = "11658",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer",
pages = "433--443",
editor = "Salah, {Albert Ali} and Alexey Karpov and Rodmonga Potapova",
booktitle = "Speech and Computer - 21st International Conference, SPECOM 2019, Proceedings",
address = "Germany",

}

Bogdanova-Beglarian, N, Sherstinova, T, Blinova, O & Martynenko, G 2019, Pragmatic markers distribution in Russian everyday speech: Frequency lists and other statistics for discourse modeling. in AA Salah, A Karpov & R Potapova (eds), Speech and Computer - 21st International Conference, SPECOM 2019, Proceedings. vol. 11658, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11658 LNAI, Springer, pp. 433-443, Istanbul, 20/08/19. https://doi.org/10.1007/978-3-030-26061-3_44

Pragmatic markers distribution in Russian everyday speech : Frequency lists and other statistics for discourse modeling. / Bogdanova-Beglarian, Natalia; Sherstinova, Tatiana; Blinova, Olga; Martynenko, Gregory.

Speech and Computer - 21st International Conference, SPECOM 2019, Proceedings. ed. / Albert Ali Salah; Alexey Karpov; Rodmonga Potapova. Vol. 11658 Springer, 2019. p. 433-443 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11658 LNAI).

TY - GEN

T1 - Pragmatic markers distribution in Russian everyday speech

T2 - Frequency lists and other statistics for discourse modeling

AU - Bogdanova-Beglarian, Natalia

AU - Sherstinova, Tatiana

AU - Blinova, Olga

AU - Martynenko, Gregory

N1 - Bogdanova-Beglarian, N., Sherstinova, T., Blinova, O., Martynenko, G. Pragmatic Markers Distribution in Russian Everyday Speech: Frequency Lists and Other Statistics for Discourse Modeling // Speech and Computer. 21st International Conference, SPECOM 2019, Istanbul, Turkey, August 20–25, 2019, Proceedings / Ed. by A. Ali Salah, A. Karpov, R. Potapova. Lecture Notes in Computer Science book series (LNCS, vol. 11658). – Pp. 433-443.

PY - 2019/8/15

Y1 - 2019/8/15

AB - Pragmatic markers (PMs) are discourse units (words and multiword expressions) with a weakened referential meaning, which perform a variety of pragmatic tasks. For example, common PMs in English include “well”, “you know”, “I think”, and many others. PMs are integral elements of spoken discourse in every language. According to results obtained from the ORD corpus of everyday Russian, their share can reach up to 6% of the total number of words in the speech of individual speakers. Moreover, in some speech fragments, the share of PMs may even exceed that of significant units (i.e., standard words). However, despite how frequent and ordinary they are, PMs are still poorly understood. Current NLP and discourse modeling systems lack information on the distribution and usage of PMs, which leads to noticeable shortcomings when these systems face the spontaneous speech of everyday spoken discourse. In this paper we present top frequency lists of PMs for Russian spoken dialogue and monologue in general, and also for separate sociological groups of informants (by gender and by age). Our current list of PMs for Russian contains 450 units, which are variants of 50 main structural types. In addition, we consider the most frequent functions of PMs in spoken Russian. The presented quantitative data may be used to improve NLP and discourse modeling systems.

KW - Everyday discourse

KW - Frequency lists

KW - NLP

KW - Pragmatic markers

KW - Pragmatics

KW - Sociolinguistics

KW - Speech corpus

KW - Spoken dialogue

KW - Spoken monologue

KW - Spoken Russian

KW - Statistics

UR - http://www.scopus.com/inward/record.url?scp=85071481696&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-26061-3_44

DO - 10.1007/978-3-030-26061-3_44

M3 - Conference contribution

AN - SCOPUS:85071481696

SN - 9783030260606

VL - 11658

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 433

EP - 443

BT - Speech and Computer - 21st International Conference, SPECOM 2019, Proceedings

A2 - Salah, Albert Ali

A2 - Karpov, Alexey

A2 - Potapova, Rodmonga

PB - Springer

ER -

Bogdanova-Beglarian N, Sherstinova T, Blinova O, Martynenko G. Pragmatic markers distribution in Russian everyday speech: Frequency lists and other statistics for discourse modeling. In Salah AA, Karpov A, Potapova R, editors, Speech and Computer - 21st International Conference, SPECOM 2019, Proceedings. Vol. 11658. Springer. 2019. p. 433-443. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-030-26061-3_44