Pragmatic markers distribution in Russian everyday speech: Frequency lists and other statistics for discourse modeling

Research output: Publications in books, reports, collections, conference proceedings; article in conference proceedings; research; peer-reviewed

2 Citations (Scopus)
46 Downloads (Pure)

Abstract

Pragmatic markers (PMs) are discourse units (words and multiword expressions) with a weakened referential meaning that perform a variety of pragmatic tasks. In English, for example, common PMs include “well”, “you know”, “I think”, and many others. PMs are integral elements of spoken discourse in every language. According to results obtained from the ORD corpus of everyday Russian, their share can reach up to 6% of the total number of words in the speech of individual speakers. Moreover, in some speech fragments, the share of PMs may even exceed that of significant units (i.e., standard words). However, despite their frequency and ubiquity, PMs are still poorly understood. Current NLP and discourse modeling systems lack information on the distribution and usage of PMs, which leads to noticeable shortcomings when these systems deal with the spontaneous speech of everyday spoken discourse. In this paper we present top frequency lists of PMs for Russian dialogue and monologue speech in general, and also for separate sociological groups of informants (by gender and by age). Our current list of PMs for Russian contains 450 units, which are variants of 50 main structural types. In addition, we consider the most frequent functions of PMs in spoken Russian. The presented quantitative data may be used to improve NLP and discourse modeling systems.

Original language: English
Title of host publication: Speech and Computer - 21st International Conference, SPECOM 2019, Proceedings
Editors: Albert Ali Salah, Alexey Karpov, Rodmonga Potapova
Publisher: Springer Nature
Pages: 433-443
Number of pages: 11
Volume: 11658
ISBN (Electronic): 9783030260613
ISBN (Print): 9783030260606
DOI
Status: Published - 15 Aug 2019
Event: 21st International Conference on Speech and Computer, Istanbul, Turkey
Duration: 20 Aug 2019 - 25 Aug 2019
Conference number: 21
http://specom.nw.ru/

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11658 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 21st International Conference on Speech and Computer
Abbreviated title: SPECOM 2019
Country: Turkey
City: Istanbul
Period: 20/08/19 - 25/08/19
Internet address

Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
