Standard

High-Frequency Multiword Units and the Typological Distribution of Multiword Units in Spoken Russian. / Богданова-Бегларян, Наталья Викторовна; Шерстинова, Татьяна Юрьевна; Блинова, Ольга Владимировна; Хохлова, Мария Владимировна; Попова, Татьяна Ивановна.

Speech and Computer. SPECOM 2025. Szeged, Hungary : Springer Nature, 2025. p. 257-270 (Lecture Notes in Computer Science; Vol. 16188).

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Research; peer-review

Harvard

Богданова-Бегларян, НВ, Шерстинова, ТЮ, Блинова, ОВ, Хохлова, МВ & Попова, ТИ 2025, High-Frequency Multiword Units and the Typological Distribution of Multiword Units in Spoken Russian. in Speech and Computer. SPECOM 2025. Lecture Notes in Computer Science, vol. 16188, Springer Nature, Szeged, Hungary, pp. 257-270, 27th International Conference on Speech and Computer, Szeged, Hungary, 13/10/25. <https://link.springer.com/chapter/10.1007/978-3-032-07959-6_19>

BibTeX

@inproceedings{184c77fa6c3e4fca942ec80b43367bde,
title = "High-Frequency Multiword Units and the Typological Distribution of Multiword Units in Spoken Russian",
abstract = "Multiword units (MWUs) constitute a distinct class of linguistic phenomena located at the crossroads of lexis and syntax. Empirical data on their typology and frequency are essential for solving a wide range of applied problems in natural language processing. This paper presents a corpus-based study of MWUs in Russian everyday speech. Drawing on data from the ORD corpus comprising one million words of transcribed spontaneous discourse, over 8,000 MWU instances were identified and annotated. These MWUs are classified into eight main classes: non-phraseologized collocations, phraseologized collocations, occasional collocations, idiom forms, constructions, precedent texts and their elements, multiword pragmatic markers, and speech formulas. The paper presents a ranked list of the 50 most frequent MWUs in spoken Russian, along with the overall distribution of MWU types. The results indicate that pragmatic markers are the most dominant category (comprising over 30% of all MWUs), followed by non-phraseologized collocations (26%) and speech formulas (21%). The article also discusses the functional combinations of MWUs in spoken interaction and highlights precedent texts as one of the productive sources for MWU formation. The quantitative data obtained in this study contribute to both theoretical models of lexical and grammatical description of Russian everyday speech and practical tasks related to processing and generating spontaneous spoken language.",
keywords = "modern Russian, everyday speech, oral discourse, multiword units, collocations, pragmatic markers, precedent texts, statistical analysis, speech corpus, corpus linguistics, speech technologies",
author = "Богданова-Бегларян, {Наталья Викторовна} and Шерстинова, {Татьяна Юрьевна} and Блинова, {Ольга Владимировна} and Хохлова, {Мария Владимировна} and Попова, {Татьяна Ивановна}",
note = "Bogdanova-Beglarian, N.V., Blinova, O.V., Khokhlova, M.V., Sherstinova, T.Yu., Popova, T.I. High-Frequency Multiword Units and the Typological Distribution of Multiword Units in Spoken Russian // 27th International Conference “Speech and Computer”, SPECOM-2025. Szeged, Hungary. October 13-15, 2025. Proceedings. Part II / A. Karpov, G. Gosztolya (eds.). LNAI, vol. 16188. – Springer, Cham. – Pp. 257-270.; 27th International Conference on Speech and Computer, SPECOM 2025; Conference date: 13-10-2025 through 14-10-2025",
year = "2025",
month = nov,
day = "15",
language = "English",
series = "Lecture Notes in Computer Science",
publisher = "Springer Nature",
pages = "257--270",
booktitle = "Speech and Computer. SPECOM 2025",
address = "Germany",
url = "https://specom.inf.u-szeged.hu/",

}

RIS

TY - GEN

T1 - High-Frequency Multiword Units and the Typological Distribution of Multiword Units in Spoken Russian

AU - Богданова-Бегларян, Наталья Викторовна

AU - Шерстинова, Татьяна Юрьевна

AU - Блинова, Ольга Владимировна

AU - Хохлова, Мария Владимировна

AU - Попова, Татьяна Ивановна

N1 - Conference code: 27

PY - 2025/11/15

Y1 - 2025/11/15

N2 - Multiword units (MWUs) constitute a distinct class of linguistic phenomena located at the crossroads of lexis and syntax. Empirical data on their typology and frequency are essential for solving a wide range of applied problems in natural language processing. This paper presents a corpus-based study of MWUs in Russian everyday speech. Drawing on data from the ORD corpus comprising one million words of transcribed spontaneous discourse, over 8,000 MWU instances were identified and annotated. These MWUs are classified into eight main classes: non-phraseologized collocations, phraseologized collocations, occasional collocations, idiom forms, constructions, precedent texts and their elements, multiword pragmatic markers, and speech formulas. The paper presents a ranked list of the 50 most frequent MWUs in spoken Russian, along with the overall distribution of MWU types. The results indicate that pragmatic markers are the most dominant category (comprising over 30% of all MWUs), followed by non-phraseologized collocations (26%) and speech formulas (21%). The article also discusses the functional combinations of MWUs in spoken interaction and highlights precedent texts as one of the productive sources for MWU formation. The quantitative data obtained in this study contribute to both theoretical models of lexical and grammatical description of Russian everyday speech and practical tasks related to processing and generating spontaneous spoken language.

KW - modern Russian, everyday speech, oral discourse, multiword units, collocations, pragmatic markers, precedent texts, statistical analysis, speech corpus, corpus linguistics, speech technologies

UR - http://www.scopus.com/record/display.url?eid=2-s2.0-105020258744

M3 - Conference contribution

T3 - Lecture Notes in Computer Science

SP - 257

EP - 270

BT - Speech and Computer. SPECOM 2025

PB - Springer Nature

CY - Szeged, Hungary

T2 - 27th International Conference on Speech and Computer

Y2 - 13 October 2025 through 14 October 2025

ER -

ID: 144231378