This study is based on the ORD corpus of everyday spoken Russian, which contains a rich collection of audio recordings made in real-life settings. Speech transcripts in the ORD corpus require mandatory annotation of word and phrase breaks, self-corrections, hesitations, fillers, and other irregularities of spoken discourse. The paper deals with speech breaks in oral discourse (word breaks, phrase breaks, intraphrasal pauses, etc.). Quantitative analysis of a subcorpus of 187,600 tokens has shown that 7.56% of all phrases in everyday communication are left unfinished. Whereas word breaks can be attributed to word search/choice or self-correction, phrase breaks affect the text level and result in ragged, rough, and poorly structured syntactic sequences. Sociolinguistic analysis has revealed that phrase breaks are more frequent in men’s speech than in women’s (8.16% vs. 7.12%). Seniors have significantly more speech breaks (10.76%) than children (6.78%), youth (6.08%), and middle-aged people (7.37%). As for status groups of speakers, the highest share of breaks is found in the speech of unemployed and retired people (10.75%), whereas the lowest is observed in the speech of managers (4.50%), who apparently care more about the quality of their speech than others do.

Original language: English
Title of host publication: Language, Music and Computing - Second International Workshop, LMAC 2017, Revised Selected Papers
Editors: Olga Mitrenina, Asya Pereltsvaig, Polina Eismont
Publisher: Springer Nature
Pages: 122-130
Number of pages: 9
ISBN (Print): 9783030055936
DOIs
State: Published - 2019
Event: 2nd International Workshop on Language, Music and Computing, LMAC 2017 - St. Petersburg, Russian Federation
Duration: 17 Apr 2017 - 19 Apr 2017

Publication series

Name: Communications in Computer and Information Science
Volume: 943
ISSN (Print): 1865-0929

Conference

Conference: 2nd International Workshop on Language, Music and Computing, LMAC 2017
Country/Territory: Russian Federation
City: St. Petersburg
Period: 17/04/17 - 19/04/17

    Research areas

  • Phrase breaks, Sociophonetics, Speech disfluencies, Spoken Russian

    Scopus subject areas

  • Computer Science(all)
  • Mathematics(all)

ID: 99403280