Oral text is certainly discrete. It is built of “small bricks”: units not only of the lexical level but also of the higher, syntactic level. Ordinary syntagmatic pauses and hesitation pauses, whether physical (unfilled pauses, including breaks within clauses), vocalized (e-e, m-m), or verbal (vot, kak eto, nu, znachit, etc.), are markers of this discreteness. However, these markers establish neither the syntagma nor the sentence as a unit for describing the syntactic structure of an oral text. Any type of pause may occur at any point in an audio sequence. Thus, the search for sentences in spontaneous speech is quite complicated. To obtain such units, we propose the method of coercive punctuation, which was used to annotate the spontaneous monologues from the collection of oral texts named «Balanced Annotated Textotec». The participants (philology experts) were asked to mark the ends of sentences by putting periods in transcripts in which neither pauses nor punctuation had been marked. They could rely only on the syntactic structure of the text and on the connections between words and predicate centers. Involving more than twenty experts in the experiment makes the results statistically more reliable. In this work we describe the results of our experiment and discuss how those results can be used for the automatic detection of sentence boundaries in spontaneous speech.
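
As an illustration only, the aggregation of expert punctuation marks described in the abstract could be sketched as follows. The abstract does not specify how the marks were combined, so the majority-style threshold, the function names `boundary_votes` and `consensus_boundaries`, and the toy data are all assumptions and not the authors' procedure.

```python
from collections import Counter


def boundary_votes(annotations):
    """Count, for each word index, how many annotators put a period after it.

    `annotations` is a list of sets; each set holds the 0-based indices of
    the words after which one annotator placed a period.
    """
    votes = Counter()
    for marks in annotations:
        votes.update(marks)
    return votes


def consensus_boundaries(annotations, threshold=0.5):
    """Return sorted word indices marked by at least `threshold` of annotators.

    The threshold is a hypothetical parameter chosen for this sketch.
    """
    n = len(annotations)
    votes = boundary_votes(annotations)
    return sorted(i for i, count in votes.items() if count / n >= threshold)


# Toy data: three hypothetical annotators marking a 10-word transcript.
annotations = [{3, 9}, {3, 6, 9}, {4, 9}]
print(consensus_boundaries(annotations))  # [3, 9]
```

With more annotators, the per-position vote share becomes a finer-grained estimate of how strongly a given position is perceived as a sentence boundary, which is one way to read the abstract's remark about involving more than twenty experts.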

Original language: English
Title of host publication: Speech and Computer - 19th International Conference, SPECOM 2017, Proceedings
Editors: Alexey Karpov, Iosif Mporas, Rodmonga Potapova
Publisher: Springer Nature
Pages: 456-463
Number of pages: 8
ISBN (Print): 9783319664286
State: Published - 1 Jan 2017
Event: 19th International Conference on Speech and Computer - Hatfield, United Kingdom
Duration: 11 Sep 2017 to 15 Sep 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10458 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 19th International Conference on Speech and Computer
Abbreviated title: SPECOM 2017
Country/Territory: United Kingdom
City: Hatfield
Period: 11/09/17 to 15/09/17

    Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

    Research areas

  • Discreteness of the oral text, Phrase boundary, Sentence, Speech corpus, Spontaneous monologue, Syntagma

ID: 50412351