The article presents one of the Russian speech corpora: a collection of monologic texts known as the “Balanced Annotated Text Collection (Textotec)” (SAT). The corpus has been assembled at St. Petersburg State University over more than 20 years, using the author’s (N.V. Bogdanova-Beglarian’s) data collection methodology, which involves a fairly strict set of experimental procedures. SAT is designed for the study of various types of monologue (reading, retelling, image description, and a story on a given topic). It contains texts recorded from five professionally oriented groups of native speakers (medical doctors, lawyers, computer specialists, philologists, teachers of Russian as a foreign language, and teachers of philosophy), several blocks of student speech (philologists and non-philologists), and four blocks of Russian speech with interference, produced by native speakers of other languages: American, Chinese, Francophone, and Dutch speakers. In total, SAT contains about 700 texts and about 50 hours of sound recordings. The article describes this linguistic resource against the background of other Russian and foreign speech corpora, identifies the main research topics developed on its material, and outlines prospects for further work.
Translated title of the contribution: CORPUS “BALANCED ANNOTATED TEXT COLLECTION (TEXTOTEC)” (SAT): STUDYING THE SPECIFICITY OF RUSSIAN MONOLOGICAL SPEECH
Original language: Russian
Pages (from-to): 110-125
Journal: Труды Института русского языка им. В. В. Виноградова
Issue number: 21
State: Published - 2019

    Research areas

  • modern Russian language, oral monologic speech, speech corpus, natural language processing, database, linguistic experiment, reading, description of the image, retelling of the text (reproductive), spontaneous monologue, sociolinguistics, psycholinguistics

    Scopus subject areas

  • Arts and Humanities (all)

ID: 51131801