This article presents a system for the automatic processing of user comments, aimed at annotating speech and discourse formulas that function actively in everyday interaction, including digital communication. A Python-based program using the Telegram API was developed to automate the collection, filtering, and annotation of empirical data. Besides building a user corpus, the study evaluated the results of automatic processing. The source material was drawn from the Telegram news channel Fontanka SPB Online. Automatic processing extracted 70 speech and discourse formulas, which were grouped according to their source lexicons. The classification of the examined multiword units draws on the findings of two research projects: the construction of the Pragmaticon in Moscow and the annotation of stable multiword units in Saint Petersburg. Automatic annotation made it possible to identify formulas with a high pragmatic load and to capture their specific functions in internet communication. For example, semantic irony was observed in the use of formulas such as ‘khorosho’ (‘fine’) and ‘bez problem’ (‘no problem’), which traditionally signal agreement. The formulas most frequently reflected two types of user response: affirmation and negation. The results demonstrate the potential of the automatic approach for describing speech and discourse formulas in digital discourse and highlight the need to refine existing classifications of speech acts.
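The annotation step can be pictured as lexicon-based matching of formulas in comment text. The sketch below is illustrative only and is not the authors' program: it assumes the comments have already been fetched from the channel (no Telegram API calls are shown), and the two-entry `LEXICON`, its source labels (`pragmaticon`, `spb_multiword`), the helper names `compile_patterns` and `annotate`, and the sample comments are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon: formula -> source lexicon label.
# The study's 70 formulas came from the Pragmaticon and the Saint Petersburg
# multiword-unit annotation; these two entries and labels are illustrative only.
LEXICON = {
    "хорошо": "pragmaticon",         # 'khorosho', 'fine'
    "без проблем": "spb_multiword",  # 'bez problem', 'no problem'
}


def compile_patterns(lexicon):
    """Build a case-insensitive, word-bounded regex for each formula.

    Spaces inside a formula match any run of whitespace.
    """
    patterns = {}
    for formula in lexicon:
        body = r"\s+".join(re.escape(word) for word in formula.split())
        patterns[formula] = re.compile(rf"\b{body}\b", re.IGNORECASE)
    return patterns


def annotate(comments, lexicon):
    """Return (comment_index, formula, source) tuples plus per-source counts."""
    patterns = compile_patterns(lexicon)
    annotations = []
    by_source = Counter()
    for idx, text in enumerate(comments):
        for formula, pattern in patterns.items():
            if pattern.search(text):
                annotations.append((idx, formula, lexicon[formula]))
                by_source[lexicon[formula]] += 1
    return annotations, by_source


if __name__ == "__main__":
    # Invented sample comments, standing in for fetched channel comments.
    sample = [
        "Хорошо, что хоть мост открыли",
        "Без   проблем, как всегда",
        "Ничего хорошего",
    ]
    notes, counts = annotate(sample, LEXICON)
    print(notes)
    print(counts)
```

Word boundaries keep inflected forms such as "хорошего" from matching "хорошо". A match only marks a candidate: the first sample shows "хорошо" inside a larger construction, so deciding whether a hit is a discourse formula, and whether it is used ironically, still needs contextual or manual annotation.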
Original language: Russian
Title of host publication: Speech and Computer. SPECOM 2025
Place of Publication: Szeged, Hungary
Publisher: Springer Nature
Pages: 278-292
Number of pages: 15
ISBN (Print): 9783032079558
State: Published - 2026
Event: 27th International Conference on Speech and Computer - Szeged, Hungary
Duration: 13 Oct 2025 - 14 Oct 2025
Conference number: 27
https://specom.inf.u-szeged.hu/

Publication series

Name: Lecture Notes in Computer Science
Number: 16187

Conference

Conference: 27th International Conference on Speech and Computer
Abbreviated title: SPECOM 2025
Country/Territory: Hungary
City: Szeged
Period: 13/10/25 - 14/10/25

    Research areas

  • Automatic Annotation, Corpus Linguistics, Discourse Formulas, Internet Comment, Internet Discourse, Modern Russian, Speech Formulas, Statistical Analysis

ID: 144722668