Research output: Publications in books, reports, collections, conference proceedings › Article in conference proceedings › Peer-reviewed
Automatic Annotation of Discourse and Speech Formulas in Internet Communication: A Telegram Comment Corpus. / Попова, Татьяна Ивановна; Масленикова, Александра.
Speech and Computer. SPECOM 2025. Szeged, Hungary : Springer Nature, 2026. pp. 278-292 (Lecture Notes in Computer Science; no. 16187).
TY - GEN
T1 - Automatic Annotation of Discourse and Speech Formulas in Internet Communication: A Telegram Comment Corpus
AU - Попова, Татьяна Ивановна
AU - Масленикова, Александра
N1 - Conference code: 27
PY - 2026
Y1 - 2026
AB - This article presents a system for the automatic processing of user comments aimed at annotating speech and discourse formulas that actively function in everyday interaction, including digital communication. A Python-based program using the Telegram API was developed to automate the collection, filtering, and annotation of empirical data. In addition to building a corpus of user comments, the study evaluated the results of automatic processing. The source material was drawn from the Telegram news channel Fontanka SPB Online. As a result of automatic processing, 70 speech and discourse formulas were extracted and grouped according to their source lexicons. The classification of the examined multiword units was grounded in the findings of two research projects: the construction of the Pragmaticon in Moscow and the annotation of stable multiword units in Saint Petersburg. Automatic annotation made it possible to identify formulas with a high pragmatic load and to capture their specific functions in internet communication. For example, semantic irony was observed in the use of formulas such as ‘khorosho’ (‘fine’) and ‘bez problem’ (‘no problem’), which traditionally indicate agreement. The study identified the most frequent types of user responses reflected by the formulas: affirmation and negation. The results demonstrate the potential of the automatic approach for describing speech and discourse formulas in digital discourse and highlight the need to refine existing classifications of speech acts.
KW - Automatic Annotation
KW - Statistical Analysis
KW - Modern Russian
KW - Corpus Linguistics
KW - Discourse Formulas
KW - Internet Discourse
KW - Internet Comment
KW - Speech Formulas
UR - https://link.springer.com/chapter/10.1007/978-3-032-07956-5_20
UR - https://www.mendeley.com/catalogue/da481d95-1e08-3370-9a79-a221343755c2/
U2 - 10.1007/978-3-032-07956-5_20
DO - 10.1007/978-3-032-07956-5_20
M3 - Article in conference proceedings
SN - 9783032079558
T3 - Lecture Notes in Computer Science
SP - 278
EP - 292
BT - Speech and Computer. SPECOM 2025
PB - Springer Nature
CY - Szeged, Hungary
T2 - 27th International Conference on Speech and Computer
Y2 - 13 October 2025 through 14 October 2025
ER -
ID: 144722668
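The abstract describes lexicon-based annotation of speech and discourse formulas in user comments, but the record does not include the authors' program or their lexicons. The Python sketch below only illustrates the general idea of matching multiword formulas against a dictionary. It assumes a hypothetical four-entry lexicon (`LEXICON`) with illustrative labels and hypothetical helpers `tokenize`, `build_index` and `annotate`. It does not reproduce the authors' pipeline, the Pragmaticon or Saint Petersburg lexicons, or the Telegram API collection step.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical mini-lexicon: formula -> pragmatic label.
# Entries and labels are illustrative assumptions, not the paper's lexicons.
LEXICON = {
    "хорошо": "agreement",
    "без проблем": "agreement",
    "конечно": "affirmation",
    "ни за что": "negation",
}

# \w matches Cyrillic letters (including ё) for str patterns in Python 3.
TOKEN_RE = re.compile(r"\w+")


@dataclass
class Annotation:
    formula: str
    label: str
    start: int  # index of the first token of the match


def tokenize(text: str) -> list[str]:
    """Return lowercase word tokens; punctuation is discarded."""
    return TOKEN_RE.findall(text.lower())


def build_index(lexicon: dict[str, str]) -> list[tuple[tuple[str, ...], str, str]]:
    """Turn each formula into a token tuple, longest formulas first."""
    entries = [(tuple(tokenize(f)), f, label) for f, label in lexicon.items()]
    return sorted(entries, key=lambda e: len(e[0]), reverse=True)


def annotate(comment: str, index) -> list[Annotation]:
    """Greedy left-to-right matching; matched formulas do not overlap."""
    tokens = tokenize(comment)
    found = []
    i = 0
    while i < len(tokens):
        for seq, formula, label in index:
            if tuple(tokens[i:i + len(seq)]) == seq:
                found.append(Annotation(formula, label, i))
                i += len(seq)  # jump past the matched formula
                break
        else:
            i += 1  # no formula starts at this token
    return found


if __name__ == "__main__":
    comments = [
        "Хорошо, без проблем, заплатим ещё раз.",
        "Ни за что не поверю!",
    ]
    index = build_index(LEXICON)
    counts = Counter(a.label for c in comments for a in annotate(c, index))
    print(counts)
```

Trying longer entries first means a multiword formula wins over any shorter lexicon entry it contains. Surface matching of this kind only locates candidate formulas; it cannot by itself tell an ironic ‘khorosho’ from a sincere one, so the functional interpretation described in the abstract would still require further analysis.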