Frequency word lists and their variability (the case of Russian fiction in 1900-1930)

Translated title (Russian): Частотные списки слов и их вариативность (на примере русской прозы 1900-1930 гг.)

T. Sherstinova, A. Grebennikov, T. Skrebtsova, A. Guseva, M. Gukasian, I. Egoshina, M. Turygina

Research output: Publications in books, reports, collections, conference proceedings; article in conference proceedings; peer-reviewed


The lexical system is an essential component of any natural language. Frequency word lists are a convenient representation of the functional activity of words in a language as a whole or in a particular text. The parameters and properties of frequency word lists are a focus of attention for NLP experts, since they are used in numerous practical applications related to authorship attribution and automatic text clustering and classification. The article explores frequency word lists of Russian fiction from the period 1900-1930, which was marked by a series of dramatic historical events, and presents unique statistical data on the most frequent words, parts of speech, and keywords, and on their dynamics. Special attention is paid to the statistical consistency of frequency word list parameters, which becomes especially relevant when studying big text data. The study was carried out on fiction texts, which, by their variety of topics and their lexical and stylistic diversity, reflect the variability of linguistic forms better than other written genres. In terms of the size and character of the text corpus, research of this kind is being carried out for the first time.
Original language: English
Title of host publication: Proceedings of 27th Conference of FRUCT Association
Place of publication: Helsinki
Publisher: FRUCT Oy
ISBN (electronic): 978-952-69244-3-4
ISBN (print): 978-1-7281-6247-8
Publication status: Published - 2020
Event: 27th Conference of Open Innovations Association (FRUCT) - Trento, Italy
Duration: 7 Sep 2020 to 9 Sep 2020


Conference: 27th Conference of Open Innovations Association (FRUCT)

