In search for Russian low-frequency words

Olga Vladimirovna Blinova - Main speaker

One of the parameters used in assessing text complexity is word frequency. Readability assessment relies on the presumption that readers have difficulty when they encounter low-frequency or unfamiliar words. For example, the Dale-Chall readability formula takes into account the number of unfamiliar words. There are relatively simple solutions to the problem of determining which words should be considered unfamiliar to the reader. For example, in an educational text for second-language learners, words that are not included in the lexical minimum may be considered unfamiliar. Our current study concerns complexity assessment for the texts of Russian official documents. We conduct a survey to determine perceptual ("subjective") complexity. In addition, we created the Corpus of Russian Internal Documents and Acts (CorRIDA) and use quantitative corpus techniques to describe "objective" complexity. We face the task of determining the share of low-frequency words in the texts. It is unclear, however, which units can be considered words with a low general-language frequency. This paper aims to form a list of low-frequency words by comparing frequency lists obtained from three Russian corpora.
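The idea of comparing frequency lists can be illustrated with a minimal sketch. It is not the procedure used in the study: the abstract does not state a threshold or a rule for combining the lists. The sketch assumes that each list maps a word to its relative frequency in instances per million (ipm), that a word counts as low-frequency only if it falls below the threshold in every list, and that a word absent from a list has frequency 0 there. The function names, the threshold of 1.0 ipm, and the toy frequency values are all hypothetical.

```python
from __future__ import annotations


def is_low_frequency(word: str, freq_lists: list[dict[str, float]],
                     threshold_ipm: float = 1.0) -> bool:
    """True if the word is below the threshold in every frequency list.

    Assumption: a word missing from a list is treated as having 0 ipm there.
    """
    return all(fl.get(word, 0.0) < threshold_ipm for fl in freq_lists)


def low_frequency_words(freq_lists: list[dict[str, float]],
                        threshold_ipm: float = 1.0) -> set[str]:
    """Collect every word attested in at least one list that is low-frequency in all."""
    vocabulary = set().union(*freq_lists)  # union of the dictionaries' keys
    return {w for w in vocabulary if is_low_frequency(w, freq_lists, threshold_ipm)}


def low_frequency_share(tokens: list[str], freq_lists: list[dict[str, float]],
                        threshold_ipm: float = 1.0) -> float:
    """Share of tokens in a text that are low-frequency under the rule above."""
    if not tokens:
        return 0.0
    hits = sum(is_low_frequency(t, freq_lists, threshold_ipm) for t in tokens)
    return hits / len(tokens)


# Toy ipm values, invented for illustration only.
corpus_a = {"и": 35000.0, "документ": 120.0, "нижеподписавшийся": 0.4}
corpus_b = {"и": 36000.0, "документ": 95.0, "нижеподписавшийся": 0.2,
            "вышеуказанный": 3.0}
corpus_c = {"и": 34000.0, "документ": 150.0, "вышеуказанный": 0.5}
lists = [corpus_a, corpus_b, corpus_c]

low_list = low_frequency_words(lists)
share = low_frequency_share(["нижеподписавшийся", "и", "документ", "и"], lists)
```

Requiring agreement across all lists is a conservative choice: a word that is rare in one corpus but common in another is not flagged. A majority rule or a rank-based cut-off would be equally plausible alternatives.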

9 Sep 2021

Event (conference)

Title	11th International Quantitative Linguistics Conference (QUALICO 2021, postponed Qualico 2020)
Short title	QUALICO 2021
Period	9/09/21 → 11/09/21
Web address (URL)	https://www.qualico2020.org/index.html
Location	National Institute for Japanese Language and Linguistics (NINJAL)
City	Tokyo
Country/Territory	Japan
Degree of recognition	International

Links

Programme

ID: 86425551